From frankose at ifi.uio.no Fri Jun 1 00:34:27 2007 From: frankose at ifi.uio.no (Frank Olaf Sem-Jacobsen) Date: Fri, 01 Jun 2007 09:34:27 +0200 Subject: [ofa-general] osm_node_get_physp_ptr, port numbers Message-ID: <465FCC03.5080105@ifi.uio.no> Hi, I'm just starting to get my bearings in the opensm code, and there's one thing I have not been able to figure out yet. What is the relationship between the port number parameter given to osm_node_get_physp_ptr and the actual port number of the switch? Can I assume that passing, for instance, port number 6 to osm_node_get_physp_ptr will give me the port I see labeled as number 6 on the outside of the switch? Any clarifications are greatly appreciated. -- Frank Olaf Sem-Jacobsen From vlad at lists.openfabrics.org Fri Jun 1 02:40:33 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Fri, 1 Jun 2007 02:40:33 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070601-0200 daily build status Message-ID: <20070601094033.97E7DE60861@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.15 Passed on powerpc with linux-2.6.18 Passed on powerpc with linux-2.6.19 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.14 Passed on ppc64 with linux-2.6.19 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on x86_64 with linux-2.6.16 Passed on ia64 with linux-2.6.15 Passed on 
ppc64 with linux-2.6.12 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ppc64 with linux-2.6.14 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on ppc64 with linux-2.6.17 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.14 Passed on ppc64 with linux-2.6.15 Passed on powerpc with linux-2.6.16 Passed on x86_64 with linux-2.6.15 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.18 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.14 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.13 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.19 Passed on powerpc with linux-2.6.15 Passed on ia64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.9-34.ELsmp Failed: From halr at voltaire.com Fri Jun 1 04:00:41 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Jun 2007 07:00:41 -0400 Subject: [ofa-general] osm_node_get_physp_ptr, port numbers In-Reply-To: <465FCC03.5080105@ifi.uio.no> References: <465FCC03.5080105@ifi.uio.no> Message-ID: <1180695638.7116.237109.camel@hal.voltaire.com> Hi Frank, On Fri, 2007-06-01 at 03:34, Frank Olaf Sem-Jacobsen wrote: > Hi, > > I'm just starting to get my bearings in the opensm code, and there's one > thing I have not been able to figure out yet. What is the relationship > between the port number parameter given to osm_node_get_physp_ptr and > the actual port number of the switch? 
Can I assume that sending for > instance the port number 6 to osm_node_get_physp_ptr will give me the > ports I see as number 6 from without the switch? > > Any clarifications are greatly appreciated. The port numbers are relative to the individual switch chip, as each switch chip is a separate IB switch node. If you have a switch box with a single switch chip (generally 24- or 8-port switches), then the mapping between the two is 1:1. This is not the case with chassis-based switches, which have numerous switch chips on separate boards, where some ports go external and others are internal. There is a grouping function in ibnetdiscover which shows the external ports for some chassis-based switches, but this mapping is not yet supported in OpenSM. What switch(es) are you using? -- Hal From halr at voltaire.com Fri Jun 1 08:14:55 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Jun 2007 11:14:55 -0400 Subject: [ofa-general] Re: [PATCH] opensm/sminfo: mutex cleanup fix In-Reply-To: <20070531223341.GA23029@sashak.voltaire.com> References: <20070531204524.GX13193@sashak.voltaire.com> <20070531223341.GA23029@sashak.voltaire.com> Message-ID: <1180710889.7116.253133.camel@hal.voltaire.com> On Thu, 2007-05-31 at 18:33, Sasha Khapyorsky wrote: > This fixes mutex cleanups in SMInfo processor. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied. -- Hal From swise at opengridcomputing.com Fri Jun 1 09:00:46 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 01 Jun 2007 11:00:46 -0500 Subject: [ofa-general] problem with mvapich2 over iwarp Message-ID: <466042AE.4000006@opengridcomputing.com> Sundeep/Sean, I'm helping a customer who is trying to run mvapich2 over Chelsio's RNIC. They're running a simple program that does an MPI init, 1000 barriers, then a finalize. They're using ofed-1.2-rc3, mpiexec-0.82, and mvapich2-0.9.8-p2 (not the mvapich2 from the OFED kit). Also, they aren't using mpd to start things up.
They're using PMI, I guess (I'm not sure what PMI is, but their mpiexec is run with -comm=pmi). BTW: I can run the same program fine on my 8-node cluster using mpd and the OFA mvapich2 code. On their cluster, a 4-node/4-process job almost always hangs in finalize. When it hangs, one process is always stuck in rdma_destroy_id(). Here's the stack:

(gdb) bt
#0 0x0000003c7cf0ae2b in __lll_mutex_lock_wait () from /lib64/tls/libpthread.so.0
#1 0x000000000068db20 in ?? ()
#2 0x0000000060040a0a in ?? ()
#3 0x0000003c7cf08800 in pthread_cond_destroy@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
#4 0x0000002a9579a09c in ucma_destroy_kern_id (fd=0, handle=6871424) at src/cma.c:403
#5 0x0000002a9579a163 in rdma_destroy_id (id=0x68d980) at src/cma.c:425
#6 0x0000000000423ef9 in ib_finalize_rdma_cm ()
#7 0x00000000004183f6 in MPIDI_CH3I_CM_Finalize ()
#8 0x000000000044b03b in MPIDI_CH3_Finalize ()
#9 0x000000000043169e in MPID_Finalize ()
#10 0x000000000040c3ef in PMPI_Finalize ()
#11 0x0000000000403af4 in main ()
(gdb)

I'm not sure I believe this stack trace fully, because ucma_destroy_kern_id() doesn't call pthread_cond_destroy(); however, rdma_destroy_id() does. So I'm thinking that ucma_destroy_id() has already been executed and rdma_destroy_id() is freeing the cm_id, and we get stuck in pthread_cond_destroy() destroying the pthread condition object. I'm wondering if y'all have ever seen this kind of hang. I can kill the process and it exits, so I don't think we're stuck down in the kernel IWCM or anything. Any thoughts? Thanks, Steve. From sean.hefty at intel.com Fri Jun 1 09:17:23 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 1 Jun 2007 09:17:23 -0700 Subject: [ofa-general] RE: problem with mvapich2 over iwarp In-Reply-To: <466042AE.4000006@opengridcomputing.com> Message-ID: <000401c7a468$56837bc0$ff0da8c0@amr.corp.intel.com> >(gdb) bt >#0 0x0000003c7cf0ae2b in __lll_mutex_lock_wait () from >/lib64/tls/libpthread.so.0 >#1 0x000000000068db20 in ??
() >#2 0x0000000060040a0a in ?? () >#3 0x0000003c7cf08800 in pthread_cond_destroy@@GLIBC_2.3.2 () from >/lib64/tls/libpthread.so.0 >#4 0x0000002a9579a09c in ucma_destroy_kern_id (fd=0, handle=6871424) at >src/cma.c:403 >#5 0x0000002a9579a163 in rdma_destroy_id (id=0x68d980) at src/cma.c:425 >#6 0x0000000000423ef9 in ib_finalize_rdma_cm () >#7 0x00000000004183f6 in MPIDI_CH3I_CM_Finalize () >#8 0x000000000044b03b in MPIDI_CH3_Finalize () >#9 0x000000000043169e in MPID_Finalize () >#10 0x000000000040c3ef in PMPI_Finalize () >#11 0x0000000000403af4 in main () >(gdb) > >I'm not sure I belive this stack trace fully, because >ucm_destroy_kern_id() doesn't call pthread_cond_destroy(). However >rdma_destroy_id() does. So I'm thinking that ucma_destroy_id() has >already been executed and rdma_destroy_id() is freeing the cm_id and we >get stuck in pthread_cond_destroy() destroying the pthread condition object. > >I'm wondering if ya'll have ever seen this kind of hang? I can kill the > process and it exits, so I don't think we're stuck down in the >kernel IWCM or anything. > >Any thoughts? I haven't seen any hangs like this, but I will perform a code inspection to see if any issues can be found. - Sean From narravul at cse.ohio-state.edu Fri Jun 1 10:05:32 2007 From: narravul at cse.ohio-state.edu (Sundeep Narravula) Date: Fri, 1 Jun 2007 13:05:32 -0400 (EDT) Subject: [ofa-general] Re: problem with mvapich2 over iwarp In-Reply-To: <466042AE.4000006@opengridcomputing.com> Message-ID: Steve, We have not seen this hang before and are not sure what is happening at this point. I will look through the code for this behavior. BTW, mvapich2-0.9.8-p2 and the OFA mvapich2 code are identical at this point. --Sundeep. On Fri, 1 Jun 2007, Steve Wise wrote: > Sundeep/Sean, > > I'm helping a customer who is trying to run mvapich2 over chelsio's > rnic. They're running a simple program that does an mpi init, 1000 > barriers, then a finalize.
They're using ofed-1.2-rc3, mpiexec-0.82, > and mvapich2-0.9.8-p2 (not the mvapich2 from the ofed kit). Also they > aren't using mpd to start up stuff. They're using pmi I guess (I'm not > sure what pmi is, but the mpiexec has -comm=pmi. BTW: I can run the > same program fine on my 8 node cluster using mpd and the ofa mvapich2 code. > > On their cluster a 4 node/4 process job hangs in finalize almost always. > When it hangs, one process is always stuck in rdma_destroy_id(). > > Here's the stack: > > (gdb) bt > #0 0x0000003c7cf0ae2b in __lll_mutex_lock_wait () from > /lib64/tls/libpthread.so.0 > #1 0x000000000068db20 in ?? () > #2 0x0000000060040a0a in ?? () > #3 0x0000003c7cf08800 in pthread_cond_destroy@@GLIBC_2.3.2 () from > /lib64/tls/libpthread.so.0 > #4 0x0000002a9579a09c in ucma_destroy_kern_id (fd=0, handle=6871424) at > src/cma.c:403 > #5 0x0000002a9579a163 in rdma_destroy_id (id=0x68d980) at src/cma.c:425 > #6 0x0000000000423ef9 in ib_finalize_rdma_cm () > #7 0x00000000004183f6 in MPIDI_CH3I_CM_Finalize () > #8 0x000000000044b03b in MPIDI_CH3_Finalize () > #9 0x000000000043169e in MPID_Finalize () > #10 0x000000000040c3ef in PMPI_Finalize () > #11 0x0000000000403af4 in main () > (gdb) > > I'm not sure I belive this stack trace fully, because > ucm_destroy_kern_id() doesn't call pthread_cond_destroy(). However > rdma_destroy_id() does. So I'm thinking that ucma_destroy_id() has > already been executed and rdma_destroy_id() is freeing the cm_id and we > get stuck in pthread_cond_destroy() destroying the pthread condition object. > > I'm wondering if ya'll have ever seen this kind of hang? I can kill the > process and it exits, so I don't think we're stuck down in the > kernel IWCM or anything. > > Any thoughts? > > Thanks, > > Steve. 
> From swise at opengridcomputing.com Fri Jun 1 10:29:49 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 01 Jun 2007 12:29:49 -0500 Subject: [ofa-general] Re: problem with mvapich2 over iwarp In-Reply-To: <000401c7a468$56837bc0$ff0da8c0@amr.corp.intel.com> References: <000401c7a468$56837bc0$ff0da8c0@amr.corp.intel.com> Message-ID: <4660578D.2030306@opengridcomputing.com> Sean Hefty wrote: >> (gdb) bt >> #0 0x0000003c7cf0ae2b in __lll_mutex_lock_wait () from >> /lib64/tls/libpthread.so.0 >> #1 0x000000000068db20 in ?? () >> #2 0x0000000060040a0a in ?? () >> #3 0x0000003c7cf08800 in pthread_cond_destroy@@GLIBC_2.3.2 () from >> /lib64/tls/libpthread.so.0 >> #4 0x0000002a9579a09c in ucma_destroy_kern_id (fd=0, handle=6871424) at >> src/cma.c:403 >> #5 0x0000002a9579a163 in rdma_destroy_id (id=0x68d980) at src/cma.c:425 >> #6 0x0000000000423ef9 in ib_finalize_rdma_cm () >> #7 0x00000000004183f6 in MPIDI_CH3I_CM_Finalize () >> #8 0x000000000044b03b in MPIDI_CH3_Finalize () >> #9 0x000000000043169e in MPID_Finalize () >> #10 0x000000000040c3ef in PMPI_Finalize () >> #11 0x0000000000403af4 in main () >> (gdb) >> >> I'm not sure I belive this stack trace fully, because >> ucm_destroy_kern_id() doesn't call pthread_cond_destroy(). However >> rdma_destroy_id() does. So I'm thinking that ucma_destroy_id() has >> already been executed and rdma_destroy_id() is freeing the cm_id and we >> get stuck in pthread_cond_destroy() destroying the pthread condition object. >> >> I'm wondering if ya'll have ever seen this kind of hang? I can kill the >> process and it exits, so I don't think we're stuck down in the >> kernel IWCM or anything. >> >> Any thoughts? > > I haven't seen any hangs like this, but I will perform a code inspection to see > if any issues can be found. > > - Sean Thanks, Perhaps someone is freeing the cond object twice. That could cause a hang... 
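
Steve's double-destroy hypothesis can be made concrete. POSIX leaves it undefined to destroy a condition variable twice, or while another thread may still wait on or signal it. The following is a minimal, hypothetical C sketch of a teardown that avoids both; the struct demo_id type, its event counters and the demo_* functions are invented for illustration and are not librdmacm's code. Destroy first waits until every reported event has been acknowledged, then destroys the condition variable and mutex exactly once, and finally clears the caller's handle so that a repeated call through the same handle is a no-op.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical object guarded by a mutex/cond pair; NOT librdmacm's struct. */
struct demo_id {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    int events_reported;  /* events handed to the application */
    int events_completed; /* events the application has acknowledged */
};

static struct demo_id *demo_create(void)
{
    struct demo_id *id = calloc(1, sizeof(*id));

    if (id) {
        pthread_mutex_init(&id->mutex, NULL);
        pthread_cond_init(&id->cond, NULL);
    }
    return id;
}

/* Record that one more event was delivered and must be acknowledged. */
static void demo_report_event(struct demo_id *id)
{
    pthread_mutex_lock(&id->mutex);
    id->events_reported++;
    pthread_mutex_unlock(&id->mutex);
}

/* Acknowledge one event and wake a destroyer that may be waiting. */
static void demo_ack_event(struct demo_id *id)
{
    pthread_mutex_lock(&id->mutex);
    id->events_completed++;
    pthread_cond_signal(&id->cond);
    pthread_mutex_unlock(&id->mutex);
}

/* Thread entry point that acknowledges a single event. */
static void *demo_acker(void *arg)
{
    demo_ack_event(arg);
    return NULL;
}

/*
 * Block until all reported events are acknowledged, then destroy the
 * cond and mutex exactly once. Clearing *idp turns a repeated call
 * through the same handle into a no-op instead of a double destroy.
 */
static void demo_destroy(struct demo_id **idp)
{
    struct demo_id *id = *idp;

    if (!id)
        return;
    pthread_mutex_lock(&id->mutex);
    while (id->events_completed < id->events_reported)
        pthread_cond_wait(&id->cond, &id->mutex);
    pthread_mutex_unlock(&id->mutex);

    /* No thread can be waiting on or signalling the cond any more. */
    pthread_cond_destroy(&id->cond);
    pthread_mutex_destroy(&id->mutex);
    free(id);
    *idp = NULL;
}
```

Typical use: create the object, call demo_report_event() for each delivered event, let the consumer thread run demo_acker(), and call demo_destroy(&id) from the owner. Note that in this pattern a missing acknowledgement makes demo_destroy() block indefinitely, so a hang at teardown can come from an unacknowledged event as well as from a corrupted (for example, double-destroyed) condition variable.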
From hanafim.ctr at asc.hpc.mil Fri Jun 1 12:38:00 2007 From: hanafim.ctr at asc.hpc.mil (MAHMOUD HANAFI) Date: Fri, 01 Jun 2007 15:38:00 -0400 Subject: [ofa-general] Need OFED1.1 ib_srp max_hw_sectors_kb help! In-Reply-To: <004401c7a482$42a5bbd0$7f01a8c0@ddnereo.datadirectnet.com> References: <004401c7a482$42a5bbd0$7f01a8c0@ddnereo.datadirectnet.com> Message-ID: <46607598.9020002@asc.hpc.mil> Some test data for OFED1.1. I am going to run OFED1.2 and IBGOLD.

* 1 LUN per tier (8+1) with a block size of 4096 KB
* Using xdd, writing to 4 LUNs
* Using 1 IB host port to 1 IB DDN port

The 700 MB/sec appears to be a host limit, because with 2 IB host ports and 2 DDN IB ports the I/O still maxes out at 700 MB/sec. Note that writes level off at a 512 KB request size. I did verify the I/O request lengths on the DDN.

=== WRITE DirectIO ===
IO Size (KB)   Throughput (MB/sec)
------------   -------------------
16             46.711
32             93.293
64             185.098
128            348.522
256            547.118
512            671.227
1024           697.149
2048           680.645
4096           692.067
8192           710.564

=== READ DirectIO ===
IO Size (KB)   Throughput (MB/sec)
------------   -------------------
16             54.856
32             104.526
64             191.592
128            312.586
256            462.460
512            471.877
1024           509.806
2048           535.050
4096           543.130
8192           565.176

Martin W. Schlining III wrote: > Wonder why it halves the value? I'll have to try that myself. > > If you are using OFED 1.2, you can also load the ib_srp module with an > option to increase the size of the scatter gather lists. The default size is > 12 which is way too small. I don't think this option exists for OFED 1.1. In > 1.1, you have to modify ib_srp.h and recompile the module ib_srp.o. > > modprobe ib_srp srp_sg_tablesize=256 > > Fiber channel drivers also set the max_sect field in their drivers to 65535 > (0xffff) to eliminate any restrictions. Perhaps the same value for SRP will > help? > > - Martin > > -----Original Message----- > From: MAHMOUD HANAFI [mailto:hanafim.ctr at asc.hpc.mil] > Sent: Friday, June 01, 2007 2:56 PM > To: Martin W.
Schlining III > Cc: 'MAHMOUD HANAFI' > Subject: Re: [ofa-general] Need OFED1.1 ib_srp max_hw_sectors_kb help! > > I didn't get a answer on the email list but I figured it out. > You can pass "max_sect=xxx" option during initialization of the srp traget. > If you have upgraded to > OFED1.2 you can set a line in /etc/srp_daemon.conf "A max_sect=2096" (you > will need the "A") and then run the srp_daemon.sh. Only thing I have noticed > is what ever you set in the max_sect value it always takes 1/2 of the value. > > > > -Mahmoud > > Martin W. Schlining III wrote: >> Hello, >> >> Did you ever get an answer to this? I'm a bit curious myself what the >> response was. >> >> Regards, >> Martin Schlining >> Senior Software Engineer >> mschlining at datadirectnet.com >> >> -----Original Message----- >> From: general-bounces at lists.openfabrics.org >> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of MAHMOUD >> HANAFI >> Sent: Tuesday, May 29, 2007 6:58 PM >> To: general at lists.openfabrics.org >> Subject: [ofa-general] Need OFED1.1 ib_srp max_hw_sectors_kb help! >> >> All, >> >> I am using OFED1.1 with CISCO HCA/switch and DDN Storage. I am able to >> load and perform IO to the DDN via srp driver. But, the >> max_hw_sectors_kb for the device is getting set to 64kb. Any one else >> seen this issue? Same host and storage with fiber channel doesn't have >> this problem. It set max_hw_sectors_kb correctly to 4096KB. 
>> >> Thanks, >> -- >> Mahmoud Hanafi >> Senior System Administrator >> ASC/MSRC >> www.asc.hpc.mil >> 2435 5th Street >> WPAFB, OHIO 45433 >> (937) 255-1536 >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit >> http://openib.org/mailman/listinfo/openib-general >> >> > > -- > Mahmoud Hanafi > Senior System Administrator > ASC/MSRC > www.asc.hpc.mil > 2435 5th Street > WPAFB, OHIO 45433 > (937) 255-1536 > > -- Mahmoud Hanafi Senior System Administrator ASC/MSRC www.asc.hpc.mil 2435 5th Street WPAFB, OHIO 45433 (937) 255-1536 From vlad at lists.openfabrics.org Sat Jun 2 02:39:52 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Sat, 2 Jun 2007 02:39:52 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070602-0200 daily build status Message-ID: <20070602093952.475A8E6085F@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.13 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.12 Passed on powerpc with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ia64 with linux-2.6.16 Passed on ppc64 with linux-2.6.18 Passed on 
x86_64 with linux-2.6.12 Passed on powerpc with linux-2.6.13 Passed on x86_64 with linux-2.6.13 Passed on ppc64 with linux-2.6.14 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.17 Passed on ppc64 with linux-2.6.12 Passed on powerpc with linux-2.6.16 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.15 Passed on x86_64 with linux-2.6.21.1 Passed on powerpc with linux-2.6.14 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.20 Passed on ppc64 with linux-2.6.16 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.13 Passed on ppc64 with linux-2.6.17 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-34.ELsmp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Failed:
From eli at mellanox.co.il Sat Jun 2 23:50:43 2007 From: eli at mellanox.co.il (Eli Cohen) Date: Sun, 03 Jun 2007 09:50:43 +0300 Subject: [ofa-general] Re: [PATCH] libibverbs/examples: free invalid pointer In-Reply-To: References: <1180614624.7053.14.camel@mtls03> Message-ID: <1180853473.10841.1.camel@mtls03> On Thu, 2007-05-31 at 10:35 -0700, Roland Dreier wrote: > Thanks, but I think I fixed this bug in all the pingpong examples (not > just srq_pingpong) at the beginning of May. > > - R. Thanks. I was looking at an outdated tree...
From vlad at lists.openfabrics.org Sun Jun 3 02:42:04 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Sun, 3 Jun 2007 02:42:04 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070603-0200 daily build status Message-ID: <20070603094204.EBB67E60870@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.16 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on ia64 with linux-2.6.15 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.16 Passed on ia64 with linux-2.6.18 Passed on x86_64 with linux-2.6.12 Passed on ia64 with linux-2.6.14 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.15 Passed on ia64 with linux-2.6.16 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.14 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.14 Passed on powerpc with linux-2.6.19 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.13 Passed on ia64 with linux-2.6.17 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.19 Passed on ppc64 with linux-2.6.12 Passed on powerpc with linux-2.6.14 Passed on x86_64 with linux-2.6.21.1 Passed on ppc64 with linux-2.6.15 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on ppc64 with linux-2.6.16 Passed on powerpc with linux-2.6.15 Passed on x86_64 with 
linux-2.6.16.21-0.8-smp Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.13 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on ia64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.9-34.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on ia64 with linux-2.6.16.21-0.8-default Failed:
From jackm at dev.mellanox.co.il Sun Jun 3 06:39:38 2007 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Sun, 3 Jun 2007 16:39:38 +0300 Subject: [ofa-general] [PATCH] libibverbs: initialize qp state to RESET at qp creation time Message-ID: <200706031639.38921.jackm@dev.mellanox.co.il>

Roland,

libmlx4 commit af7707cecdfd5ca8a38b4d855070ebfc310a339f (Initialize send queue entry ownership bits) is broken without the fix below. Since qp state is uninitialized, mlx4_qp_init_sq_ownership() frequently ends up not being invoked.

---

Must initialize qp state to RESET at qp creation time.

Signed-off-by: Jack Morgenstein

diff --git a/src/verbs.c b/src/verbs.c
index febf32a..f5cf4d3 100644
--- a/src/verbs.c
+++ b/src/verbs.c
@@ -406,6 +406,7 @@ struct ibv_qp *__ibv_create_qp(struct ibv_pd *pd,
 	qp->recv_cq = qp_init_attr->recv_cq;
 	qp->srq = qp_init_attr->srq;
 	qp->qp_type = qp_init_attr->qp_type;
+	qp->state = IBV_QPS_RESET;
 	qp->events_completed = 0;
 	pthread_mutex_init(&qp->mutex, NULL);
 	pthread_cond_init(&qp->cond, NULL);

From jackm at dev.mellanox.co.il Sun Jun 3 06:43:20 2007 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Sun, 3 Jun 2007 16:43:20 +0300 Subject: [ofa-general] [PATCH] IB/mlx4: rq size computation fix Message-ID: <200706031643.21047.jackm@dev.mellanox.co.il>

rq.max should be at least 1.

Signed-off-by: Jack Morgenstein

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index dc137de..0d5baf5 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -196,7 +196,7 @@ static int set_rq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
 	    cap->max_recv_sge > dev->dev->caps.max_rq_sg)
 		return -EINVAL;
 
-	qp->rq.max = cap->max_recv_wr ? roundup_pow_of_two(cap->max_recv_wr) : 0;
+	qp->rq.max = cap->max_recv_wr ? roundup_pow_of_two(cap->max_recv_wr) : 1;
 	qp->rq.wqe_shift = ilog2(roundup_pow_of_two(cap->max_recv_sge *
 						    sizeof (struct mlx4_wqe_data_seg)));
 
From frankose at ifi.uio.no Sun Jun 3 10:29:12 2007 From: frankose at ifi.uio.no (Frank Olaf Sem-Jacobsen) Date: Sun, 03 Jun 2007 19:29:12 +0200 Subject: [ofa-general] Log output upon death Message-ID: <4662FA68.2080305@ifi.uio.no> Time for my second naive question (too bad the archives do not have any search function). Much as expected, RunSimTest dies for an unknown reason while routing my topology, and I am attempting to debug it by adding various debug log entries. However, as things seem to be threaded (?), there does not seem to be any direct relationship between where the application fails and where the log output stops; the log usually stops abruptly in the middle of a line. Also, the log entries stop in various parts of the code instead of in the same place each time (though I could have many errors ;) ). Is there a way to synchronise this so that the log file reflects the last log entry made by opensm before it dies? Are there any other ingenious ways of debugging the route building function? As always, any help is greatly appreciated. -- Frank Olaf Sem-Jacobsen From eitan at mellanox.co.il Sun Jun 3 10:48:17 2007 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Sun, 3 Jun 2007 20:48:17 +0300 Subject: [ofa-general] Log output upon death In-Reply-To: <4662FA68.2080305@ifi.uio.no> References: <4662FA68.2080305@ifi.uio.no> Message-ID: <6C2C79E72C305246B504CBA17B5500C90199B3FF@mtlexch01.mtl.com> Hi Frank, From your description it is unclear whether it is ibmssh (the shell that interprets the RunSimTest code) or OpenSM that has crashed. The best way to debug such issues (sudden death) is to compile the executables (both opensm and ibmssh) with debug info (by adding -ggdb to CFLAGS, or better, configure --enable-debug) and then allow the system to create a core file (in bash use: ulimit -c unlimited; in tcsh: limit coredumpsize unlimited). Then you will get a core dump file.
You should try to open it in gdb, and it will tell you which executable generated the core. Then start gdb with the correct executable and core file and use the "where" command to get a backtrace. You can switch between threads using the thread command. If you want me to have a look at the failure, you can send me the "input" files you use (topo file and ibnl directory). Eitan Eitan Zahavi Senior Engineering Director, Software Architect Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: general-bounces at lists.openfabrics.org > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of > Frank Olaf Sem-Jacobsen > Sent: Sunday, June 03, 2007 8:29 PM > To: general at lists.openfabrics.org > Subject: [ofa-general] Log output upon death > > Time for my second naive question (too bad the archives do not have any > search function). > > Much as expected RunSimTest dies for an unknown reason while > routing my topology, and I am attempting to debug by adding > various debug log entries. However, as things seem to be > threaded (?) there does not seem to be any direct > relationship between where the application fails and where > the log output stops, the log usually stops abruptly in the > middle of a line. Also, the log entries stop in various > parts of the code instead of the same place each time (I > could though have many errors ;) ). > > Is there a possible way to synchronise this such that the log > file will reflect the last log entry by opensm before it > dies? Are there any other ingenious ways of debugging the > route building function? > > As always, any help is greatly appreciated.
> -- > Frank Olaf Sem-Jacobsen > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From steven.farouk79 at gmail.com Sun Jun 3 11:30:56 2007 From: steven.farouk79 at gmail.com (freelotto lottery) Date: Sun, 3 Jun 2007 11:30:56 -0700 Subject: [ofa-general] Re: Attention: Notification Of Your Winnings, CONGRATULATIONS!!! Message-ID: *AFFILIATED OFFICE OF FREELOTTO U.K **82 Victoria Street, Victoria London SW1 U.K * *** * *NOTIFICATION OF WINNING* : *We are pleased to inform you of the release, of the recent results of the FREELOTTO INTERNATIONAL PROMOTION PROGRAM held on 30 **th May, 2007**. You were entered as dependent clients with: Reference Serial Number: ** F2-003-036** and Batch number **FR/45-300-06** .** Your email address attached to the ticket number: **54-20-17-52-34-30* *that drew the lucky winning number, which consequently won the Daily Jackpot in the first category,in four parts. You have been approved for a payment of $1,000, 000.00 ( **One** **Million** **United** **State** Dollars) in cash credited to file reference number**:** TFR/9900034943/JPT* *. Congratulations!!!* *To read the FreeLotto click here: **http://www.freelotto.com* *FreeLotto Winning Draw Results for June 3rd 2007** $50, 000.00: 4-5-34-41-3-37 $200,000.00: 22-43-6-9-28-26 $10,000.00: 12-32-17-14-24-10 $100,000.00 : 2-27-22-47-16-21 * *Daily Jackpot $1,000,000.00: 54-20-17-52-34-30 Super Bulk $10,000,000.00 : 37-2-48-41-46-25-43 * *Please contact the underlisted claims release office for immediate pay out of your winning fund: * *Mrs.Olivia Malik* *( Freelotto Fiduciary Department ) 82 Victoria Street Victoria London SW1 U.K Tel: +44 704 571 5302 Email:oliviamalik81079 at yahoo.com* ***He is your agent, and he is responsible for the processing and transfer of your winnings to you. 
After receiving your check from our office in U.Kother relevant documents you may need to claim your winning will be delivered to you by our paying bank as soon as you validate your claims.* *The freelotto internet drew is held every six months and is so organized to encourage the use of the internet and computer worldwide. We are proud to say that over 300 millions Euros are won annually in more than 118 countries Worldwide. * *Sincerely, Mr.Steven .Farouk* ***Chairman & CEO.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From halr at voltaire.com Sun Jun 3 12:27:22 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Jun 2007 15:27:22 -0400 Subject: [ofa-general] Log output upon death In-Reply-To: <4662FA68.2080305@ifi.uio.no> References: <4662FA68.2080305@ifi.uio.no> Message-ID: <1180898838.7116.450041.camel@hal.voltaire.com> On Sun, 2007-06-03 at 13:29, Frank Olaf Sem-Jacobsen wrote: > Time for my second naive question (too bad the archives do not have any > search function). > > Much as expected RunSimTest dies for an unknown reason while routing my > topology, and I am attempting to debug by adding various debug log > entries. However, as things seem to be threaded (?) there does not seem > to be any direct relationship between where the application fails and > where the log output stops, the log usually stops abruptly in the middle > of a line. Also, the log entries stop in various parts of the code > instead of the same place each time (I could though have many errors ;) ). > > Is there a possible way to synchronise this such that the log file will > reflect the last log entry by opensm before it dies? There's force_log_flush in opensm.opts which should help with this. -- Hal > Are there any > other ingenious ways of debugging the route building function? > > As always, any help is greatly appreciated. 
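Hal's pointer to `force_log_flush` most likely addresses stdio buffering (an assumption; the thread does not spell this out). Output written through a `FILE *` sits in a user-space buffer until it is flushed. If the process dies first, that buffered text is lost, and this is one way a log can stop in the middle of a line. Below is a minimal C sketch of what a force-flush option amounts to, assuming a plain `FILE *` log. `log_entry` and `force_flush` are hypothetical names, not opensm's logging API.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical logger (not opensm's osm_log API). stdio buffers output,
 * so if the process dies, anything still in the FILE buffer is lost,
 * which is one way a log ends mid-line. With force_flush set, each entry
 * is handed to the kernel before returning, so it survives a crash of
 * the process. */
static void log_entry(FILE *log, bool force_flush, const char *fmt, ...)
{
    va_list ap;

    assert(log != NULL);
    va_start(ap, fmt);
    vfprintf(log, fmt, ap);
    va_end(ap);

    if (force_flush)
        fflush(log); /* push this entry out of the stdio buffer now */
}
```

Flushing on every call costs a system call per entry, which is the usual reason to leave such an option off outside of debugging sessions.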
From rdreier at cisco.com Sun Jun 3 13:35:14 2007 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 03 Jun 2007 13:35:14 -0700 Subject: [ofa-general] Re: [PATCH] libibverbs: initialize qp state to RESET at qp creation time In-Reply-To: <200706031639.38921.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Sun, 3 Jun 2007 16:39:38 +0300") References: <200706031639.38921.jackm@dev.mellanox.co.il> Message-ID: thanks, I applied this to master, stable and stable-1.0 branches. From halr at voltaire.com Sun Jun 3 14:04:04 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Jun 2007 17:04:04 -0400 Subject: [ofa-general] OpenIB management libraries release Message-ID: <1180904641.7116.456208.camel@hal.voltaire.com> http://www.openfabrics.org/~halr/ md5sum 212f78cf6b370a2b5d44a773cd640446 libibcommon-1.0.3.tar.gz 7ba5da1f33a2df48ab34c12479852930 libibumad-1.0.5.tar.gz 1352954756833ad6a516e9a461949768 libibmad-1.0.5.tar.gz From rdreier at cisco.com Sun Jun 3 16:23:48 2007 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 03 Jun 2007 16:23:48 -0700 Subject: [ofa-general] Re: [PATCH] IB/mlx4: rq size computation fix In-Reply-To: <200706031643.21047.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Sun, 3 Jun 2007 16:43:20 +0300") References: <200706031643.21047.jackm@dev.mellanox.co.il> Message-ID: > rq.max should be at least 1. Is this true? It seems this would break QPs that use SRQ. (I agree we do need to make sure rq.max and rq.max_gs are at least 1 for QPs with a receive queue, but it seems this patch will actually break things when a QP doesn't have a receive queue, because the send queue offset will be wrong) From hakanlcrtx at cuisine-emoi.com Mon Jun 4 01:27:17 2007 From: hakanlcrtx at cuisine-emoi.com (Beatrice Carrillo) Date: Mon, 4 Jun 2007 04:27:17 -0400 Subject: [ofa-general] CREATIVE SUITE 3 READY TO DOWNLOAD Message-ID: <459461342343.452392227729@cuisine-emoi.com> An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: skelp.png Type: image/png Size: 18078 bytes Desc: not available URL: From fiatt42 at bigdaddymails.com Mon Jun 4 02:24:38 2007 From: fiatt42 at bigdaddymails.com (Betty Saenz) Date: Mon, 4 Jun 2007 18:24:38 +0900 Subject: [ofa-general] An covel and bracey Message-ID: <001001c7a6d5$9c2984c0$0182528c@rok> love with her daughter. The beautiful girl looked upon this heartless dangerous conspiracy which had been formed against the king. Alexander a striking contrast to the exuberant prolificness of New Grenada. It is, account of its thus marking the eastern frontier of the country, it husband remonstrated with her against this atrocious proposal. "It would ascent as that of a few hundred feet in hundreds of miles would be characteristics of the country, with safety and pleasure. In a word, the characteristics of the country, with safety and pleasure. In a word, the ---------- Here is one hot new s to ck with lots of exciting news and what seems to be a bright future! ----- Strategy X Inc. (SGXI) A global risk mitigation specialist corporation. Price Today: 0.009 Recommendation: Buy aggresively (500+% pump expected) SGXI news: Strategy X Outlines Vertical Market Pursuit of the 2007 U.S. Department of Homeland Security Grants... For the complete release, please see your brokers website. ---------- the same surface with the sea, only, instead of blue waters topped with so afraid of his terrible mother, that he did not dare to remain in and barren desert, during the period of the annual inundations. This those being the regions in which idleness reigns. The great remedy, too, admiration and pleasure. 
We have not the wings of the eagle, but the these rainless regions all is necessarily silence, desolation, and last Cleopatra seized a number of Lathyrus's servants, the eunuchs who death, in order to prevent the older brothers from disputing the and degeneracy of national character as the world advances in age, will escaped with his life, as the mob had surrounded the palace and were were upon the throne. In the mean time, we will here only add, that From vlad at lists.openfabrics.org Mon Jun 4 02:41:53 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Mon, 4 Jun 2007 02:41:53 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070604-0200 daily build status Message-ID: <20070604094154.2C3D4E60825@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.12 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.12 Passed on powerpc with linux-2.6.18 Passed on powerpc with linux-2.6.16 Passed on ia64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on ppc64 with linux-2.6.15 Passed on ia64 with linux-2.6.17 Passed on x86_64 with linux-2.6.17 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.15 Passed on ppc64 with 
linux-2.6.12 Passed on x86_64 with linux-2.6.14 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on powerpc with linux-2.6.12 Passed on x86_64 with linux-2.6.21.1 Passed on ppc64 with linux-2.6.14 Passed on powerpc with linux-2.6.14 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.13 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ia64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.9-34.ELsmp Failed: From eli at mellanox.co.il Mon Jun 4 07:16:35 2007 From: eli at mellanox.co.il (Eli Cohen) Date: Mon, 04 Jun 2007 17:16:35 +0300 Subject: [ofa-general] [PATCH] libmlx4: doorbell allocator Message-ID: <1180966625.10841.30.camel@mtls03> Use type of the constant 1 identical to the type of the variable holding the bit mask to prevent using the same bit twice. For example, on 64 bit machines, int is 32 bits and long is 64 bits. 
So 1 << 0 ends up equal to 1 << 32 (shifting a 32-bit int by 32 or more bits is undefined behavior in C), whereas the correct usage is 1L << shift_val. Found by Dotan Barak at Mellanox Signed-off-by: Eli Cohen --- Index: libmlx4/src/dbrec.c =================================================================== --- libmlx4.orig/src/dbrec.c 2007-06-04 12:53:57.000000000 +0300 +++ libmlx4/src/dbrec.c 2007-06-04 16:53:31.000000000 +0300 @@ -110,7 +110,7 @@ /* nothing */; j = ffsl(page->free[i]); - page->free[i] &= ~(1 << (j - 1)); + page->free[i] &= ~(1L << (j - 1)); db = page->buf.buf + (i * 8 * sizeof (long) + (j - 1)) * db_size[type]; out: @@ -135,7 +135,7 @@ goto out; i = ((void *) db - page->buf.buf) / db_size[type]; - page->free[i / (8 * sizeof (long))] |= 1 << (i % (8 * sizeof (long))); + page->free[i / (8 * sizeof (long))] |= 1L << (i % (8 * sizeof (long))); if (!--page->use_cnt) { if (page->prev) From halr at voltaire.com Mon Jun 4 08:46:52 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 04 Jun 2007 11:46:52 -0400 Subject: [ofa-general] [PATCH] ibnetdiscover: Add link width and speed to topology file output Message-ID: <1180972011.4533.3711.camel@hal.voltaire.com> ibnetdiscover: Add link width and speed to topology file output Signed-off-by: Hal Rosenstock diff --git a/infiniband-diags/include/ibnetdiscover.h b/infiniband-diags/include/ibnetdiscover.h index 4c2a6c7..7f2512e 100644 --- a/infiniband-diags/include/ibnetdiscover.h +++ b/infiniband-diags/include/ibnetdiscover.h @@ -72,6 +72,8 @@ struct Port { int lmc; int state; int physstate; + int linkwidth; + int linkspeed; Node *node; Port *remoteport; /* null if SMA */ diff --git a/infiniband-diags/man/ibnetdiscover.8 b/infiniband-diags/man/ibnetdiscover.8 index 7d9c49c..37a896c 100644 --- a/infiniband-diags/man/ibnetdiscover.8 +++ b/infiniband-diags/man/ibnetdiscover.8 @@ -1,4 +1,4 @@ -.TH IBNETDISCOVER 8 "June 2, 2007" "OpenIB" "OpenIB Diagnostics" +.TH IBNETDISCOVER 8 "June 4, 2007" "OpenIB" "OpenIB Diagnostics" .SH NAME ibnetdiscover \- discover InfiniBand topology @@
-131,45 +131,45 @@ devid=0x5a06 sysimgguid=0x5442ba00003000 switchguid=0x5442ba00003080 Switch 24 "S-005442ba00003080" # "ISR9024 Voltaire" base port 0 lid 6 lmc 0 -[22] "H-0008f10403961354"[1] # "MT23108 InfiniHost Mellanox Technologies" lid 4 -[10] "S-0008f10400410015"[1] # "SW-6IB4 Voltaire" lid 3 -[8] "H-0008f10403960558"[2] # "MT23108 InfiniHost Mellanox Technologies" lid 14 -[6] "S-0008f10400410015"[3] # "SW-6IB4 Voltaire" lid 3 -[12] "H-0008f10403960558"[1] # "MT23108 InfiniHost Mellanox Technologies" lid 10 +[22] "H-0008f10403961354"[1] # "MT23108 InfiniHost Mellanox Technologies" lid 4 4xSDR +[10] "S-0008f10400410015"[1] # "SW-6IB4 Voltaire" lid 3 4xSDR +[8] "H-0008f10403960558"[2] # "MT23108 InfiniHost Mellanox Technologies" lid 14 4xSDR +[6] "S-0008f10400410015"[3] # "SW-6IB4 Voltaire" lid 3 4xSDR +[12] "H-0008f10403960558"[1] # "MT23108 InfiniHost Mellanox Technologies" lid 10 4xSDR vendid=0x8f1 devid=0x5a05 switchguid=0x8f10400410015 Switch 8 "S-0008f10400410015" # "SW-6IB4 Voltaire" base port 0 lid 3 lmc 0 -[6] "H-0008f10403960984"[1] # "MT23108 InfiniHost Mellanox Technologies" lid 16 -[4] "H-005442b100004900"[1] # "MT23108 InfiniHost Mellanox Technologies" lid 12 -[1] "S-005442ba00003080"[10] # "ISR9024 Voltaire" lid 6 -[3] "S-005442ba00003080"[6] # "ISR9024 Voltaire" lid 6 +[6] "H-0008f10403960984"[1] # "MT23108 InfiniHost Mellanox Technologies" lid 16 4xSDR +[4] "H-005442b100004900"[1] # "MT23108 InfiniHost Mellanox Technologies" lid 12 4xSDR +[1] "S-005442ba00003080"[10] # "ISR9024 Voltaire" lid 6 1xSDR +[3] "S-005442ba00003080"[6] # "ISR9024 Voltaire" lid 6 4xSDR vendid=0x2c9 devid=0x5a44 caguid=0x8f10403960984 Ca 2 "H-0008f10403960984" # "MT23108 InfiniHost Mellanox Technologies" -[1] "S-0008f10400410015"[6] # lid 16 lmc 1 "SW-6IB4 Voltaire" lid 3 +[1] "S-0008f10400410015"[6] # lid 16 lmc 1 "SW-6IB4 Voltaire" lid 3 4xSDR vendid=0x2c9 devid=0x5a44 caguid=0x5442b100004900 Ca 2 "H-005442b100004900" # "MT23108 InfiniHost Mellanox Technologies" -[1] 
"S-0008f10400410015"[4] # lid 12 lmc 1 "SW-6IB4 Voltaire" lid 3 +[1] "S-0008f10400410015"[4] # lid 12 lmc 1 "SW-6IB4 Voltaire" lid 3 4xSDR vendid=0x2c9 devid=0x5a44 caguid=0x8f10403961354 Ca 2 "H-0008f10403961354" # "MT23108 InfiniHost Mellanox Technologies" -[1] "S-005442ba00003080"[22] # lid 4 lmc 1 "ISR9024 Voltaire" lid 6 +[1] "S-005442ba00003080"[22] # lid 4 lmc 1 "ISR9024 Voltaire" lid 6 4xSDR vendid=0x2c9 devid=0x5a44 caguid=0x8f10403960558 Ca 2 "H-0008f10403960558" # "MT23108 InfiniHost Mellanox Technologies" -[2] "S-005442ba00003080"[8] # lid 14 lmc 1 "ISR9024 Voltaire" lid 6 -[1] "S-005442ba00003080"[12] # lid 10 lmc 1 "ISR9024 Voltaire" lid 6 +[2] "S-005442ba00003080"[8] # lid 14 lmc 1 "ISR9024 Voltaire" lid 6 4xSDR +[1] "S-005442ba00003080"[12] # lid 10 lmc 1 "ISR9024 Voltaire" lid 6 1xSDR .fi When grouping is used, IB nodes are organized into chasses which are diff --git a/infiniband-diags/src/ibnetdiscover.c b/infiniband-diags/src/ibnetdiscover.c index 1338913..3dc2173 100644 --- a/infiniband-diags/src/ibnetdiscover.c +++ b/infiniband-diags/src/ibnetdiscover.c @@ -46,7 +46,7 @@ #include #include -#define __BUILD_VERSION_TAG__ 1.2.2 +#define __BUILD_VERSION_TAG__ 1.2.3 #include #include #include @@ -63,6 +63,26 @@ static char *node_type_str[] = { "iwarp rnic" }; +static char *linkwidth_str[] = { + "??", + "1x", + "4x", + "??", + "8x", + "??", + "??", + "??", + "12x" +}; + +static char *linkspeed_str[] = { + "???", + "SDR", + "???", + "DDR", + "QDR" +}; + static int timeout = 2000; /* ms */ static int dumplevel = 0; static int verbose; @@ -80,6 +100,24 @@ int maxhops_discovered = 0; struct ChassisList *chassis = NULL; +static char * +get_linkwidth_str(int linkwidth) +{ + if (linkwidth > 8) + return linkwidth_str[0]; + else + return linkwidth_str[linkwidth]; +} + +static char * +get_linkspeed_str(int linkspeed) +{ + if (linkspeed > 4) + return linkspeed_str[0]; + else + return linkspeed_str[linkspeed]; +} + int get_port(Port *port, int portnum, 
ib_portid_t *portid) { @@ -95,9 +133,11 @@ get_port(Port *port, int portnum, ib_por mad_decode_field(pi, IB_PORT_LMC_F, &port->lmc); mad_decode_field(pi, IB_PORT_STATE_F, &port->state); mad_decode_field(pi, IB_PORT_PHYS_STATE_F, &port->physstate); + mad_decode_field(pi, IB_PORT_LINK_WIDTH_ACTIVE_F, &port->linkwidth); + mad_decode_field(pi, IB_PORT_LINK_SPEED_ACTIVE_F, &port->linkspeed); - DEBUG("portid %s portnum %d: lid %d state %d physstate %d", - portid2str(portid), portnum, port->lid, port->state, port->physstate); + DEBUG("portid %s portnum %d: lid %d state %d physstate %d %s %s", + portid2str(portid), portnum, port->lid, port->state, port->physstate, get_linkwidth_str(port->linkwidth), get_linkspeed_str(port->linkspeed)); return 1; } /* @@ -135,6 +175,8 @@ get_node(Node *node, Port *port, ib_port mad_decode_field(pi, IB_PORT_LMC_F, &port->lmc); mad_decode_field(pi, IB_PORT_STATE_F, &port->state); mad_decode_field(pi, IB_PORT_PHYS_STATE_F, &port->physstate); + mad_decode_field(pi, IB_PORT_LINK_WIDTH_ACTIVE_F, &port->linkwidth); + mad_decode_field(pi, IB_PORT_LINK_SPEED_ACTIVE_F, &port->linkspeed); if (node->type != SWITCH_NODE) return 0; @@ -571,12 +613,14 @@ out_switch_port(Port *port, int group) rem_nodename = clean_nodedesc(port->remoteport->node->nodedesc); ext_port_str = out_ext_port(port->remoteport, group); - fprintf(f, "\t%s[%d]%s\t\t# \"%s\" lid %d\n", + fprintf(f, "\t%s[%d]%s\t\t# \"%s\" lid %d %s%s\n", node_name(port->remoteport->node), port->remoteport->portnum, ext_port_str ? ext_port_str : "", rem_nodename, - port->remoteport->node->type == SWITCH_NODE ? port->remoteport->node->smalid : port->remoteport->lid); + port->remoteport->node->type == SWITCH_NODE ? 
port->remoteport->node->smalid : port->remoteport->lid, + get_linkwidth_str(port->linkwidth), + get_linkspeed_str(port->linkspeed)); if (rem_nodename && (port->remoteport->node->type == SWITCH_NODE)) free(rem_nodename); @@ -601,9 +645,11 @@ out_ca_port(Port *port, int group) port->remoteport->node->nodedesc); else rem_nodename = clean_nodedesc(port->remoteport->node->nodedesc); - fprintf(f, "\t\t# lid %d lmc %d \"%s\" lid %d\n", + fprintf(f, "\t\t# lid %d lmc %d \"%s\" lid %d %s%s\n", port->lid, port->lmc, rem_nodename, - port->remoteport->node->type == SWITCH_NODE ? port->remoteport->node->smalid : port->remoteport->lid); + port->remoteport->node->type == SWITCH_NODE ? port->remoteport->node->smalid : port->remoteport->lid, + get_linkwidth_str(port->linkwidth), + get_linkspeed_str(port->linkspeed)); if (rem_nodename && (port->remoteport->node->type == SWITCH_NODE)) free(rem_nodename); } From yosefe at voltaire.com Mon Jun 4 10:20:56 2007 From: yosefe at voltaire.com (Yosef Etigin) Date: Mon, 04 Jun 2007 20:20:56 +0300 Subject: [ofa-general] [PATCH] rdma_cm: fix port type (fix bug 557) Message-ID: <466449F8.2030100@voltaire.com> This fixes bug 557 If next_port is signed int, and is randomized to be negative, it will fail accesses to the idr data structure and therefore cause errors in rdma_cm users. 
Signed-off-by: Yosef Etigin -- diff -urN ofa_kernel-1.2/drivers/infiniband/core/cma.c ofa_kernel-1.2.b/drivers/infiniband/core/cma.c --- ofa_kernel-1.2/drivers/infiniband/core/cma.c 2007-06-04 20:12:12.000000000 +0300 +++ ofa_kernel-1.2.b/drivers/infiniband/core/cma.c 2007-06-04 20:14:27.000000000 +0300 @@ -77,7 +77,7 @@ static DEFINE_IDR(tcp_ps); static DEFINE_IDR(udp_ps); static DEFINE_IDR(ipoib_ps); -static int next_port; +static unsigned next_port; struct cma_device { struct list_head list; From sean.hefty at intel.com Mon Jun 4 11:35:49 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 4 Jun 2007 11:35:49 -0700 Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557) In-Reply-To: <466449F8.2030100@voltaire.com> Message-ID: <001d01c7a6d7$2cb81730$ff0da8c0@amr.corp.intel.com> >If next_port is signed int, and is randomized to be negative, it will fail >accesses to the idr data structure and therefore cause errors in rdma_cm users. next_port is initialized as follows: get_random_bytes(&next_port, sizeof next_port); next_port = (next_port % (sysctl_local_port_range[1] - sysctl_local_port_range[0])) + sysctl_local_port_range[0]; Even if next_port is initialized to a negative value by get_random_bytes, I would expect next_port to be set to a positive value between local_port_range[0] and local_port_range[1] by the next statement. I'm not seeing the error my my math/logic here. - Sean From gurhan.ozen at gmail.com Mon Jun 4 11:47:24 2007 From: gurhan.ozen at gmail.com (G.O.) Date: Mon, 4 Jun 2007 14:47:24 -0400 Subject: [ofa-general] does RHEL5 Xen work with OFED? 
In-Reply-To: <5849f1820704120925n10871803gb729e7767a64fecf@mail.gmail.com> References: <5849f1820704052125ob1d309do323eae651ea9ed91@mail.gmail.com> <20070410181810.GD10218@mellanox.co.il> <5849f1820704120204q7f88f098qb69c1399668a4be9@mail.gmail.com> <20070412141417.GM24730@mellanox.co.il> <5849f1820704120925n10871803gb729e7767a64fecf@mail.gmail.com> Message-ID: <5849f1820706041147x54e8d38at5fb5e66141090202@mail.gmail.com> Hi Michael, I am getting "Device ib0 does not seem to be present, delaying initialization." warning. I tried creating a new network-bridge by using ib0 interface on Dom-0 as the net device as well but didn't work. Thanks, gurhan On 4/12/07, G. O. wrote: > On 4/12/07, Michael S. Tsirkin wrote: > > > Quoting G.O. : > > > Subject: Re: [ofa-general] does RHEL5 Xen work with OFED? > > > > > > On 4/10/07, Michael S. Tsirkin wrote: > > > >> Quoting G.O. : > > > >> Subject: Re: [ofa-general] does RHEL5 Xen work with OFED? > > > >> > > > >> On 4/5/07, Scott Weitzenkamp (sweitzen) wrote: > > > >> >Can I access OFED IPoIB and SRP/iSER devices from within a Xen virtual > > > >> >machine? > > > >> > > > > >> > > > >> I haven't tested SRP/iSER , but IPoIB works only on dom0 kernel. > > > >> You can't use any infiniband stuff on the guest OSes . > > > >> > > > >> Gurhan > > > > > > > >What doesn't work? I would expect both IPoIB and SRP > > > >behave in more or less the same way as any network/storage > > > >devices, and get virtualized by Xen. > > > > > > > > > > Nothing works. Guest kernel didn't even create > > > /sys/class/infiniband/* files. 'Far as the guest kernel is concerned, > > > HCA doesn't even seem to exist. > > > > > > Just as a FYI, I have only tried on paravirtualized guests, didn't > > > try it with fully-virtualized guests. > > > > Why would you want to see /sys/class/infiniband/? > > There things are only there for direct HW access, guests do not get that. 
> > > > You should be able to use SRP and IPoIB - you set it up in host (dom0) > > and guests use it as any other network/storage device through the > > virtualization layer. > > > > Hi Michael, > IIRC, i had got the "can't find device, initialization delayed" > errors. I'll play around with it again with the GA release when I get > a chance and will let you know. Might happen as early as next week. > > Thanks, > Gurhan > > > > -- > > MST > > > From sweitzen at cisco.com Mon Jun 4 11:53:51 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 4 Jun 2007 11:53:51 -0700 Subject: [ofa-general] does RHEL5 Xen work with OFED? In-Reply-To: <5849f1820706041147x54e8d38at5fb5e66141090202@mail.gmail.com> References: <5849f1820704052125ob1d309do323eae651ea9ed91@mail.gmail.com> <20070410181810.GD10218@mellanox.co.il> <5849f1820704120204q7f88f098qb69c1399668a4be9@mail.gmail.com> <20070412141417.GM24730@mellanox.co.il> <5849f1820704120925n10871803gb729e7767a64fecf@mail.gmail.com> <5849f1820706041147x54e8d38at5fb5e66141090202@mail.gmail.com> Message-ID: Yep, my understanding is IPoIB cannot be used with Xen network bridging at this time, the bridging can't handle IPoIB ARP addresses. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: G.O. [mailto:gurhan.ozen at gmail.com] > Sent: Monday, June 04, 2007 11:47 AM > To: Michael S. Tsirkin > Cc: Scott Weitzenkamp (sweitzen); EWG; openib > Subject: Re: [ofa-general] does RHEL5 Xen work with OFED? > > Hi Michael, > I am getting "Device ib0 does not seem to be present, delaying > initialization." warning. > I tried creating a new network-bridge by using ib0 interface on Dom-0 > as the net device as well but didn't work. > > Thanks, > gurhan > > > On 4/12/07, G. O. wrote: > > On 4/12/07, Michael S. Tsirkin wrote: > > > > Quoting G.O. : > > > > Subject: Re: [ofa-general] does RHEL5 Xen work with OFED? > > > > > > > > On 4/10/07, Michael S. 
Tsirkin wrote: > > > > >> Quoting G.O. : > > > > >> Subject: Re: [ofa-general] does RHEL5 Xen work with OFED? > > > > >> > > > > >> On 4/5/07, Scott Weitzenkamp (sweitzen) > wrote: > > > > >> >Can I access OFED IPoIB and SRP/iSER devices from > within a Xen virtual > > > > >> >machine? > > > > >> > > > > > >> > > > > >> I haven't tested SRP/iSER , but IPoIB works only > on dom0 kernel. > > > > >> You can't use any infiniband stuff on the guest OSes . > > > > >> > > > > >> Gurhan > > > > > > > > > >What doesn't work? I would expect both IPoIB and SRP > > > > >behave in more or less the same way as any network/storage > > > > >devices, and get virtualized by Xen. > > > > > > > > > > > > > Nothing works. Guest kernel didn't even create > > > > /sys/class/infiniband/* files. 'Far as the guest > kernel is concerned, > > > > HCA doesn't even seem to exist. > > > > > > > > Just as a FYI, I have only tried on paravirtualized > guests, didn't > > > > try it with fully-virtualized guests. > > > > > > Why would you want to see /sys/class/infiniband/? > > > There things are only there for direct HW access, guests > do not get that. > > > > > > You should be able to use SRP and IPoIB - you set it up > in host (dom0) > > > and guests use it as any other network/storage device through the > > > virtualization layer. > > > > > > > Hi Michael, > > IIRC, i had got the "can't find device, initialization delayed" > > errors. I'll play around with it again with the GA release > when I get > > a chance and will let you know. Might happen as early as next week. > > > > Thanks, > > Gurhan > > > > > > -- > > > MST > > > > > > From or.gerlitz at gmail.com Mon Jun 4 12:12:20 2007 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Mon, 4 Jun 2007 22:12:20 +0300 Subject: [ofa-general] Re: multiple independent IB fabrics connected to the same node Message-ID: <15ddcffd0706041212u2c2ddfc7pdfa1a390c4fdb576@mail.gmail.com> On 5/29/07, Bob Kossey wrote: > > Another related question. 
Does OFED 1.2 now support multiple > independent IB fabrics > (multiple SMs, etc) connected to multiple HCAs on the same node? Are > there any > qualifications about which dimensions are supported with this, such as > ipoib HA, SRP HA, > other types of failover, etc.? > Hi Bob, Generally speaking, as far as i am aware, by design the openib stack --does-- support such a configuration but it must be validated, I cc the maintainer here, in case they see something that they think is broken under such a config. However, note that such a config is somehow problematic (broken) for High Availability, specifically looking on IPoIB HA, say you have two nodes, n1 and n2 connected to two subnet S1 and S2 and now the n1/S1 link is broken and bonding does fail over to the IPoIB interface on S2 such that n1/S2 is the active link. At this point, for n2 to commuinicate with n1 it --must-- failover also to S2, when it would not have to do so if S1 and S2 were the same fabric. This is only a simple (non) use case to examplify the problem here. My take on that you better avoid rely on HA between subnets using tools like bonding. If you use higher level HA tools it --might-- make sense to plan for using two independent subnets. Or. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mshefty at ichips.intel.com Mon Jun 4 12:37:39 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 04 Jun 2007 12:37:39 -0700 Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557) In-Reply-To: <001d01c7a6d7$2cb81730$ff0da8c0@amr.corp.intel.com> References: <001d01c7a6d7$2cb81730$ff0da8c0@amr.corp.intel.com> Message-ID: <46646A03.2040508@ichips.intel.com> > Even if next_port is initialized to a negative value by get_random_bytes, I > would expect next_port to be set to a positive value between local_port_range[0] > and local_port_range[1] by the next statement. I'm not seeing the error my my > math/logic here. 
My my English needs help, but here's the definitions for '%' in C89 and C99 according to Wikipedia: C89 - sign of result is not defined C99 - result has same sign as dividend Could the compiler be causing the difference on this? - Sean From rdreier at cisco.com Mon Jun 4 12:49:28 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 04 Jun 2007 12:49:28 -0700 Subject: [ofa-general] Re: [PATCH] libmlx4: doorbell allocator In-Reply-To: <1180966625.10841.30.camel@mtls03> (Eli Cohen's message of "Mon, 04 Jun 2007 17:16:35 +0300") References: <1180966625.10841.30.camel@mtls03> Message-ID: applied, thanks From yosefe at voltaire.com Mon Jun 4 12:55:09 2007 From: yosefe at voltaire.com (Yosef Eitgin) Date: Mon, 4 Jun 2007 22:55:09 +0300 Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557) References: <001d01c7a6d7$2cb81730$ff0da8c0@amr.corp.intel.com> <46646A03.2040508@ichips.intel.com> Message-ID: <39C75744D164D948A170E9792AF8E7CA0A819B@exil.voltaire.com> >> Even if next_port is initialized to a negative value by get_random_bytes, I >> would expect next_port to be set to a positive value between local_port_range[0] >> and local_port_range[1] by the next statement. I'm not seeing the error my my >> math/logic here. > >My my English needs help, but here's the definitions for '%' in C89 and >C99 according to Wikipedia: > >C89 - sign of result is not defined >C99 - result has same sign as dividend > >Could the compiler be causing the difference on this? > >- Sean > Possible. I was using the OFED build environment in sles10sp1, and without the patch next_port sometimes gets a negative value. This might be the reason it was difficult to reproduce this. Anyway, in order to cover all possibilities (such as C99), I think that next_port should be unsigned. 
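The arithmetic behind bug 557 can be reproduced in user space. In C99, the result of `%` takes the sign of the dividend. A negative random seed therefore yields a negative offset, and adding the low end of the range leaves the result below that low end. Converting the seed to `unsigned int` before the `%` (Sean's later patch) or declaring `next_port` unsigned (Yossi's patch) keeps the offset non-negative. This is a minimal sketch: `pick_port_signed` and `pick_port_unsigned` are hypothetical helpers, and `low`/`high` stand in for `sysctl_local_port_range[0]` and `[1]`.

```c
#include <assert.h>

/* Hypothetical stand-ins for the cma_init() computation; low/high play
 * the role of sysctl_local_port_range[0] and [1]. */

/* Original form: with a negative seed, C99 '%' returns a negative
 * remainder (its sign follows the dividend), so the result can be < low. */
static int pick_port_signed(int seed, int low, int high)
{
    assert(high > low);
    return (seed % (high - low)) + low;
}

/* Patched form: converting the seed to unsigned first makes the
 * remainder non-negative, so the result stays in [low, high). */
static int pick_port_unsigned(int seed, int low, int high)
{
    assert(high > low);
    return (int)((unsigned int)seed % (unsigned int)(high - low)) + low;
}
```

With an assumed range of 32768 to 61000 and a seed of -5, the signed form returns 32763, which falls below the range. The unsigned form stays inside it.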
--Yossi From hanafim.ctr at asc.hpc.mil Mon Jun 4 12:53:43 2007 From: hanafim.ctr at asc.hpc.mil (MAHMOUD HANAFI) Date: Mon, 04 Jun 2007 15:53:43 -0400 Subject: [ofa-general] IB_GOLD ib_srp question Message-ID: <46646DC7.4030800@asc.hpc.mil> I am not sure if this is the best place to ask this. Does anyone know how to change "max_hw_sectors_kb" using ib_gold 1.8.3? I know you can set it using max_sect on OFED 1.2. The default for ib_gold is 128KB, which is too small. Thanks, -- Mahmoud Hanafi Senior System Administrator ASC/MSRC www.asc.hpc.mil 2435 5th Street WPAFB, OHIO 45433 (937) 255-1536 From jgunthorpe at obsidianresearch.com Mon Jun 4 13:24:12 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Mon, 4 Jun 2007 14:24:12 -0600 Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557) In-Reply-To: <46646A03.2040508@ichips.intel.com> References: <001d01c7a6d7$2cb81730$ff0da8c0@amr.corp.intel.com> <46646A03.2040508@ichips.intel.com> Message-ID: <20070604202412.GH32050@obsidianresearch.com> On Mon, Jun 04, 2007 at 12:37:39PM -0700, Sean Hefty wrote: > >Even if next_port is initialized to a negative value by get_random_bytes, I > >would expect next_port to be set to a positive value between > >local_port_range[0] > >and local_port_range[1] by the next statement. I'm not seeing the error > >my my > >math/logic here. > > My my English needs help, but here's the definitions for '%' in C89 and > C99 according to Wikipedia: The C99 '%' operator is actually a remainder operator, not a modulo operator. These two things are identical until you consider the effect of negative numbers: -1 modulo 4 = 3; -1 modulo -4 = -1; -1 remainder 4 = -1 (the C99 definition of %); -1 remainder -4 = -1. Languages that have both a remainder and a modulo operator behave as above. Other languages often call remainder "modulo", so it is all very confusing.
For C, it is best never to use signed numbers with %, since prior to C99 it was implementation-defined whether it is remainder or modulo. Also, in general, most people don't want remainder when they think of % in C. Jason From mshefty at ichips.intel.com Mon Jun 4 14:53:01 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 04 Jun 2007 14:53:01 -0700 Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557) In-Reply-To: <39C75744D164D948A170E9792AF8E7CA0A819B@exil.voltaire.com> References: <001d01c7a6d7$2cb81730$ff0da8c0@amr.corp.intel.com> <46646A03.2040508@ichips.intel.com> <39C75744D164D948A170E9792AF8E7CA0A819B@exil.voltaire.com> Message-ID: <466489BD.50608@ichips.intel.com> > Possible. I was using the OFED build environment in sles10sp1, and without the > patch next_port sometimes gets a negative value. This might be the reason it was > difficult to reproduce this. Anyway, in order to cover all possibilities (such > as C99), I think that next_port should be unsigned. The problem makes sense to me now, and it explains why it wasn't easily reproducible on other platforms. I'm not sure whether we should convert next_port to an unsigned value or just ensure that it's not negative. It's defined as an int since idr_get_new_above() expects an int. Do we need an explicit cast when calling idr_get_new_above(), or how about just casting next_port to unsigned when initializing it? - Sean From troy at scl.ameslab.gov Mon Jun 4 16:52:46 2007 From: troy at scl.ameslab.gov (Troy Benjegerdes) Date: Mon, 04 Jun 2007 18:52:46 -0500 Subject: [ofa-general] Perfquery XmtWords, not XmtBytes... Message-ID: <4664A5CE.4080505@scl.ameslab.gov> It appears that Perfquery (and the performance counter APIs we are using for fountain/goanna) are reporting data in 32-bit (4-byte) *words* and not bytes. Can someone please clear up my confusion on this, and maybe correct the documentation as well?
From halr at voltaire.com Mon Jun 4 17:17:58 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 04 Jun 2007 20:17:58 -0400 Subject: [ofa-general] Perfquery XmtWords, not XmtBytes... In-Reply-To: <4664A5CE.4080505@scl.ameslab.gov> References: <4664A5CE.4080505@scl.ameslab.gov> Message-ID: <1181002677.12997.17099.camel@hal.voltaire.com> On Mon, 2007-06-04 at 19:52, Troy Benjegerdes wrote: > It appears that Perfquery (and the performance counter api's we are > using for fountain/goanna) are reporting data in 32 bit (4-byte) *words* > and not bytes. > > Can someone please clear up my confusion on this, and maybe correct the > documentation as well? It's consistent with what the IB spec says (IBA 1.2 vol 1 p.948) as to how these quantities are counted. They are defined to be octets divided by 4 so the choice is to display them the same as the actual quantity (which is why they are named Data rather than Octets) or to multiply by 4 for Octets. The former choice was made. -- Hal > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From sean.hefty at intel.com Mon Jun 4 17:19:00 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 4 Jun 2007 17:19:00 -0700 Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557) In-Reply-To: <466489BD.50608@ichips.intel.com> Message-ID: <008201c7a707$1d7a9150$ff0da8c0@amr.corp.intel.com> Can you see if this patch also fixes the problem? I'd like to keep next_port defined as an int to match the idr_get_new_above() prototype and sysctl_local_port_range definition. If this fixes the problem, we should add it to OFED and queue it for 2.6.23. --- next_port should be between sysctl_local_port_range[0] and [1]. However, it is initially set to a random value. 
If the value is negative, next_port can fall outside of this range because of the % operator returning a negative value. Signed-off-by: Sean Hefty --- drivers/infiniband/core/cma.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index eb15119..b0831cb 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -2772,8 +2772,8 @@ static int cma_init(void) int ret; get_random_bytes(&next_port, sizeof next_port); - next_port = (next_port % (sysctl_local_port_range[1] - - sysctl_local_port_range[0])) + + next_port = ((unsigned int) next_port % + (sysctl_local_port_range[1] - sysctl_local_port_range[0])) + sysctl_local_port_range[0]; cma_wq = create_singlethread_workqueue("rdma_cm"); if (!cma_wq) From troy at scl.ameslab.gov Mon Jun 4 17:41:42 2007 From: troy at scl.ameslab.gov (Troy Benjegerdes) Date: Mon, 04 Jun 2007 19:41:42 -0500 Subject: [ofa-general] Perfquery XmtWords, not XmtBytes... In-Reply-To: <1181002677.12997.17099.camel@hal.voltaire.com> References: <4664A5CE.4080505@scl.ameslab.gov> <1181002677.12997.17099.camel@hal.voltaire.com> Message-ID: <4664B146.9090205@scl.ameslab.gov> Okay. I see the latest version of perfquery uses 'XmtData' instead of XmtBytes. Thanks. Hal Rosenstock wrote: > On Mon, 2007-06-04 at 19:52, Troy Benjegerdes wrote: > >> It appears that Perfquery (and the performance counter api's we are >> using for fountain/goanna) are reporting data in 32 bit (4-byte) *words* >> and not bytes. >> >> Can someone please clear up my confusion on this, and maybe correct the >> documentation as well? >> > > It's consistent with what the IB spec says (IBA 1.2 vol 1 p.948) as to > how these quantities are counted. They are defined to be octets divided > by 4 so the choice is to display them the same as the actual quantity > (which is why they are named Data rather than Octets) or to multiply by > 4 for Octets. The former choice was made. 
> > -- Hal From vlad at lists.openfabrics.org Tue Jun 5 02:40:25 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Tue, 5 Jun 2007 02:40:25 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070605-0200 daily build status Message-ID: <20070605094025.5E494E60834@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.13 Passed on x86_64 with linux-2.6.19 Passed on ia64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.12 Passed on powerpc with linux-2.6.19 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.13 Passed on x86_64 with linux-2.6.16 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.14 Passed on x86_64 with linux-2.6.18 Passed on powerpc with linux-2.6.18 Passed on ppc64 with linux-2.6.18 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.13 Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.17 Passed on ppc64 with linux-2.6.14 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on x86_64
with linux-2.6.15 Passed on powerpc with linux-2.6.12 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.12 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.15 Passed on ia64 with linux-2.6.17 Passed on ppc64 with linux-2.6.13 Passed on powerpc with linux-2.6.16 Passed on powerpc with linux-2.6.14 Passed on powerpc with linux-2.6.15 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on ia64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.9-34.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Failed: From tziporet at dev.mellanox.co.il Tue Jun 5 03:35:28 2007 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Tue, 05 Jun 2007 13:35:28 +0300 Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557) In-Reply-To: <008201c7a707$1d7a9150$ff0da8c0@amr.corp.intel.com> References: <008201c7a707$1d7a9150$ff0da8c0@amr.corp.intel.com> Message-ID: <46653C70.7070403@mellanox.co.il> Sean Hefty wrote: > Can you see if this patch also fixes the problem? I'd like to keep > next_port defined as an int to match the idr_get_new_above() prototype > and sysctl_local_port_range definition. > > If this fixes the problem, we should add it to OFED and queue it for > 2.6.23. 
> --- > > > Sean/Yossi Can you prepare us a patch for OFED 1.2 Thanks, Tziporet From vlad at mellanox.co.il Tue Jun 5 05:14:47 2007 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 05 Jun 2007 15:14:47 +0300 Subject: [ofa-general] rdma_cm kernel Oops Message-ID: <1181045687.1114.16.camel@vladsk-laptop> Hi Sean, I got the following kernel oops while testing RDS HA (kernel 2.6.20): rdma_destroy_id+0x124/0x193 corresponds to the line 778 in drivers/infiniband/core/cma.c 771 static void cma_release_port(struct rdma_id_private *id_priv) 772 { 773 struct rdma_bind_list *bind_list = id_priv->bind_list; 774 775 if (!bind_list) 776 return; 777 778 mutex_lock(&lock); 779 hlist_del(&id_priv->node); 780 if (hlist_empty(&bind_list->owners)) { 781 idr_remove(bind_list->ps, bind_list->port); 782 kfree(bind_list); 783 } 784 mutex_unlock(&lock); 785 } Oops: Jun[ 645.944058] Pid: 7354, comm: rdma_cm_wq Not tainted 2.6.20 #2 5 09:11:48 sw1[ 645.944061] RIP: 0010:[] [] :rdma_cm:rdma_destroy_id+0x124/0x193 23 kernel: [ 64[ 645.944072] RSP: 0018:ffff81011f223e30 EFLAGS: 00010206 5.816913] rds_sh[ 645.944076] RAX: 0000000000100100 RBX: ffff81011d86d340 RCX: ffff8101224d0350 utdown_worker: w[ 645.944080] RDX: 0000000000200200 RSI: 0000000000000056 RDI: ffffffff881a2140 as_conn 0 was_co[ 645.944084] RBP: ffff8101224d0270 R08: 0000000000000000 R09: 0000000000000000 nning -1 [ 645.944087] R10: ffff81011f223d50 R11: 0000000000000048 R12: 0000000000000001 [ 645.944091] R13: 0000000000000287 R14: ffffffff8819b445 R15: 0000000000000000 [ 645.944095] FS: 0000000000000000(0000) GS:ffffffff8058e000(0000) knlGS:0000000000000000 [ 645.944099] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b [ 645.944103] CR2: 0000000000200200 CR3: 000000011e21d000 CR4: 00000000000006e0 [ 645.944107] Process rdma_cm_wq (pid: 7354, threadinfo ffff81011f222000, task ffff810117df8830) [ 645.944110] Stack: ffff8101224d0270 ffff8101224d0270 ffff81011a850a20 ffffffff8819b4a7 [ 645.944119] ffff81011a850a28 
ffff81011d89ea48 ffff81011a850a20 ffffffff80239c4e [ 645.944126] ffff81011d89ea48 ffffffff80239ced ffff8101201c7d98 00000000fffffffc [ 645.944132] Call Trace: [ 645.944143] [] :rdma_cm:cma_work_handler+0x62/0x6e [ 645.944153] [] run_workqueue+0xa5/0x144 [ 645.944159] [] worker_thread+0x0/0x165 [ 645.944164] [] keventd_create_kthread+0x0/0x6a [ 645.944169] [] worker_thread+0x12f/0x165 [ 645.944177] [] default_wake_function+0x0/0xe [ 645.944184] [] default_wake_function+0x0/0xe [ 645.944190] [] kthread+0xc8/0xf1 [ 645.944198] [] child_rip+0xa/0x12 [ 645.944203] [] keventd_create_kthread+0x0/0x6a [ 645.944213] [] kthread+0x0/0xf1 [ 645.944217] [] child_rip+0x0/0x12 [ 645.944221] [ 645.944223] [ 645.944224] Code: 48 89 02 74 04 48 89 50 08 48 c7 85 e0 00 00 00 00 01 10 00 [ 645.944236] RIP [] :rdma_cm:rdma_destroy_id+0x124/0x193 [ 645.944246] RSP [ 645.944249] CR2: 0000000000200200 [ 645.944251] <4>created cm id ffff8101224d0270 for conn ffff81011c857d48 -- Vladimir Sokolovsky Mellanox Technologies Ltd. From rdreier at cisco.com Tue Jun 5 06:34:41 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 05 Jun 2007 06:34:41 -0700 Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557) In-Reply-To: <008201c7a707$1d7a9150$ff0da8c0@amr.corp.intel.com> (Sean Hefty's message of "Mon, 4 Jun 2007 17:19:00 -0700") References: <008201c7a707$1d7a9150$ff0da8c0@amr.corp.intel.com> Message-ID: > If this fixes the problem, we should add it to OFED and queue it for > 2.6.23. I haven't followed this closely, but what's the impact of this bug? It seems it would result in a port out of the configured range being used. Which seems serious enough to fix for 2.6.22 to me. From sashak at voltaire.com Tue Jun 5 08:28:03 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 5 Jun 2007 18:28:03 +0300 Subject: [ofa-general] [PATCH] opensm: protect sminfo response Message-ID: <20070605152803.GA10519@sashak.voltaire.com> This port_guid check protects SMInfo responses processing against port moving issue.
Signed-off-by: Sasha Khapyorsky --- opensm/opensm/osm_port_info_rcv.c | 1 + opensm/opensm/osm_sminfo_rcv.c | 19 +++++++++++++++++++ 2 files changed, 20 insertions(+), 0 deletions(-) diff --git a/opensm/opensm/osm_port_info_rcv.c b/opensm/opensm/osm_port_info_rcv.c index 849427e..1fd4915 100644 --- a/opensm/opensm/osm_port_info_rcv.c +++ b/opensm/opensm/osm_port_info_rcv.c @@ -199,6 +199,7 @@ __osm_pi_rcv_process_endport( */ memset( &context, 0, sizeof(context) ); context.smi_context.set_method = FALSE; + context.smi_context.port_guid = port_guid; status = osm_req_get( p_rcv->p_req, osm_physp_get_dr_path_ptr( p_physp ), IB_MAD_ATTR_SM_INFO, diff --git a/opensm/opensm/osm_sminfo_rcv.c b/opensm/opensm/osm_sminfo_rcv.c index b26b6bf..18fd072 100644 --- a/opensm/opensm/osm_sminfo_rcv.c +++ b/opensm/opensm/osm_sminfo_rcv.c @@ -749,8 +749,26 @@ osm_sminfo_rcv_process( */ if( ib_smp_is_response( p_smp ) ) { + const ib_sm_info_t *p_smi = ib_smp_get_payload_ptr( p_smp ); + /* Get the context - to see if this is a response to a Get or Set method */ p_smi_context = osm_madw_get_smi_context_ptr( p_madw ); + + /* + verify that response is from expected port and there is no port + moving issue */ + if ( p_smi_context->port_guid != p_smi->guid ) + { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "osm_sminfo_rcv_process: ERR 2F19: " + "unexpected SM port GUID in response" + "\n\t\t\t\tExpected 0x%016" PRIx64 + ", Received 0x%016" PRIx64 "\n", + cl_ntoh64( p_smi_context->port_guid ), + cl_ntoh64( p_smi->guid ) ); + goto Exit; + } + if ( p_smi_context->set_method == FALSE ) { /* this is a response to a Get method */ @@ -777,5 +795,6 @@ osm_sminfo_rcv_process( } } + Exit: OSM_LOG_EXIT( p_rcv->p_log ); } -- 1.5.2.1.137.g426c From vuhuong at mellanox.com Tue Jun 5 08:56:31 2007 From: vuhuong at mellanox.com (Vu Pham) Date: Tue, 05 Jun 2007 08:56:31 -0700 Subject: [ofa-general] IB_GOLD ib_srp question In-Reply-To: <46646DC7.4030800@asc.hpc.mil> References: <46646DC7.4030800@asc.hpc.mil> 
Message-ID: <466587AF.50906@mellanox.com> MAHMOUD, For ib_gold 1.8.3, the parameter is max_xfer_sectors_per_io. You can change it when loading the srp module -vu > I am not sure if this is the best place to ask this or not.... > > Does any one know how to change "max_hw_sectors_kb" using ib_gold 1.8.3. > I know you can set it using max_sect on OFED1.2. > > The default for Ib_gold is 128KB which is to small. > > Thanks, From sean.hefty at intel.com Tue Jun 5 09:43:18 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 5 Jun 2007 09:43:18 -0700 Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557) In-Reply-To: Message-ID: <000001c7a790$9f44e490$3c98070a@amr.corp.intel.com> >I haven't followed this closely, but what's the impact of this bug? >It seems it would result in a port out of the configured range being >used. Which seems serious enough to fix for 2.6.22 to me. It can result in a port outside of the configured range, and its occurrence depends on the compiler used. I've pushed my patch to: git://git.openfabrics.org/~shefty/rdma-dev.git for-roland which is based on 2.6.22-rc4. Yosef, can you confirm that this patch works for you? - Sean From halr at voltaire.com Tue Jun 5 10:30:34 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Jun 2007 13:30:34 -0400 Subject: [ofa-general] Re: [PATCH] opensm: protect sminfo response In-Reply-To: <20070605152803.GA10519@sashak.voltaire.com> References: <20070605152803.GA10519@sashak.voltaire.com> Message-ID: <1181064634.12997.83723.camel@hal.voltaire.com> On Tue, 2007-06-05 at 11:28, Sasha Khapyorsky wrote: > This port_guid check protects SMInfo responses processing against port > moving issue. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied. 
-- Hal From yosefe at voltaire.com Tue Jun 5 10:31:54 2007 From: yosefe at voltaire.com (Yosef Eitgin) Date: Tue, 5 Jun 2007 20:31:54 +0300 Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557) References: <000001c7a790$9f44e490$3c98070a@amr.corp.intel.com> Message-ID: <39C75744D164D948A170E9792AF8E7CA0A819D@exil.voltaire.com> >>I haven't followed this closely, but what's the impact of this bug? >>It seems it would result in a port out of the configured range being >>used. Which seems serious enough to fix for 2.6.22 to me. > >It can result in a port outside of the configured range, and its occurrence >depends on the compiler used. I've pushed my patch to: > > git://git.openfabrics.org/~shefty/rdma-dev.git for-roland > >which is based on 2.6.22-rc4. Yosef, can you confirm that this patch works for >you? > >- Sean I'm out of office right now, but from a quick external test it looks like this does the job. --Yossi From hanafim.ctr at asc.hpc.mil Tue Jun 5 10:48:42 2007 From: hanafim.ctr at asc.hpc.mil (MAHMOUD HANAFI) Date: Tue, 05 Jun 2007 13:48:42 -0400 Subject: [ofa-general] OFED vs IB_GOLD IB_SRP Performance results Message-ID: <4665A1FA.1000506@asc.hpc.mil> All, I have been evaluating SRP performance using DDN IB-attached storage. I have looked at both OFED and IBGOLD and found some interesting results. Although write performance is consistent between OFED and IBGOLD, read performance is not. I have tried various tuning settings but have been unable to improve OFED read performance. I would appreciate feedback regarding these results. As the read chart below shows, IBGD outperforms OFED at the larger record lengths. 
READ CHART: http://www.clusteringsolutions.com/openib/Read.png WRITE CHART: http://www.clusteringsolutions.com/openib/Write.png Test Setup: Test software: xdd using direct IO Server: Dell 2950 4-core 8GB memory Storage: DDN S2A 9500 LUN: 4 - 1 per tier (8+1) Blocksize = 4096 IB: SDR Cisco Fibre Channel = 4 Gb/sec Qlogic using qla2400 driver kernel tested: 2.6.9-42.0.10.ELsmp and 2.6.9-42.0.10.EL_lustre.1.4.10smp IB Stack: OFED1.1, OFED1.2, and IB_GOLD1.8.3 IBGOLD Setup: /sys/module/ib_srp/dlid_conf = 0 /sys/module/ib_srp/fmr_cache = 0 /sys/module/ib_srp/ib_ports_mask = -1 /sys/module/ib_srp/max_cmds_per_lun = 1 /sys/module/ib_srp/max_luns = 256 /sys/module/ib_srp/max_srp_targets = 16 /sys/module/ib_srp/max_xfer_sectors_per_io = 8192 /sys/module/ib_srp/refcnt = 16 /sys/module/ib_srp/service_str = /sys/module/ib_srp/srp_discovery_timeout = 60 /sys/module/ib_srp/srp_tracelevel = 2 /sys/module/ib_srp/target_bindings = /sys/block/sdc/queue/max_hw_sectors_kb = 4096 /sys/block/sdc/queue/max_sectors_kb = 4096 /sys/block/sdc/queue/nr_requests = 8192 /sys/block/sdc/queue/read_ahead_kb = 128 OFED Setup: /sys/module/ib_srp/mellanox_workarounds = 1 /sys/module/ib_srp/refcnt = 11 /sys/module/ib_srp/srp_sg_tablesize = 256 /sys/module/ib_srp/topspin_workarounds = 1 /sys/block/sdd/queue/max_sectors_kb = 4096 /sys/block/sdd/queue/nr_requests = 8192 /sys/block/sdd/queue/read_ahead_kb = 128 Thanks, -- Mahmoud Hanafi Senior System Administrator ASC/MSRC www.asc.hpc.mil 2435 5th Street WPAFB, OHIO 45433 (937) 255-1536 From halr at voltaire.com Tue Jun 5 11:46:35 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Jun 2007 14:46:35 -0400 Subject: [ofa-general] [PATCH 2/2] infiniband-diags/ibidsverify.pl: Support port GUID validation Message-ID: <1181069190.12997.88623.camel@hal.voltaire.com> infiniband-diags/ibidsverify.pl: Support port GUID validation Note that the original topology file format without port GUIDs is also supported, in which case this validation is omitted.
Signed-off-by: Hal Rosenstock diff --git a/infiniband-diags/scripts/ibidsverify.pl b/infiniband-diags/scripts/ibidsverify.pl index c9730c1..5d97eab 100755 --- a/infiniband-diags/scripts/ibidsverify.pl +++ b/infiniband-diags/scripts/ibidsverify.pl @@ -83,6 +83,7 @@ sub validate_non_zero_guid $insert_lid::lids = undef; $insert_nodeguid::nodeguids = undef; +$insert_portguid::portguids = undef; sub insert_lid { @@ -130,6 +131,29 @@ sub insert_nodeguid } } +sub insert_portguid +{ + my ($lid) = shift (@_); + my ($portguid) = shift (@_); + my ($nodetype) = shift (@_); + my $rec = undef; + my $status = ""; + + $status = validate_non_zero_guid($lid, $portguid, $nodetype); + if ($status eq 0) + { + if (defined($insert_portguid::portguids{$portguid})) + { + print "PortGUID $portguid already defined for LID $insert_portguid::portguids{$portguid}->{lid}\n"; + } + else + { + $rec = { lid => $lid, portguid => $portguid }; + $insert_portguid::portguids{$portguid} = $rec; + } + } +} + sub main { if ($regenerate_map || !(-f "$IBswcountlimits::cache_dir/ibnetdiscover.topology")) { generate_ibnetdiscover_topology; } @@ -146,19 +170,34 @@ sub main while ($line = ) { - if ($line =~ /^switchguid=(.*)/ || $line =~ /^caguid=(.*)/ || $line =~ /^rtguid=(.*)/) + if ($line =~ /^caguid=(.*)/ || $line =~ /^rtguid=(.*)/) { $nodeguid = $1; $nodetype = ""; } + if ($line =~ /^switchguid=(.*)/) + { + $nodeguid = $1; + $portguid = ""; + $nodetype = ""; + } + if ($nodeguid =~ /^switchguid=(.*)\((.*)\)/) + { + $nodeguid = $1; + $portguid = $2; + } + if ($line =~ /^Switch.*\"S-(.*)\"\s+# (.*) port.* lid (\d+) .*/) { $nodetype = "switch"; - $portguid = $1; $lid = $3; insert_lid($lid, $nodeguid, $nodetype); insert_nodeguid($lid, $nodeguid, $nodetype); + if ($portguid ne "") + { + insert_portguid($lid, $portguid, $nodetype); + } } if ($line =~ /^Ca.*/) { @@ -203,6 +242,11 @@ sub main $firstport = "no"; } } + if ($line =~ /^\[(\d+)\]\((.*)\)/) + { + $portguid = $2; + insert_portguid($lid, $portguid, 
$nodetype); + } } } From halr at voltaire.com Tue Jun 5 11:46:17 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Jun 2007 14:46:17 -0400 Subject: [ofa-general] [PATCH 1/2] infiniband-diags/ibnetdiscover: Add port GUIDs to topology file Message-ID: <1181069175.12997.88621.camel@hal.voltaire.com> infiniband-diags/ibnetdiscover: Add port GUIDs to topology file Signed-off-by: Hal Rosenstock diff --git a/infiniband-diags/man/ibnetdiscover.8 b/infiniband-diags/man/ibnetdiscover.8 index 84f7a20..48291d5 100644 --- a/infiniband-diags/man/ibnetdiscover.8 +++ b/infiniband-diags/man/ibnetdiscover.8 @@ -101,9 +101,9 @@ attempted to be fulfilled, and will fail The topology file format is human readable and largely intuitive. Most identifiers are given textual names like vendor ID (vendid), device ID (device ID), GUIDs of various types (sysimgguid, caguid, switchguid, etc.). -The IB node is identified followed by the number of ports and a quoted string -which contains the nodetype (S, H, R) followed by a - then followed by the -node GUID. On the right of this line is a comment (#) followed by the +PortGUIDs are shown in parentheses (). For switches, this is shown on the +switchguid line. For CA and router ports, it is shown on the connectivity lines. The IB node is identified followed by the number of ports and a quoted +the node GUID. On the right of this line is a comment (#) followed by the NodeDescription in quotes. If the node is a switch, this line also contains whether switch port 0 is base or enhanced, and the LID and LMC of port 0. Subsequent lines pertaining to this node show the connectivity. On the @@ -121,7 +121,7 @@ output line. 
An example of this is: .nf # -# Topology file: generated on Fri Jun 1 11:16:02 2007 +# Topology file: generated on Tue Jun 5 14:15:10 2007 # # Max of 3 hops discovered # Initiated from node 0008f10403960558 port 0008f10403960559 @@ -131,20 +131,20 @@ Non-Chassis Nodes vendid=0x8f1 devid=0x5a06 sysimgguid=0x5442ba00003000 -switchguid=0x5442ba00003080 +switchguid=0x5442ba00003080(5442ba00003080) Switch 24 "S-005442ba00003080" # "ISR9024 Voltaire" base port 0 lid 6 lmc 0 -[22] "H-0008f10403961354"[1] # "MT23108 InfiniHost Mellanox Technologies" lid 4 4xSDR +[22] "H-0008f10403961354"[1](8f10403961355) # "MT23108 InfiniHost Mellanox Technologies" lid 4 4xSDR [10] "S-0008f10400410015"[1] # "SW-6IB4 Voltaire" lid 3 4xSDR -[8] "H-0008f10403960558"[2] # "MT23108 InfiniHost Mellanox Technologies" lid 14 4xSDR +[8] "H-0008f10403960558"[2](8f1040396055a) # "MT23108 InfiniHost Mellanox Technologies" lid 14 4xSDR [6] "S-0008f10400410015"[3] # "SW-6IB4 Voltaire" lid 3 4xSDR -[12] "H-0008f10403960558"[1] # "MT23108 InfiniHost Mellanox Technologies" lid 10 4xSDR +[12] "H-0008f10403960558"[1](8f10403960559) # "MT23108 InfiniHost Mellanox Technologies" lid 10 4xSDR vendid=0x8f1 devid=0x5a05 -switchguid=0x8f10400410015 +switchguid=0x8f10400410015(8f10400410015) Switch 8 "S-0008f10400410015" # "SW-6IB4 Voltaire" base port 0 lid 3 lmc 0 -[6] "H-0008f10403960984"[1] # "MT23108 InfiniHost Mellanox Technologies" lid 16 4xSDR -[4] "H-005442b100004900"[1] # "MT23108 InfiniHost Mellanox Technologies" lid 12 4xSDR +[6] "H-0008f10403960984"[1](8f10403960985) # "MT23108 InfiniHost Mellanox Technologies" lid 16 4xSDR +[4] "H-005442b100004900"[1](5442b100004901) # "MT23108 InfiniHost Mellanox Technologies" lid 12 4xSDR [1] "S-005442ba00003080"[10] # "ISR9024 Voltaire" lid 6 1xSDR [3] "S-005442ba00003080"[6] # "ISR9024 Voltaire" lid 6 4xSDR @@ -152,26 +152,26 @@ vendid=0x2c9 devid=0x5a44 caguid=0x8f10403960984 Ca 2 "H-0008f10403960984" # "MT23108 InfiniHost Mellanox Technologies" -[1] 
"S-0008f10400410015"[6] # lid 16 lmc 1 "SW-6IB4 Voltaire" lid 3 4xSDR +[1](8f10403960985) "S-0008f10400410015"[6] # lid 16 lmc 1 "SW-6IB4 Voltaire" lid 3 4xSDR vendid=0x2c9 devid=0x5a44 caguid=0x5442b100004900 Ca 2 "H-005442b100004900" # "MT23108 InfiniHost Mellanox Technologies" -[1] "S-0008f10400410015"[4] # lid 12 lmc 1 "SW-6IB4 Voltaire" lid 3 4xSDR +[1](5442b100004901) "S-0008f10400410015"[4] # lid 12 lmc 1 "SW-6IB4 Voltaire" lid 3 4xSDR vendid=0x2c9 devid=0x5a44 caguid=0x8f10403961354 Ca 2 "H-0008f10403961354" # "MT23108 InfiniHost Mellanox Technologies" -[1] "S-005442ba00003080"[22] # lid 4 lmc 1 "ISR9024 Voltaire" lid 6 4xSDR +[1](8f10403961355) "S-005442ba00003080"[22] # lid 4 lmc 1 "ISR9024 Voltaire" lid 6 4xSDR vendid=0x2c9 devid=0x5a44 caguid=0x8f10403960558 Ca 2 "H-0008f10403960558" # "MT23108 InfiniHost Mellanox Technologies" -[2] "S-005442ba00003080"[8] # lid 14 lmc 1 "ISR9024 Voltaire" lid 6 4xSDR -[1] "S-005442ba00003080"[12] # lid 10 lmc 1 "ISR9024 Voltaire" lid 6 1xSDR +[2](8f1040396055a) "S-005442ba00003080"[8] # lid 14 lmc 1 "ISR9024 Voltaire" lid 6 4xSDR +[1](8f10403960559) "S-005442ba00003080"[12] # lid 10 lmc 1 "ISR9024 Voltaire" lid 6 1xSDR .fi When grouping is used, IB nodes are organized into chasses which are diff --git a/infiniband-diags/src/ibnetdiscover.c b/infiniband-diags/src/ibnetdiscover.c index c08aa61..c321d59 100644 --- a/infiniband-diags/src/ibnetdiscover.c +++ b/infiniband-diags/src/ibnetdiscover.c @@ -46,7 +46,7 @@ #include #include -#define __BUILD_VERSION_TAG__ 1.2.3 +#define __BUILD_VERSION_TAG__ 1.2.4 #include #include #include @@ -518,6 +518,7 @@ out_switch(Node *node, int group) out_ids(node); fprintf(f, "switchguid=0x%" PRIx64, node->nodeguid); + fprintf(f, "(%" PRIx64 ")", node->portguid); if (group) { if (node->chrecord) { if (node->chrecord->chassisnum) { @@ -617,6 +618,8 @@ out_switch_port(Port *port, int group) node_name(port->remoteport->node), port->remoteport->portnum, ext_port_str ? 
ext_port_str : ""); + if (port->remoteport->node->type != SWITCH_NODE) + fprintf(f, "(%" PRIx64 ") ", port->remoteport->portguid); fprintf(f, "\t\t# \"%s\" lid %d %s%s\n", rem_nodename, port->remoteport->node->type == SWITCH_NODE ? port->remoteport->node->smalid : port->remoteport->lid, @@ -634,12 +637,16 @@ out_ca_port(Port *port, int group) char *rem_nodename = NULL; fprintf(f, "[%d]", port->portnum); + if (port->node->type != SWITCH_NODE) + fprintf(f, "(%" PRIx64 ") ", port->portguid); fprintf(f, "\t%s[%d]", node_name(port->remoteport->node), port->remoteport->portnum); str = out_ext_port(port->remoteport, group); if (str) fprintf(f, "%s", str); + if (port->remoteport->node->type != SWITCH_NODE) + fprintf(f, " (%" PRIx64 ") ", port->remoteport->portguid); if (port->remoteport->node->type == SWITCH_NODE) rem_nodename = lookup_switch_name(switch_map_fp, From mshefty at ichips.intel.com Tue Jun 5 12:39:39 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 05 Jun 2007 12:39:39 -0700 Subject: [ofa-general] Re: [Query] ib add path record cache In-Reply-To: <1180633333.7116.172147.camel@hal.voltaire.com> References: <000101c7a3a9$876d8290$ff0da8c0@amr.corp.intel.com> <1180633333.7116.172147.camel@hal.voltaire.com> Message-ID: <4665BBFB.4070007@ichips.intel.com> > You'd need to use a vendor class 2 if you wanted to use RMPP as the SA > does. However, there is some rearranging you would need to do if you > compare the relevant MAD formats. Reading into the spec more, it seems our current choice is limited to using a vendor class. Application classes are controlled by the IBTA. Of the two vendor classes, class 2 clearly defines that RMPP is used, but also adds the OUI field to the MAD. This throws off using the SA MAD class format. I see a few possibilities: Use vendor class 1: There's no restriction on the MAD data. This would allow us to match the SA MAD class format exactly. 
The drawback is that we need to modify the MAD layer to identify the class as using RMPP. Use vendor class 2: Reading the spec, it looks like a reserved field in the MAD header is reserved even if using a vendor defined class. If this is the proper interpretation, then we either need to shift the SA data down 4-8 bytes, or we drop the first 4 bytes of the SM_Key. If we ever want to do more than simple path record caching, I think we'll want the full SM_Key. Between the remaining choices, my preference would be to adapt a class 1 for our purpose. Anyone else have thoughts on this? - Sean From mshefty at ichips.intel.com Tue Jun 5 14:23:13 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 05 Jun 2007 14:23:13 -0700 Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557) In-Reply-To: <008201c7a707$1d7a9150$ff0da8c0@amr.corp.intel.com> References: <008201c7a707$1d7a9150$ff0da8c0@amr.corp.intel.com> Message-ID: <4665D441.3060404@ichips.intel.com> Vlad, can you please pull this change into OFED? > next_port should be between sysctl_local_port_range[0] and [1]. However, > it is initially set to a random value. If the value is negative, next_port > can fall outside of this range because of the % operator returning a > negative value. 
> > Signed-off-by: Sean Hefty > --- > > drivers/infiniband/core/cma.c | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c > index eb15119..b0831cb 100644 > --- a/drivers/infiniband/core/cma.c > +++ b/drivers/infiniband/core/cma.c > @@ -2772,8 +2772,8 @@ static int cma_init(void) > int ret; > > get_random_bytes(&next_port, sizeof next_port); > - next_port = (next_port % (sysctl_local_port_range[1] - > - sysctl_local_port_range[0])) + > + next_port = ((unsigned int) next_port % > + (sysctl_local_port_range[1] - sysctl_local_port_range[0])) + > sysctl_local_port_range[0]; > cma_wq = create_singlethread_workqueue("rdma_cm"); > if (!cma_wq) > > From vuhuong at mellanox.com Tue Jun 5 14:36:42 2007 From: vuhuong at mellanox.com (Vu Pham) Date: Tue, 05 Jun 2007 14:36:42 -0700 Subject: [ofa-general] OFED vs IB_GOLD IB_SRP Performance results In-Reply-To: <4665A1FA.1000506@asc.hpc.mil> References: <4665A1FA.1000506@asc.hpc.mil> Message-ID: <4665D76A.8080705@mellanox.com> Hi MAHMOUD, > All, > > I have been evaluating srp performance using DDN IB-attached storage. > I have looked at both OFED and IBGOLD. I have discovered some > interesting results. Although the write performance is consistent > between OFED and IBGOLD, the read performance is not. I looked at > various tuning settings but have been unable to improve the read performance > of OFED. I am interested in getting feedback on these results. > > As you can see in this chart, IBGD outperforms OFED at the > larger record lengths.
> READ CHART: http://www.clusteringsolutions.com/openib/Read.png > > WRITE CHART: http://www.clusteringsolutions.com/openib/Write.png > > > Test Setup: > Test software: xdd using direct IO > Server: Dell 2950 4 core 8GB memory > Storage: DDN S2A 9500 > LUN: 4 - 1 per tier (8+1) Blocksize = 4096 > IB: SDR cisco > Fiber Channel = 4 Gb/sec Qlogic using qla2400 driver > kernel tested: 2.6.9-42.0.10.ELsmp and 2.6.9-42.0.10.EL_lustre.1.4.10smp > IB Stack: OFED1.1, OFED1.2, and IB_GOLD1.8.3 > > IBGOLD Setup: > /sys/module/ib_srp/dlid_conf = 0 > /sys/module/ib_srp/fmr_cache = 0 > /sys/module/ib_srp/ib_ports_mask = -1 > /sys/module/ib_srp/max_cmds_per_lun = 1 > /sys/module/ib_srp/max_luns = 256 > /sys/module/ib_srp/max_srp_targets = 16 > /sys/module/ib_srp/max_xfer_sectors_per_io = 8192 > /sys/module/ib_srp/refcnt = 16 > /sys/module/ib_srp/service_str = > /sys/module/ib_srp/srp_discovery_timeout = 60 > /sys/module/ib_srp/srp_tracelevel = 2 > /sys/module/ib_srp/target_bindings = > > /sys/block/sdc/queue/max_hw_sectors_kb = 4096 > /sys/block/sdc/queue/max_sectors_kb = 4096 > /sys/block/sdc/queue/nr_requests = 8192 > /sys/block/sdc/queue/read_ahead_kb = 128 > > OFED Setup: > /sys/module/ib_srp/mellanox_workarounds = 1 > /sys/module/ib_srp/refcnt = 11 > /sys/module/ib_srp/srp_sg_tablesize = 256 > /sys/module/ib_srp/topspin_workarounds = 1 > > /sys/block/sdd/queue/max_sectors_kb = 4096 For OFED drivers: what is the max_cmd_per_lun? (The default is SRP_SQ_SIZE = 63.) You can set max_cmd_per_lun when adding the target - please try 1, 2, 4, 8...
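After the target is added, you can confirm which per-LUN queue depth the SCSI midlayer actually picked up by reading the host's `cmd_per_lun` attribute in sysfs. The sketch below is a minimal way to do that from a program. It assumes the attribute holds a single decimal integer. The helper name `read_sysfs_long` is hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one decimal integer from a sysfs-style
 * attribute file such as /sys/class/scsi_host/host<N>/cmd_per_lun.
 * Returns 0 and stores the value in *out on success, -1 on error. */
static int read_sysfs_long(const char *path, long *out)
{
	char buf[64];
	char *end;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof buf, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	errno = 0;
	long val = strtol(buf, &end, 10);
	/* Reject empty input, overflow, and anything after the number
	 * other than the trailing newline sysfs usually emits. */
	if (end == buf || errno == ERANGE || (*end != '\0' && *end != '\n'))
		return -1;
	*out = val;
	return 0;
}
```

For example, `read_sysfs_long("/sys/class/scsi_host/host2/cmd_per_lun", &depth)` returns the current setting. The host number here is only a placeholder.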
You can check *cat /sys/class/scsi_host/hostXXX/cmd_per_lun* Another tuning option requires editing and recompiling the srp driver: + vi ib_srp.h and change SRP_RQ_SHIFT to 7 --> this will increase .can_queue and send_wq/recv_wq to 128 --> this translates into an increased queue_depth + recompile the srp driver -vu From halr at voltaire.com Tue Jun 5 14:37:51 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Jun 2007 17:37:51 -0400 Subject: [ofa-general] Re: [Query] ib add path record cache In-Reply-To: <4665BBFB.4070007@ichips.intel.com> References: <000101c7a3a9$876d8290$ff0da8c0@amr.corp.intel.com> <1180633333.7116.172147.camel@hal.voltaire.com> <4665BBFB.4070007@ichips.intel.com> Message-ID: <1181079457.12997.99729.camel@hal.voltaire.com> On Tue, 2007-06-05 at 15:39, Sean Hefty wrote: > > You'd need to use a vendor class 2 if you wanted to use RMPP as the SA > > does. However, there is some rearranging you would need to do if you > > compare the relevant MAD formats. > > Reading into the spec more, it seems our current choice is limited to > using a vendor class. Application classes are controlled by the IBTA. One could ask the IBTA for this if it is the right thing to do. > Of the two vendor classes, class 2 clearly defines that RMPP is used, > but also adds the OUI field to the MAD. This throws off using the SA > MAD class format. I see a few possibilities: > > Use vendor class 1: > There's no restriction on the MAD data. This would allow us to match > the SA MAD class format exactly. The drawback is that we need to modify > the MAD layer to identify the class as using RMPP. Are you saying to make the RMPP header as the first part of Data ? Vendor class 1 are not RMPP MADs so I think this is nonconformant. That's one reason vendor class 2 was added. In addition, there is no way to detect one "vendor" from another "vendor" (which is why OUI was added) if the same class is used so these need to be unique across all vendors.
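The sketch below illustrates the class 1 / class 2 distinction. It relies on my understanding of the IBA management class assignments, not on anything stated in this thread. On that understanding, vendor-specific classes occupy two ranges. Range 1 (0x09-0x0F) has no OUI. Range 2 (0x30-0x4F) is RMPP-capable and carries a 3-byte OUI after one reserved byte that follows the RMPP header. Only range 2 lets a receiver tell two vendors apart when both use the same class number. The function names are hypothetical, and the offsets assume the 24-byte common MAD header plus the 12-byte RMPP header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Vendor-specific management class ranges (assumed from the IBA spec). */
#define VENDOR_RANGE1_START 0x09
#define VENDOR_RANGE1_END   0x0F
#define VENDOR_RANGE2_START 0x30
#define VENDOR_RANGE2_END   0x4F

/* Byte offset of the OUI in a range-2 vendor MAD: 24-byte common MAD
 * header + 12-byte RMPP header + 1 reserved byte. */
#define VENDOR2_OUI_OFFSET  37

static bool is_vendor_class1(uint8_t mgmt_class)
{
	return mgmt_class >= VENDOR_RANGE1_START &&
	       mgmt_class <= VENDOR_RANGE1_END;
}

static bool is_vendor_class2(uint8_t mgmt_class)
{
	return mgmt_class >= VENDOR_RANGE2_START &&
	       mgmt_class <= VENDOR_RANGE2_END;
}

/* Only range-2 MADs carry an OUI, so only they can be matched to a
 * particular vendor when several vendors reuse one class number. */
static bool mad_matches_oui(const uint8_t *mad, const uint8_t oui[3])
{
	/* MgmtClass is byte 1 of the common MAD header. */
	if (!is_vendor_class2(mad[1]))
		return false;
	return mad[VENDOR2_OUI_OFFSET] == oui[0] &&
	       mad[VENDOR2_OUI_OFFSET + 1] == oui[1] &&
	       mad[VENDOR2_OUI_OFFSET + 2] == oui[2];
}
```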
> Use vendor class 2: > Reading the spec, it looks like a reserved field in the MAD header is > reserved even if using a vendor defined class. If this is the proper > interpretation, It is. > then we either need to shift the SA data down 4-8 bytes, > or we drop the first 4 bytes of the SM_Key. I don't think weakening the SM_Key is acceptable. > If we ever want to do more than simple path record caching, I think > we'll want the full SM_Key. Between the remaining choices, my > preference would be to adapt a class 1 for our purpose. Anyone else > have thoughts on this? The only choice seems to me to be reformatting using vendor class 2 and dealing with the data copying. -- Hal > - Sean From pourreza at cs.umanitoba.ca Tue Jun 5 15:04:28 2007 From: pourreza at cs.umanitoba.ca (Hossein Pourreza) Date: Tue, 5 Jun 2007 17:04:28 -0500 Subject: [ofa-general] Installing openIB on Linux FC5 Message-ID: <20070605220428.GA15154@helium-01.cs.umanitoba.ca> Hi all, I am new to infiniband stuff and am trying to configure an infiniband-based cluster using Linux FC 5. I downloaded the OFED-1.0 and tried to install it on cluster nodes. Now I can load the kernel modules without any error but I cannot run a simple test like ibv_ud_pingpong to check the connectivity of nodes in user-level.
I loaded the following modules:

ib_umad 25713 0
ib_ucm 26569 0
ib_cm 42521 1 ib_ucm
ib_uverbs 47889 1 ib_ucm
ib_mthca 133445 0
ib_ipoib 61361 0
ib_sa 25341 2 ib_cm,ib_ipoib
ib_mad 46969 4 ib_umad,ib_cm,ib_mthca,ib_sa
ib_core 63809 8 ib_umad,ib_ucm,ib_cm,ib_uverbs,ib_mthca,ib_ipoib,ib_sa,ib_mad

Also I have the following devices in /dev/infiniband:

crw-rw---- 1 root root 231, 64 Jun 4 14:54 issm0
crw-rw---- 1 root root 231, 65 Jun 4 14:54 issm1
crw-rw---- 1 root root 231, 224 Jun 4 14:34 ucm0
crw-rw---- 1 root root 231, 0 Jun 4 14:54 umad0
crw-rw---- 1 root root 231, 1 Jun 4 14:54 umad1
crw-rw-rw- 1 root root 231, 192 Jun 4 14:34 uverbs0

ibroute shows all nodes and the switch and everything looks fine. When I run ibv_ud_pingpong on the two nodes, I get the following messages:

node 1 (server):
local address: LID 0x0002, QPN 0x150406, PSN 0xb3a00d
remote address: LID 0x0003, QPN 0x0c0406, PSN 0x8f0f99

node 2 (client):
local address: LID 0x0003, QPN 0x0c0406, PSN 0x8f0f99
remote address: LID 0x0002, QPN 0x150406, PSN 0xb3a00d

There is no message after these two lines. I am wondering whether they are sending any packets at all. I should say that although I have given IP addresses to the InfiniBand cards (ib0), they cannot ping each other using the normal Linux ping tool.
Here is the result of ifconfig on these nodes:

node 1 (server):
ib0 Link encap:InfiniBand HWaddr 00:00:04:04:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
inet addr:172.16.28.61 Bcast:172.16.255.255 Mask:255.255.0.0
UP BROADCAST MULTICAST MTU:2044 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:128
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

node 2 (client):
ib0 Link encap:InfiniBand HWaddr 00:00:04:04:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
inet addr:172.16.28.62 Bcast:172.16.255.255 Mask:255.255.0.0
UP BROADCAST MULTICAST MTU:2044 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:128
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

Any help will be greatly appreciated. Hossein -- Hossein Pourreza mail: Department of Computer Science URL: http://www.cs.umanitoba.ca/~pourreza University of Manitoba Phone: 204-488-5611 Winnipeg, Manitoba, Canada R3T 2N2 From sashak at voltaire.com Tue Jun 5 17:04:41 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 6 Jun 2007 03:04:41 +0300 Subject: [ofa-general] [PATCH] libibmad: add notice DataDetails fields Message-ID: <20070606000441.GH10519@sashak.voltaire.com> This adds notice DataDetails fields - generic one (as big array) and Trap144 specific fields.
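The offsets in this patch are bit positions within the Notice attribute, counted from the most significant bit of byte 0. DataDetails spans bits 80-511 (432 bits, or 54 bytes). For Trap 144, the LID sits at bits 96-111 and the CapabilityMask at bits 128-159. libibmad encodes these positions through its own BITSOFFS() macro, which the sketch below does not reproduce. Instead, it is a hypothetical generic big-endian bit-field reader (`get_be_bits` is an illustrative name) that shows which bytes those offsets select.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a field of `width` bits (1..32) starting
 * `bit_off` bits from the most significant bit of buf[0], using the
 * big-endian bit numbering of IB attribute layouts. */
static uint32_t get_be_bits(const uint8_t *buf, size_t bit_off, unsigned width)
{
	uint32_t val = 0;

	for (unsigned i = 0; i < width; i++) {
		size_t bit = bit_off + i;
		/* In this numbering, bit 0 of each byte is its MSB (0x80). */
		unsigned b = (buf[bit / 8] >> (7 - bit % 8)) & 1u;
		val = (val << 1) | b;
	}
	return val;
}
```

For example, `get_be_bits(notice, 96, 16)` reads the field that the patch names NoticeDataTrap144LID, which is bytes 12-13 of the attribute.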
Signed-off-by: Sasha Khapyorsky --- libibmad/include/infiniband/mad.h | 3 +++ libibmad/src/fields.c | 3 +++ 2 files changed, 6 insertions(+), 0 deletions(-) diff --git a/libibmad/include/infiniband/mad.h b/libibmad/include/infiniband/mad.h index f01880b..ed286a9 100644 --- a/libibmad/include/infiniband/mad.h +++ b/libibmad/include/infiniband/mad.h @@ -382,7 +382,10 @@ enum MAD_FIELDS { IB_NOTICE_ISSUER_LID_F, IB_NOTICE_TOGGLE_F, IB_NOTICE_COUNT_F, + IB_NOTICE_DATA_DETAILS_F, IB_NOTICE_DATA_LID_F, + IB_NOTICE_DATA_144_LID_F, + IB_NOTICE_DATA_144_CAPMASK_F, /* * GS Performance diff --git a/libibmad/src/fields.c b/libibmad/src/fields.c index c453e06..18dc05b 100644 --- a/libibmad/src/fields.c +++ b/libibmad/src/fields.c @@ -216,7 +216,10 @@ ib_field_t ib_mad_f [] = { [IB_NOTICE_ISSUER_LID_F] {BITSOFFS(48, 16), "NoticeIssuerLID", mad_dump_uint}, [IB_NOTICE_TOGGLE_F] {BITSOFFS(64, 1), "NoticeToggle", mad_dump_uint}, [IB_NOTICE_COUNT_F] {BITSOFFS(65, 15), "NoticeCount", mad_dump_uint}, + [IB_NOTICE_DATA_DETAILS_F] {80, 432, "NoticeDataDetails", mad_dump_array}, [IB_NOTICE_DATA_LID_F] {BITSOFFS(80, 16), "NoticeDataLID", mad_dump_uint}, + [IB_NOTICE_DATA_144_LID_F] {BITSOFFS(96, 16), "NoticeDataTrap144LID", mad_dump_uint}, + [IB_NOTICE_DATA_144_CAPMASK_F] {BITSOFFS(128, 32), "NoticeDataTrap144CapMask", mad_dump_uint}, /* * NodeDescription fields: -- 1.5.2.136.g322bc From halr at voltaire.com Tue Jun 5 17:27:48 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Jun 2007 20:27:48 -0400 Subject: [ofa-general] Re: [PATCH] libibmad: add notice DataDetails fields In-Reply-To: <20070606000441.GH10519@sashak.voltaire.com> References: <20070606000441.GH10519@sashak.voltaire.com> Message-ID: <1181089666.12997.110779.camel@hal.voltaire.com> On Tue, 2007-06-05 at 20:04, Sasha Khapyorsky wrote: > This adds notice DataDetails fields - generic one (as big array) > and Trap144 specific fields. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied.
-- Hal From sean.hefty at intel.com Tue Jun 5 21:06:30 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 5 Jun 2007 21:06:30 -0700 Subject: [ofa-general] Re: [Query] ib add path record cache In-Reply-To: <1181079457.12997.99729.camel@hal.voltaire.com> Message-ID: <000f01c7a7f0$1067dba0$11c8180a@amr.corp.intel.com> >One could ask the IBTA for this if it is the right thing to do. Checking with the IBTA makes sense. Longer term, adding a distributed SA application class, or expanding the existing SA class may be useful, if the IBTA wants to define SA implementation at this level of detail. However, I was trying to focus on what could be done now. If the IBTA would like to standardize the communication, that'd be great. One issue that isn't clear to me is what exactly is meant by the statement: "Vendor-specific classes will never be used to define management operations that are encompassed by the Infiniband Architecture." For example, suppose that there were a small number of SA caches available in the subnet. Is it compliant for a node to issue a PR query to one of the caches using a vendor-defined PR query? Or must this be done using an SA PR query with possible redirection? >Are you saying to make the RMPP header as the first part of Data ? Yes. >Vendor class 1 are not RMPP MADs so I think this is nonconformant. I didn't see any restriction on the vendor class 1 data - at least in section 16.5. If I'm mistaken on this, then I agree that vendor class 2 seems to be our only current option. >That's one reason vendor class 2 was added. In addition, there is no way >to detect one "vendor" from another "vendor" (which is why OUI was >added) if the same class is used so these need to be unique across all >vendors. Yes - all vendor class 1 MADs suffer from this issue. In practice, it seems that there can only be a single vendor for a given class on a subnet. >The only choice seems to me to be reformatting using vendor class 2 and >dealing with the data copying. 
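The offset problem becomes concrete when you compare the two 256-byte layouts. The description here is my understanding of the IBA specification, which I recall is mirrored by `struct ib_sa_mad` and `struct ib_vendor_mad` in the Linux MAD headers. Both layouts start with the 24-byte common MAD header and the 12-byte RMPP header. The SA MAD then has a 20-byte SA header, with SM_Key at byte 36, followed by 200 data bytes. A vendor class 2 MAD instead has a reserved byte and a 3-byte OUI at bytes 36-39, followed by 216 data bytes. The byte-array structs below are a hypothetical model of offsets only. `segments_needed()` is an illustrative helper. Together they show why carrying the whole SA header inside the vendor payload leaves 196 bytes per segment instead of 200.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-array models of the two 256-byte MAD layouts (offsets only;
 * all members are uint8_t, so there is no padding). */
struct sa_mad_layout {
	uint8_t mad_hdr[24];     /* common MAD header              */
	uint8_t rmpp_hdr[12];    /* RMPP header                    */
	uint8_t sm_key[8];       /* SA header starts at byte 36    */
	uint8_t attr_offset[2];
	uint8_t reserved[2];
	uint8_t comp_mask[8];
	uint8_t data[200];       /* SA payload per segment         */
};

struct vendor2_mad_layout {
	uint8_t mad_hdr[24];
	uint8_t rmpp_hdr[12];
	uint8_t reserved;        /* byte 36: overlaps SM_Key       */
	uint8_t oui[3];          /* bytes 37-39: overlap SM_Key    */
	uint8_t data[216];       /* vendor payload per segment     */
};

#define SA_HDR_LEN           20
#define SA_DATA_PER_SEG      200
/* Payload left per vendor segment if the 20-byte SA header is carried
 * at the start of the vendor data area: 216 - 20 = 196. */
#define SHIFTED_DATA_PER_SEG (216 - SA_HDR_LEN)

/* RMPP segments needed for `len` payload bytes (at least one). */
static size_t segments_needed(size_t len, size_t per_seg)
{
	return len == 0 ? 1 : (len + per_seg - 1) / per_seg;
}
```

Under these assumptions, a 200-byte SA payload fits in one SA segment but needs two segments in the shifted vendor format.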
From an implementation viewpoint, this just seems less desirable. Adding the offset means that a single-segment SA MAD may become a multi-segment vendor MAD, and dealing with two MAD formats will be troublesome. If we're only caching PRs, this may not be a big deal, but if we ever want to create a truly distributed SA, I think it will be. - Sean From tziporet at dev.mellanox.co.il Tue Jun 5 22:55:11 2007 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Wed, 06 Jun 2007 08:55:11 +0300 Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557) In-Reply-To: <4665D441.3060404@ichips.intel.com> References: <008201c7a707$1d7a9150$ff0da8c0@amr.corp.intel.com> <4665D441.3060404@ichips.intel.com> Message-ID: <46664C3F.9050208@mellanox.co.il> Sean Hefty wrote: > Vlad, can you please pull this change into OFED? > Approved Tziporet >> next_port should be between sysctl_local_port_range[0] and [1]. >> However, >> it is initially set to a random value. If the value is negative, >> next_port >> can fall outside of this range because of the % operator returning a >> negative value.
>> >> Signed-off-by: Sean Hefty >> --- >> >> drivers/infiniband/core/cma.c | 4 ++-- >> 1 files changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/infiniband/core/cma.c >> b/drivers/infiniband/core/cma.c >> index eb15119..b0831cb 100644 >> --- a/drivers/infiniband/core/cma.c >> +++ b/drivers/infiniband/core/cma.c >> @@ -2772,8 +2772,8 @@ static int cma_init(void) >> int ret; >> >> get_random_bytes(&next_port, sizeof next_port); >> - next_port = (next_port % (sysctl_local_port_range[1] - >> - sysctl_local_port_range[0])) + >> + next_port = ((unsigned int) next_port % >> + (sysctl_local_port_range[1] - >> sysctl_local_port_range[0])) + >> sysctl_local_port_range[0]; >> cma_wq = create_singlethread_workqueue("rdma_cm"); >> if (!cma_wq) >> >> > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From tziporet at dev.mellanox.co.il Tue Jun 5 23:09:53 2007 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Wed, 06 Jun 2007 09:09:53 +0300 Subject: [ofa-general] Installing openIB on Linux FC5 In-Reply-To: <20070605220428.GA15154@helium-01.cs.umanitoba.ca> References: <20070605220428.GA15154@helium-01.cs.umanitoba.ca> Message-ID: <46664FB1.6070402@mellanox.co.il> Hossein Pourreza wrote: > Hi all, > > I am new to infiniband stuff and am trying to configure an infiniband-based > cluster using Linux FC 5. I downloaded the OFED-1.0 and tried to install it on > cluster nodes. Now I can load the kernel modules without any error but I cannot > run a simple test like ibv_ud_pingpong to check the connectivity of nodes in > user-level. > > > Have you run opensm? 
You can run ibstat on each node to see ports are active Tziporet From yosefe at voltaire.com Tue Jun 5 23:11:48 2007 From: yosefe at voltaire.com (Yosef Etigin) Date: Wed, 06 Jun 2007 09:11:48 +0300 Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557) In-Reply-To: <000001c7a790$9f44e490$3c98070a@amr.corp.intel.com> References: <000001c7a790$9f44e490$3c98070a@amr.corp.intel.com> Message-ID: <46665024.8050502@voltaire.com> Sean Hefty wrote: >>I haven't followed this closely, but what's the impact of this bug? >>It seems it would result in a port out of the configured range being >>used. Which seems serious enough to fix for 2.6.22 to me. > > > It can result in a port outside of the configured range, and its occurrence > depends on the compiler used. I've pushed my patch to: > > git://git.openfabrics.org/~shefty/rdma-dev.git for-roland > > which is based on 2.6.22-rc4. Yosef, can you confirm that this patch works for > you? > > - Sean Yes, it works. Maybe we should use another variable ("unsigned coins;") to receive the random bytes, so that next_port is not used for two different purposes? --Yossi From vlad at mellanox.co.il Tue Jun 5 23:24:01 2007 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Wed, 06 Jun 2007 09:24:01 +0300 Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557) In-Reply-To: <46664C3F.9050208@mellanox.co.il> References: <008201c7a707$1d7a9150$ff0da8c0@amr.corp.intel.com> <4665D441.3060404@ichips.intel.com> <46664C3F.9050208@mellanox.co.il> Message-ID: <1181111041.1114.23.camel@vladsk-laptop> Done, Added kernel_patches/fixes/sean_cma_next_port_fix.patch Regards, Vladimir On Wed, 2007-06-06 at 08:55 +0300, Tziporet Koren wrote: > Sean Hefty wrote: > > Vlad, can you please pull this change into OFED? > > > Approved > > Tziporet > > >> next_port should be between sysctl_local_port_range[0] and [1]. > >> However, > >> it is initially set to a random value. If the value is negative,
If the value is negative, > >> next_port > >> can fall outside of this range because of the % operator returning a > >> negative value. > >> > >> Signed-off-by: Sean Hefty > >> --- > >> > >> drivers/infiniband/core/cma.c | 4 ++-- > >> 1 files changed, 2 insertions(+), 2 deletions(-) > >> > >> diff --git a/drivers/infiniband/core/cma.c > >> b/drivers/infiniband/core/cma.c > >> index eb15119..b0831cb 100644 > >> --- a/drivers/infiniband/core/cma.c > >> +++ b/drivers/infiniband/core/cma.c > >> @@ -2772,8 +2772,8 @@ static int cma_init(void) > >> int ret; > >> > >> get_random_bytes(&next_port, sizeof next_port); > >> - next_port = (next_port % (sysctl_local_port_range[1] - > >> - sysctl_local_port_range[0])) + > >> + next_port = ((unsigned int) next_port % > >> + (sysctl_local_port_range[1] - > >> sysctl_local_port_range[0])) + > >> sysctl_local_port_range[0]; > >> cma_wq = create_singlethread_workqueue("rdma_cm"); > >> if (!cma_wq) > >> > >> > > _______________________________________________ > > general mailing list > > general at lists.openfabrics.org > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > > From vlad at lists.openfabrics.org Wed Jun 6 02:43:14 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Wed, 6 Jun 2007 02:43:14 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070606-0200 daily build status Message-ID: <20070606094314.180BFE60824@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.13 Passed 
on i686 with linux-2.6.14 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.12 Passed on x86_64 with linux-2.6.20 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.12 Passed on x86_64 with linux-2.6.13 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ia64 with linux-2.6.18 Passed on ppc64 with linux-2.6.12 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on powerpc with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.17 Passed on ia64 with linux-2.6.15 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.15 Passed on ppc64 with linux-2.6.19 Passed on powerpc with linux-2.6.15 Passed on x86_64 with linux-2.6.18 Passed on ppc64 with linux-2.6.16 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.15 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.14 Passed on ppc64 with linux-2.6.13 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on powerpc with linux-2.6.12 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ia64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.9-34.ELsmp Failed: From halr at voltaire.com Wed Jun 6 02:45:12 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Jun 2007 05:45:12 -0400 Subject: [ofa-general] Re: [Query] ib add path record cache In-Reply-To: <000f01c7a7f0$1067dba0$11c8180a@amr.corp.intel.com> References: 
<000f01c7a7f0$1067dba0$11c8180a@amr.corp.intel.com> Message-ID: <1181123111.12997.147451.camel@hal.voltaire.com> On Wed, 2007-06-06 at 00:06, Sean Hefty wrote: > >One could ask the IBTA for this if it is the right thing to do. > > Checking with the IBTA makes sense. Longer term, adding a distributed SA > application class, or expanding the existing SA class may be useful, if the IBTA > wants to define SA implementation at this level of detail. However, I was > trying to focus on what could be done now. If the IBTA would like to > standardize the communication, that'd be great. > One issue that isn't clear to me is what exactly is meant by the statement: > "Vendor-specific classes will never be used to define management operations that > are encompassed by the Infiniband Architecture." I'm not sure of the intent of this, but that is informative rather than normative (compliance) text. > For example, suppose that > there were a small number of SA caches available in the subnet. Is it compliant > for a node to issue a PR query to one of the caches using a vendor-defined PR > query? Or must this be done using an SA PR query with possible redirection? I think this example would fall "on the line" and seems somewhat debatable as to whether there is a management operation for this or not. It does go back to the intent of the original statement you cited. > >Are you saying to make the RMPP header as the first part of Data ? > > Yes. > > >Vendor class 1 are not RMPP MADs so I think this is nonconformant. > > I didn't see any restriction on the vendor class 1 data - at least in section > 16.5. True, but I'm not sure that was the intent, which again was why vendor class 2 was created. Also, there is the problem of knowing that this vendor class 1 is using RMPP. That sounds proprietary to me (and affects the kernel in the OpenIB implementation). > If I'm mistaken on this, then I agree that vendor class 2 seems to be our > only current option.
> > >That's one reason vendor class 2 was added. In addition, there is no way > >to detect one "vendor" from another "vendor" (which is why OUI was > >added) if the same class is used so these need to be unique across all > >vendors. > > Yes - all vendor class 1 MADs suffer from this issue. In practice, it seems > that there can only be a single vendor for a given class on a subnet. That's one way of putting it, but it limits the use; in fact, if this were done, all subnets would use at least two different vendors. Another way is that all vendors who want to use this class range need to coordinate such use (e.g. class allocation). > >The only choice seems to me to be reformatting using vendor class 2 and > >dealing with the data copying. > > From an implementation viewpoint, this just seems less desirable. Adding the > offset means that a single-segment SA MAD may become a multi-segment vendor MAD, > and dealing with two MAD formats will be troublesome. If we're only caching > PRs, this may not be a big deal, but if we ever want to create a truly > distributed SA, I think it will be. Are you referring to the performance hit? -- Hal > - Sean From kliteyn at dev.mellanox.co.il Wed Jun 6 05:44:58 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Wed, 06 Jun 2007 15:44:58 +0300 Subject: [ofa-general] [PATCH] osm: fixing broken compilation when osm_vendor is simulator Message-ID: <4666AC4A.6020103@dev.mellanox.co.il> Hi Hal, The compilation of OpenSM with vendor=sim has been broken by the recent PerfManager patch. Adding missing include in the vendor's header.
-- Yevgeny Signed-off-by: Yevgeny Kliteynik --- opensm/include/vendor/osm_vendor_mlx.h | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/opensm/include/vendor/osm_vendor_mlx.h b/opensm/include/vendor/osm_vendor_mlx.h index f220cc3..b3794cd 100644 --- a/opensm/include/vendor/osm_vendor_mlx.h +++ b/opensm/include/vendor/osm_vendor_mlx.h @@ -36,6 +36,7 @@ #ifndef _OSMV_H_ #define _OSMV_H_ +#include #include #include -- 1.5.1.4 From eli at mellanox.co.il Wed Jun 6 05:40:19 2007 From: eli at mellanox.co.il (Eli Cohen) Date: Wed, 06 Jun 2007 15:40:19 +0300 Subject: [ofa-general] [PATCH 1/2] libmlx4: fix SRQ buffer allocation Message-ID: <1181133649.10841.64.camel@mtls03> Roland, this patch and the complementary subsequent patch were not actually checked against your "for-2.6.22" branch, since the version I was working against is different. But I did check this against our build and it seems to work. Fix receive buffer allocation for SRQ QPs. Signed-off-by: Eli Cohen --- diff --git a/src/verbs.c b/src/verbs.c index 1feae9d..b800eb2 100644 --- a/src/verbs.c +++ b/src/verbs.c @@ -373,6 +373,13 @@ struct ibv_qp *mlx4_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *attr) return NULL; qp->sq.max = align_queue_size(pd->context, attr->cap.max_send_wr, 0); + + if (attr->srq) + attr->cap.max_recv_wr = 0; + else + attr->cap.max_recv_wr = attr->cap.max_recv_wr ? + attr->cap.max_recv_wr : 1; + qp->rq.max = align_queue_size(pd->context, attr->cap.max_recv_wr, 0); if (mlx4_alloc_qp_buf(pd, &attr->cap, attr->qp_type, qp)) From eli at mellanox.co.il Wed Jun 6 05:40:21 2007 From: eli at mellanox.co.il (Eli Cohen) Date: Wed, 06 Jun 2007 15:40:21 +0300 Subject: [ofa-general] [PATCH 2/2] IB/mlx4_ib: fix SRQ buffer allocation Message-ID: <1181133679.10841.66.camel@mtls03> Fix receive buffer allocation for SRQ QPs. Add checks to validate HW requirements when configuring QPs.
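The receive-queue sizing rules that this series enforces in set_rq_size() can be summarized in a small user-space model:

- A QP attached to an SRQ must request zero receive WRs.
- A non-SRQ QP that requests zero gets one WR if it is a kernel QP and is rejected if it comes from userspace.
- Any other request is rounded up to a power of two.

The libmlx4 half of the series makes userspace pass 0 for SRQ QPs and at least 1 otherwise, which is why the kernel can reject a zero count from userspace. This is a hypothetical sketch, not the driver code. `model_rq_size` and `round_up_pow2` are illustrative names, and the max_wqes/max_rq_sg hardware-limit check is omitted.

```c
#include <errno.h>
#include <stdbool.h>

/* Smallest power of two >= n, for n >= 1 (like roundup_pow_of_two()). */
static unsigned round_up_pow2(unsigned n)
{
	unsigned p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Model of the RQ sizing decision; returns the RQ size or -EINVAL. */
static int model_rq_size(bool has_srq, bool is_kernel, unsigned max_recv_wr)
{
	if (has_srq) {
		/* Receives are posted to the SRQ, so the QP needs no RQ. */
		if (max_recv_wr)
			return -EINVAL;
	} else if (!max_recv_wr) {
		/* Non-SRQ QP asking for no receive WRs: kernel consumers
		 * get one, userspace requests are rejected. */
		if (is_kernel)
			max_recv_wr = 1;
		else
			return -EINVAL;
	}
	return max_recv_wr ? (int) round_up_pow2(max_recv_wr) : 0;
}
```

For example, a userspace request for 100 receive WRs on a non-SRQ QP yields an RQ of 128 entries.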
Signed-off-by: Eli Cohen --- diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index dc137de..0117cf9 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -188,14 +188,27 @@ static int send_wqe_overhead(enum ib_qp_type type) } } -static int set_rq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, - struct mlx4_ib_qp *qp) +static int set_rq_size(struct mlx4_ib_dev *dev, struct ib_qp_init_attr *init_attr, + struct mlx4_ib_qp *qp, int kernel) { + struct ib_qp_cap *cap = &init_attr->cap; + /* Sanity check RQ size before proceeding */ if (cap->max_recv_wr > dev->dev->caps.max_wqes || cap->max_recv_sge > dev->dev->caps.max_rq_sg) return -EINVAL; + if (init_attr->srq) { + if (cap->max_recv_wr) + return -EINVAL; + } + else if (!cap->max_recv_wr) { + if (kernel) + cap->max_recv_wr = 1; + else + return -EINVAL; + } + qp->rq.max = cap->max_recv_wr ? roundup_pow_of_two(cap->max_recv_wr) : 0; qp->rq.wqe_shift = ilog2(roundup_pow_of_two(cap->max_recv_sge * @@ -257,6 +270,10 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, static int set_user_sq_size(struct mlx4_ib_qp *qp, struct mlx4_ib_create_qp *ucmd) { + /* Sanity check for SQ size */ + if (ucmd->log_sq_bb_count > 15 || ucmd->log_sq_stride > 11) + return -EINVAL; + qp->sq.max = 1 << ucmd->log_sq_bb_count; qp->sq.wqe_shift = ucmd->log_sq_stride; @@ -285,7 +302,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, qp->sq.head = 0; qp->sq.tail = 0; - err = set_rq_size(dev, &init_attr->cap, qp); + err = set_rq_size(dev, init_attr, qp, pd->uobject ? 
0 : 1); if (err) goto err; @@ -733,9 +750,10 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, context->mtu_msgmax = (attr->path_mtu << 5) | 31; } - if (qp->rq.max) + if (qp->rq.max) { context->rq_size_stride = ilog2(qp->rq.max) << 3; - context->rq_size_stride |= qp->rq.wqe_shift - 4; + context->rq_size_stride |= qp->rq.wqe_shift - 4; + } if (qp->sq.max) context->sq_size_stride = ilog2(qp->sq.max) << 3; From halr at voltaire.com Wed Jun 6 05:59:14 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Jun 2007 08:59:14 -0400 Subject: [ofa-general] Re: [PATCH] osm: fixing broken compilation when osm_vendor is simulator In-Reply-To: <4666AC4A.6020103@dev.mellanox.co.il> References: <4666AC4A.6020103@dev.mellanox.co.il> Message-ID: <1181134754.12997.159939.camel@hal.voltaire.com> Hi Yevgeny, On Wed, 2007-06-06 at 08:44, Yevgeny Kliteynik wrote: > Hi Hal, > > The compilation of OpenSM with vendor=sim has been broken by the > recent PerfManager patch. > Adding missing include in the vendor's header. > > -- Yevgeny > > Signed-off-by: Yevgeny Kliteynik Thanks. Applied. -- Hal From pourreza at cs.umanitoba.ca Wed Jun 6 07:08:05 2007 From: pourreza at cs.umanitoba.ca (Hossein Pourreza) Date: Wed, 6 Jun 2007 09:08:05 -0500 Subject: [ofa-general] Installing openIB on Linux FC5 In-Reply-To: <46664FB1.6070402@mellanox.co.il> References: <20070605220428.GA15154@helium-01.cs.umanitoba.ca> <46664FB1.6070402@mellanox.co.il> Message-ID: <20070606140805.GA10814@finch.cs.umanitoba.ca> Hi, Many thanks for your reply. I really appreciate that. Our cluster uses Mellanox Technologies MT23108 InfiniHost (rev a1) and Sun 9P switch. Our switch has its own SubnetManager and whenever I try to run opensm, I get an error saying that there is another sm running with a mismatched key.
The result of running ibstat is like this: CA type: MT23108 Number of ports: 2 Firmware version: 3.3.2 Hardware version: a1 Node GUID: 0x0003ba0001001788 System image GUID: 0x0003ba000100178b Port 1: State: Active Physical state: LinkUp Rate: 10 Base lid: 2 LMC: 0 SM lid: 1 Capability mask: 0x00510a68 Port GUID: 0x0003ba0001001789 Port 2: State: Down Physical state: Polling Rate: 2 Base lid: 0 LMC: 0 SM lid: 0 Capability mask: 0x00510a68 Port GUID: 0x0003ba000100178a Is there anything wrong with this output? Many thanks for your kind help Hossein On Wed, Jun 06, 2007 at 09:09:53AM +0300, Tziporet Koren wrote: > Hossein Pourreza wrote: > >Hi all, > > > >I am new to infiniband stuff and am trying to configure an infiniband-based > >cluster using Linux FC 5. I downloaded the OFED-1.0 and tried to install > >it on > >cluster nodes. Now I can load the kernel modules without any error but I > >cannot > >run a simple test like ibv_ud_pingpong to check the connectivity of nodes > >in > >user-level. > > > > > > > Have you run opensm? > You can run ibstat on each node to see ports are active > > Tziporet -- Hossein Pourreza mail: Department of Computer Science URL: http://www.cs.umanitoba.ca/~pourreza University of Manitoba Phone: 204-488-5611 Winnipeg, Manitoba, Canada R3T 2N2 From halr at voltaire.com Wed Jun 6 07:21:31 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Jun 2007 10:21:31 -0400 Subject: [ofa-general] Installing openIB on Linux FC5 In-Reply-To: <20070606140805.GA10814@finch.cs.umanitoba.ca> References: <20070605220428.GA15154@helium-01.cs.umanitoba.ca> <46664FB1.6070402@mellanox.co.il> <20070606140805.GA10814@finch.cs.umanitoba.ca> Message-ID: <1181139682.12997.165263.camel@hal.voltaire.com> On Wed, 2007-06-06 at 10:08, Hossein Pourreza wrote: > Hi, > > Many thanks for your reply. I really appreciate that. > > Our cluster uses Mellanox Technologies MT23108 InfiniHost (rev a1) and Sun 9P switch. 
> Out switch has its own SubnetManager and whenever I try to run opensm, I get an error > saying that there is another sm running with a mismatch key. > > The result of running ibstat is like this: > > CA type: MT23108 > Number of ports: 2 > Firmware version: 3.3.2 > Hardware version: a1 > Node GUID: 0x0003ba0001001788 > System image GUID: 0x0003ba000100178b > Port 1: > State: Active > Physical state: LinkUp > Rate: 10 > Base lid: 2 > LMC: 0 > SM lid: 1 > Capability mask: 0x00510a68 > Port GUID: 0x0003ba0001001789 > Port 2: > State: Down > Physical state: Polling > Rate: 2 > Base lid: 0 > LMC: 0 > SM lid: 0 > Capability mask: 0x00510a68 > Port GUID: 0x0003ba000100178a > > Is there anything wrong with this output? Nothing wrong with the output :-) but is your port connected ? It appears there is some connectivity problem as Physical state is not LinkUp (and hence State is Down) so SM cannot configure it. -- Hal > Many thanks for your kind help > Hossein > On Wed, Jun 06, 2007 at 09:09:53AM +0300, Tziporet Koren wrote: > > Hossein Pourreza wrote: > > >Hi all, > > > > > >I am new to infiniband stuff and am trying to configure an infiniband-based > > >cluster using Linux FC 5. I downloaded the OFED-1.0 and tried to install > > >it on > > >cluster nodes. Now I can load the kernel modules without any error but I > > >cannot > > >run a simple test like ibv_ud_pingpong to check the connectivity of nodes > > >in > > >user-level. > > > > > > > > > > > Have you run opensm? 
> > You can run ibstat on each node to see ports are active > > > > Tziporet From chas at cmf.nrl.navy.mil Wed Jun 6 07:29:51 2007 From: chas at cmf.nrl.navy.mil (chas williams - CONTRACTOR) Date: Wed, 06 Jun 2007 10:29:51 -0400 Subject: [ofa-general] OFED vs IB_GOLD IB_SRP Performance results In-Reply-To: <4665A1FA.1000506@asc.hpc.mil> Message-ID: <200706061429.l56ETp4n012017@cmf.nrl.navy.mil> In message <4665A1FA.1000506 at asc.hpc.mil>,MAHMOUD HANAFI writes: >OFED Setup: >/sys/module/ib_srp/mellanox_workarounds = 1 >/sys/module/ib_srp/refcnt = 11 >/sys/module/ib_srp/srp_sg_tablesize = 256 >/sys/module/ib_srp/topspin_workarounds = 1 > >/sys/block/sdd/queue/max_sectors_kb = 4096 >/sys/block/sdd/queue/nr_requests = 8192 >/sys/block/sdd/queue/read_ahead_kb = 128 what is the max_hw_sectors_kb for the ofed target? unless you specified max_sect= during login, i suspect you are getting the system defaults. typically this is 512 sectors i think, which is where your performance seems to start to diverge. From pourreza at cs.umanitoba.ca Wed Jun 6 07:45:57 2007 From: pourreza at cs.umanitoba.ca (Hossein Pourreza) Date: Wed, 6 Jun 2007 09:45:57 -0500 Subject: [ofa-general] Installing openIB on Linux FC5 In-Reply-To: <1181139682.12997.165263.camel@hal.voltaire.com> References: <20070605220428.GA15154@helium-01.cs.umanitoba.ca> <46664FB1.6070402@mellanox.co.il> <20070606140805.GA10814@finch.cs.umanitoba.ca> <1181139682.12997.165263.camel@hal.voltaire.com> Message-ID: <20070606144557.GA11324@finch.cs.umanitoba.ca> On Wed, Jun 06, 2007 at 10:21:31AM -0400, Hal Rosenstock wrote: > On Wed, 2007-06-06 at 10:08, Hossein Pourreza wrote: > > Hi, > > > > Many thanks for your reply. I really appreciate that. > > > > Our cluster uses Mellanox Technologies MT23108 InfiniHost (rev a1) and Sun 9P switch. > > Out switch has its own SubnetManager and whenever I try to run opensm, I get an error > > saying that there is another sm running with a mismatch key. 
> > > > The result of running ibstat is like this: > > > > CA type: MT23108 > > Number of ports: 2 > > Firmware version: 3.3.2 > > Hardware version: a1 > > Node GUID: 0x0003ba0001001788 > > System image GUID: 0x0003ba000100178b > > Port 1: > > State: Active > > Physical state: LinkUp > > Rate: 10 > > Base lid: 2 > > LMC: 0 > > SM lid: 1 > > Capability mask: 0x00510a68 > > Port GUID: 0x0003ba0001001789 > > Port 2: > > State: Down > > Physical state: Polling > > Rate: 2 > > Base lid: 0 > > LMC: 0 > > SM lid: 0 > > Capability mask: 0x00510a68 > > Port GUID: 0x0003ba000100178a > > > > Is there anything wrong with this output? > > Nothing wrong with the output :-) but is your port connected ? It > appears there is some connectivity problem as Physical state is not > LinkUp (and hence State is Down) so SM cannot configure it. I only use port 1 of each HCA and I just connected those to the switch. Should I connect both ports? There are only 9 ports available on our switch and we have 5 nodes (10 ports in total). Thanks again for all you help Hossein > > -- Hal > > > Many thanks for your kind help > > Hossein > > On Wed, Jun 06, 2007 at 09:09:53AM +0300, Tziporet Koren wrote: > > > Hossein Pourreza wrote: > > > >Hi all, > > > > > > > >I am new to infiniband stuff and am trying to configure an infiniband-based > > > >cluster using Linux FC 5. I downloaded the OFED-1.0 and tried to install > > > >it on > > > >cluster nodes. Now I can load the kernel modules without any error but I > > > >cannot > > > >run a simple test like ibv_ud_pingpong to check the connectivity of nodes > > > >in > > > >user-level. > > > > > > > > > > > > > > > Have you run opensm? 
> > > You can run ibstat on each node to see ports are active > > > > > > Tziporet -- Hossein Pourreza mail: Department of Computer Science URL: http://www.cs.umanitoba.ca/~pourreza University of Manitoba Phone: 204-488-5611 Winnipeg, Manitoba, Canada R3T 2N2 From minich at ornl.gov Wed Jun 6 07:53:14 2007 From: minich at ornl.gov (Makia Minich) Date: Wed, 06 Jun 2007 10:53:14 -0400 Subject: [ofa-general] Installing openIB on Linux FC5 In-Reply-To: <20070606144557.GA11324@finch.cs.umanitoba.ca> References: <20070605220428.GA15154@helium-01.cs.umanitoba.ca> <1181139682.12997.165263.camel@hal.voltaire.com> <20070606144557.GA11324@finch.cs.umanitoba.ca> Message-ID: <200706061053.15014.minich@ornl.gov> I think that Hal missed that Port 1 is in active/link up state. More importantly, are you looking to replace your internal SubnetManager and just use OpenSM? If so, you'll need to go into the switch and disable it, then bring up opensm. On Wednesday 06 June 2007 10:45:57 am Hossein Pourreza wrote: > On Wed, Jun 06, 2007 at 10:21:31AM -0400, Hal Rosenstock wrote: > > On Wed, 2007-06-06 at 10:08, Hossein Pourreza wrote: > > > Hi, > > > > > > Many thanks for your reply. I really appreciate that. > > > > > > Our cluster uses Mellanox Technologies MT23108 InfiniHost (rev a1) and > > > Sun 9P switch. Out switch has its own SubnetManager and whenever I try > > > to run opensm, I get an error saying that there is another sm running > > > with a mismatch key. 
> > > > > > The result of running ibstat is like this: > > > > > > CA type: MT23108 > > > Number of ports: 2 > > > Firmware version: 3.3.2 > > > Hardware version: a1 > > > Node GUID: 0x0003ba0001001788 > > > System image GUID: 0x0003ba000100178b > > > Port 1: > > > State: Active > > > Physical state: LinkUp > > > Rate: 10 > > > Base lid: 2 > > > LMC: 0 > > > SM lid: 1 > > > Capability mask: 0x00510a68 > > > Port GUID: 0x0003ba0001001789 > > > Port 2: > > > State: Down > > > Physical state: Polling > > > Rate: 2 > > > Base lid: 0 > > > LMC: 0 > > > SM lid: 0 > > > Capability mask: 0x00510a68 > > > Port GUID: 0x0003ba000100178a > > > > > > Is there anything wrong with this output? > > > > Nothing wrong with the output :-) but is your port connected ? It > > appears there is some connectivity problem as Physical state is not > > LinkUp (and hence State is Down) so SM cannot configure it. > > I only use port 1 of each HCA and I just connected those to the switch. > Should I connect both ports? There are only 9 ports available on our switch > and we have 5 nodes (10 ports in total). > > Thanks again for all you help > Hossein > > > -- Hal > > > > > Many thanks for your kind help > > > Hossein > > > > > > On Wed, Jun 06, 2007 at 09:09:53AM +0300, Tziporet Koren wrote: > > > > Hossein Pourreza wrote: > > > > >Hi all, > > > > > > > > > >I am new to infiniband stuff and am trying to configure an > > > > > infiniband-based cluster using Linux FC 5. I downloaded the > > > > > OFED-1.0 and tried to install it on > > > > >cluster nodes. Now I can load the kernel modules without any error > > > > > but I cannot > > > > >run a simple test like ibv_ud_pingpong to check the connectivity of > > > > > nodes in > > > > >user-level. > > > > > > > > Have you run opensm? 
> > > > You can run ibstat on each node to see ports are active > > > > > > > > Tziporet -- Makia Minich National Center for Computation Science Oak Ridge National Laboratory --*-- Imagine no possessions I wonder if you can - John Lennon From halr at voltaire.com Wed Jun 6 07:55:44 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Jun 2007 10:55:44 -0400 Subject: [ofa-general] Installing openIB on Linux FC5 In-Reply-To: <20070606144557.GA11324@finch.cs.umanitoba.ca> References: <20070605220428.GA15154@helium-01.cs.umanitoba.ca> <46664FB1.6070402@mellanox.co.il> <20070606140805.GA10814@finch.cs.umanitoba.ca> <1181139682.12997.165263.camel@hal.voltaire.com> <20070606144557.GA11324@finch.cs.umanitoba.ca> Message-ID: <1181141742.12997.167453.camel@hal.voltaire.com> On Wed, 2007-06-06 at 10:45, Hossein Pourreza wrote: > On Wed, Jun 06, 2007 at 10:21:31AM -0400, Hal Rosenstock wrote: > > On Wed, 2007-06-06 at 10:08, Hossein Pourreza wrote: > > > Hi, > > > > > > Many thanks for your reply. I really appreciate that. > > > > > > Our cluster uses Mellanox Technologies MT23108 InfiniHost (rev a1) and Sun 9P switch. > > > Out switch has its own SubnetManager and whenever I try to run opensm, I get an error > > > saying that there is another sm running with a mismatch key. > > > > > > The result of running ibstat is like this: > > > > > > CA type: MT23108 > > > Number of ports: 2 > > > Firmware version: 3.3.2 > > > Hardware version: a1 > > > Node GUID: 0x0003ba0001001788 > > > System image GUID: 0x0003ba000100178b > > > Port 1: > > > State: Active > > > Physical state: LinkUp > > > Rate: 10 > > > Base lid: 2 > > > LMC: 0 > > > SM lid: 1 > > > Capability mask: 0x00510a68 > > > Port GUID: 0x0003ba0001001789 > > > Port 2: > > > State: Down > > > Physical state: Polling > > > Rate: 2 > > > Base lid: 0 > > > LMC: 0 > > > SM lid: 0 > > > Capability mask: 0x00510a68 > > > Port GUID: 0x0003ba000100178a > > > > > > Is there anything wrong with this output? 
> > > > Nothing wrong with the output :-) but is your port connected ? It > > appears there is some connectivity problem as Physical state is not > > LinkUp (and hence State is Down) so SM cannot configure it. > > I only use port 1 of each HCA and I just connected those to the switch. Should I > connect both ports? There are only 9 ports available on our switch and we have 5 > nodes (10 ports in total). My bad :-( I just looked at port 2. Port 1 looks fine (active and has base and SM LIDs). -- Hal > Thanks again for all you help > Hossein > > > > > > -- Hal > > > > > Many thanks for your kind help > > > Hossein > > > On Wed, Jun 06, 2007 at 09:09:53AM +0300, Tziporet Koren wrote: > > > > Hossein Pourreza wrote: > > > > >Hi all, > > > > > > > > > >I am new to infiniband stuff and am trying to configure an infiniband-based > > > > >cluster using Linux FC 5. I downloaded the OFED-1.0 and tried to install > > > > >it on > > > > >cluster nodes. Now I can load the kernel modules without any error but I > > > > >cannot > > > > >run a simple test like ibv_ud_pingpong to check the connectivity of nodes > > > > >in > > > > >user-level. > > > > > > > > > > > > > > > > > > > Have you run opensm? > > > > You can run ibstat on each node to see ports are active > > > > > > > > Tziporet From jackm at dev.mellanox.co.il Wed Jun 6 09:35:04 2007 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Wed, 6 Jun 2007 19:35:04 +0300 Subject: [ofa-general] [PATCH] mlx4: fix overwriting of rnr_retry value during ib_modify_qp Message-ID: <200706061935.04671.jackm@dev.mellanox.co.il> Fixes zeroing out of RNR_RETRY parameter passed to modify_qp. 
Found by Mellanox firmware group Signed-off-by: Jack Morgenstein
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index dc137de..cd22975 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -762,11 +762,6 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 		optpar |= MLX4_QP_OPTPAR_PKEY_INDEX;
 	}
 
-	if (attr_mask & IB_QP_RNR_RETRY) {
-		context->params1 |= cpu_to_be32(attr->rnr_retry << 13);
-		optpar |= MLX4_QP_OPTPAR_RNR_RETRY;
-	}
-
 	if (attr_mask & IB_QP_AV) {
 		if (mlx4_set_path(dev, &attr->ah_attr, &context->pri_path,
 				  attr_mask & IB_QP_PORT ? attr->port_num : qp->port)) {
@@ -802,6 +797,12 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 
 	context->pd = cpu_to_be32(to_mpd(ibqp->pd)->pdn);
 	context->params1 = cpu_to_be32(MLX4_IB_ACK_REQ_FREQ << 28);
+
+	if (attr_mask & IB_QP_RNR_RETRY) {
+		context->params1 |= cpu_to_be32(attr->rnr_retry << 13);
+		optpar |= MLX4_QP_OPTPAR_RNR_RETRY;
+	}
+
 	if (attr_mask & IB_QP_RETRY_CNT) {
 		context->params1 |= cpu_to_be32(attr->retry_cnt << 16);
 		optpar |= MLX4_QP_OPTPAR_RETRY_COUNT;
From rdreier at cisco.com Wed Jun 6 10:12:07 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 06 Jun 2007 10:12:07 -0700 Subject: [ofa-general] Re: [PATCH] mlx4: fix overwriting of rnr_retry value during ib_modify_qp In-Reply-To: <200706061935.04671.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Wed, 6 Jun 2007 19:35:04 +0300") References: <200706061935.04671.jackm@dev.mellanox.co.il> Message-ID: thanks, applied. From jwong at datallegro.com Wed Jun 6 10:17:33 2007 From: jwong at datallegro.com (Jeffrey Wong) Date: Wed, 6 Jun 2007 13:17:33 -0400 Subject: [ofa-general] ibv_ud_pingpong error Message-ID: Hello, I have installed OFED1.2-rc4 on the my development machine. When I try to do a ping pong test I get the following error message: [ingres at centos5:master ~/jwong/tcp/ib_common_tcp]$ ibv_ud_pingpong libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes. This will severely limit memory registrations. Couldn't create QP Thanks in advance. Jeff From gurhan.ozen at gmail.com Wed Jun 6 10:39:26 2007 From: gurhan.ozen at gmail.com (G.O.) Date: Wed, 6 Jun 2007 13:39:26 -0400 Subject: [ofa-general] ibv_ud_pingpong error In-Reply-To: References: Message-ID: <5849f1820706061039g575056cem839c505ed227ab1b@mail.gmail.com> On 6/6/07, Jeffrey Wong wrote: > Hello, > > I have installed OFED1.2-rc4 on the my development machine.
When I try to > do a ping pong test I get the following error message: > > > > [ingres at centos5:master ~/jwong/tcp/ib_common_tcp]$ ibv_ud_pingpong > > libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes. > > This will severely limit memory registrations. > > Couldn't create QP > Hi, If you are using bash shell do something like: ulimit -l unlimited to get rid of that limit and try again. You can alternatively set it to a large number. Note that using ulimit only changes the limit in the current shell, you'll have to edit system-wide configuration file to make it permanent. Hope this helps. Gurhan > > > > > Thanks in advance. > > > > Jeff > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From rdreier at cisco.com Wed Jun 6 10:35:53 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 06 Jun 2007 10:35:53 -0700 Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557) In-Reply-To: <000001c7a790$9f44e490$3c98070a@amr.corp.intel.com> (Sean Hefty's message of "Tue, 5 Jun 2007 09:43:18 -0700") References: <000001c7a790$9f44e490$3c98070a@amr.corp.intel.com> Message-ID: thanks, I merged this. From artginer at unizar.es Wed Jun 6 10:43:00 2007 From: artginer at unizar.es (Arturo Giner Gracia) Date: Wed, 06 Jun 2007 19:43:00 +0200 Subject: [ofa-general] libmthca error Message-ID: <4666F224.9050008@unizar.es> Dear sir or Madam, I'm triying to compile libmthca from git repository (https://wiki.openfabrics.org/tiki-index.php?page=Installation+Cheat+Sheet) and every thing was ok until compile this lib. The error is "checking size of long... configure: error: cannot compute sizeof (long)". Can you help me with this? Another question: ¿Which is the best repository to download infiniband sources to compile with a kernel 2.6.18.8-0.1-default ? 
We have InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode). Thanks in advance Arturo From jwong at datallegro.com Wed Jun 6 10:52:54 2007 From: jwong at datallegro.com (Jeffrey Wong) Date: Wed, 6 Jun 2007 13:52:54 -0400 Subject: [ofa-general] ibv_ud_pingpong error In-Reply-To: <5849f1820706061039g575056cem839c505ed227ab1b@mail.gmail.com> Message-ID: Seems that if I'm logged in as root the command works fine without making any settings changes, but if I'm logged in as another user I am getting the same error. Are there permissions that I need to set on binaries in order to run the pingpong as a regular user instead of root. Thanks, Jeff -----Original Message----- From: G.O. [mailto:gurhan.ozen at gmail.com] Sent: Wednesday, June 06, 2007 10:39 AM To: Jeffrey Wong Cc: general at lists.openfabrics.org Subject: Re: [ofa-general] ibv_ud_pingpong error On 6/6/07, Jeffrey Wong wrote: > Hello, > > I have installed OFED1.2-rc4 on the my development machine. When I try to > do a ping pong test I get the following error message: > > > > [ingres at centos5:master ~/jwong/tcp/ib_common_tcp]$ ibv_ud_pingpong > > libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes. > > This will severely limit memory registrations. > > Couldn't create QP > Hi, If you are using bash shell do something like: ulimit -l unlimited to get rid of that limit and try again. You can alternatively set it to a large number. Note that using ulimit only changes the limit in the current shell, you'll have to edit system-wide configuration file to make it permanent. Hope this helps. Gurhan > > > > > Thanks in advance. 
> > > > Jeff > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From gshipman at lanl.gov Wed Jun 6 10:58:45 2007 From: gshipman at lanl.gov (Galen Shipman) Date: Wed, 6 Jun 2007 11:58:45 -0600 Subject: [ofa-general] ibv_ud_pingpong error In-Reply-To: References: Message-ID: Hey Jeff, I think you need to up your locked memory limits, We have a faq entry on this here: http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages Ignore the Open MPI specific parts, the rest I think applies. - Galen On Jun 6, 2007, at 11:52 AM, Jeffrey Wong wrote: > > > Seems that if I'm logged in as root the command works fine without > making any settings changes, but if I'm logged in as another user I am > getting the same error. Are there permissions that I need to set on > binaries in order to run the pingpong as a regular user instead of > root. > > > Thanks, > Jeff > > -----Original Message----- > From: G.O. [mailto:gurhan.ozen at gmail.com] > Sent: Wednesday, June 06, 2007 10:39 AM > To: Jeffrey Wong > Cc: general at lists.openfabrics.org > Subject: Re: [ofa-general] ibv_ud_pingpong error > > On 6/6/07, Jeffrey Wong wrote: >> Hello, >> >> I have installed OFED1.2-rc4 on the my development machine. When I > try to >> do a ping pong test I get the following error message: >> >> >> >> [ingres at centos5:master ~/jwong/tcp/ib_common_tcp]$ ibv_ud_pingpong >> >> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes. >> >> This will severely limit memory registrations. >> >> Couldn't create QP >> > > Hi, > If you are using bash shell do something like: > > ulimit -l unlimited > > to get rid of that limit and try again. You can alternatively set > it to a large number. 
Note that using ulimit only changes the limit > in the current shell, you'll have to edit system-wide configuration > file to make it permanent. > > Hope this helps. > Gurhan >> >> >> >> >> Thanks in advance. >> >> >> >> Jeff >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit >> http://openib.org/mailman/listinfo/openib-general >> > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/ > openib-general From jwong at datallegro.com Wed Jun 6 11:17:49 2007 From: jwong at datallegro.com (Jeffrey Wong) Date: Wed, 6 Jun 2007 14:17:49 -0400 Subject: [ofa-general] ibv_ud_pingpong error In-Reply-To: Message-ID: Thanks very much. After setting the limits.conf file for the user with the memlock setting as unlimited I can now do the ibv_ud_pingpong. Thanks again. Jeff -----Original Message----- From: Galen Shipman [mailto:gshipman at lanl.gov] Sent: Wednesday, June 06, 2007 10:59 AM To: Jeffrey Wong Cc: G.O.; general at lists.openfabrics.org Subject: Re: [ofa-general] ibv_ud_pingpong error Hey Jeff, I think you need to up your locked memory limits, We have a faq entry on this here: http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages Ignore the Open MPI specific parts, the rest I think applies. - Galen On Jun 6, 2007, at 11:52 AM, Jeffrey Wong wrote: > > > Seems that if I'm logged in as root the command works fine without > making any settings changes, but if I'm logged in as another user I am > getting the same error. Are there permissions that I need to set on > binaries in order to run the pingpong as a regular user instead of > root. > > > Thanks, > Jeff > > -----Original Message----- > From: G.O. 
[mailto:gurhan.ozen at gmail.com] > Sent: Wednesday, June 06, 2007 10:39 AM > To: Jeffrey Wong > Cc: general at lists.openfabrics.org > Subject: Re: [ofa-general] ibv_ud_pingpong error > > On 6/6/07, Jeffrey Wong wrote: >> Hello, >> >> I have installed OFED1.2-rc4 on the my development machine. When I > try to >> do a ping pong test I get the following error message: >> >> >> >> [ingres at centos5:master ~/jwong/tcp/ib_common_tcp]$ ibv_ud_pingpong >> >> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes. >> >> This will severely limit memory registrations. >> >> Couldn't create QP >> > > Hi, > If you are using bash shell do something like: > > ulimit -l unlimited > > to get rid of that limit and try again. You can alternatively set > it to a large number. Note that using ulimit only changes the limit > in the current shell, you'll have to edit system-wide configuration > file to make it permanent. > > Hope this helps. > Gurhan >> >> >> >> >> Thanks in advance. >> >> >> >> Jeff >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit >> http://openib.org/mailman/listinfo/openib-general >> > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/ > openib-general From jwong at datallegro.com Wed Jun 6 11:29:39 2007 From: jwong at datallegro.com (Jeffrey Wong) Date: Wed, 6 Jun 2007 14:29:39 -0400 Subject: [ofa-general] Having trouble pingpong between two nodes. Message-ID: Hello, I am trying to run a ibv_ud_pingpong between two nodes but I can't seem to get them to communicate. 
I have used the ping command between the ib interfaces and that works fine, but when I try to use the ibv_ud_ping pong it says the following: ________________________________________________________________________ ________ root at centos5:node1 ~]# ibv_ud_pingpong 193.168.10.254 local address: LID 0x0002, QPN 0x0f0406, PSN 0xb067dc Couldn't connect to 193.168.10.254:18515 ________________________________________________________________________ ____ I have the subnet manager running on node2. When I run the ibchecknet I get the following errors: #warn: counter SymbolErrors = 65535 (threshold 10) #warn: counter LinkDowned = 78 (threshold 10) #warn: counter RcvSwRelayErrors = 261 (threshold 100) #warn: counter XmtDiscards = 173 (threshold 100) Error check on lid 5 (MT47396 Infiniscale-III Mellanox Technologies) port all: FAILED #warn: counter SymbolErrors = 65535 (threshold 10) Error check on lid 5 (MT47396 Infiniscale-III Mellanox Technologies) port 18: FAILED # Checked Switch: nodeguid 0x0002c9010d26dc90 with failure #warn: counter SymbolErrors = 65535 (threshold 10) #warn: counter LinkDowned = 13 (threshold 10) Error check on lid 5 (MT47396 Infiniscale-III Mellanox Technologies) port 16: FAILED #warn: counter SymbolErrors = 65535 (threshold 10) #warn: counter LinkDowned = 13 (threshold 10) Error check on lid 5 (MT47396 Infiniscale-III Mellanox Technologies) port 15: FAILED #warn: counter SymbolErrors = 65535 (threshold 10) Error check on lid 5 (MT47396 Infiniscale-III Mellanox Technologies) port 14: FAILED #warn: counter SymbolErrors = 65535 (threshold 10) Error check on lid 5 (MT47396 Infiniscale-III Mellanox Technologies) port 13: FAILED #warn: counter SymbolErrors = 65535 (threshold 10) #warn: counter XmtDiscards = 173 (threshold 100) Error check on lid 5 (MT47396 Infiniscale-III Mellanox Technologies) port 17: FAILED # Checking Ca: nodeguid 0x0002c9020020080c # Checking Ca: nodeguid 0x0002c902002015c0 # Checking Ca: nodeguid 0x0002c9020020590c ## Summary: 4 
nodes checked, 0 bad nodes found ## 12 ports checked, 0 bad ports found ## 6 ports have errors beyond threshold ________________________________________________________________________ _____ I am trying to ping from node 1 to node 2 1st node configuration: ib0 Link encap:InfiniBand HWaddr 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 inet addr:193.168.10.1 Bcast:193.168.10.255 Mask:255.255.255.0 inet6 addr: fe80::202:c902:20:80d/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1 RX packets:150 errors:0 dropped:0 overruns:0 frame:0 TX packets:37 errors:0 dropped:9 overruns:0 carrier:0 collisions:0 txqueuelen:128 RX bytes:35356 (34.5 KiB) TX bytes:7624 (7.4 KiB) ib1 Link encap:InfiniBand HWaddr 80:00:04:05:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 inet addr:194.168.10.1 Bcast:194.168.10.255 Mask:255.255.255.0 inet6 addr: fe80::202:c902:20:80e/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1 RX packets:148 errors:0 dropped:0 overruns:0 frame:0 TX packets:34 errors:0 dropped:9 overruns:0 carrier:0 collisions:0 txqueuelen:128 RX bytes:35156 (34.3 KiB) TX bytes:7496 (7.3 KiB) ____________________________________________________________________ [root at centos5:node1 ~]# ibstat CA 'mthca0' CA type: MT25208 Number of ports: 2 Firmware version: 5.0.1 Hardware version: a0 Node GUID: 0x0002c9020020080c System image GUID: 0x0002c9020020080f Port 1: State: Active Physical state: LinkUp Rate: 10 Base lid: 2 LMC: 0 SM lid: 1 Capability mask: 0x00510a68 Port GUID: 0x0002c9020020080d Port 2: State: Active Physical state: LinkUp Rate: 10 Base lid: 3 LMC: 0 SM lid: 1 Capability mask: 0x00510a68 Port GUID: 0x0002c9020020080e ___________________________________________________________ Node 2 ib0 Link encap:InfiniBand HWaddr 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 inet addr:193.168.10.254 Bcast:193.168.10.255 Mask:255.255.255.0 inet6 addr: fe80::202:c902:20:590d/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:65520 
Metric:1 RX packets:102 errors:0 dropped:0 overruns:0 frame:0 TX packets:42 errors:0 dropped:9 overruns:0 carrier:0 collisions:0 txqueuelen:128 RX bytes:23750 (23.1 KiB) TX bytes:8048 (7.8 KiB) ib1 Link encap:InfiniBand HWaddr 80:00:04:05:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 inet addr:194.168.10.254 Bcast:194.168.10.255 Mask:255.255.255.0 inet6 addr: fe80::202:c902:20:590e/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1 RX packets:94 errors:0 dropped:0 overruns:0 frame:0 TX packets:31 errors:0 dropped:9 overruns:0 carrier:0 collisions:0 txqueuelen:128 RX bytes:23286 (22.7 KiB) TX bytes:7260 (7.0 KiB) _________________________________________________________ [root at centos5:master /opt/CA]# ibstat CA 'mthca0' CA type: MT25208 Number of ports: 2 Firmware version: 5.1.0 Hardware version: a0 Node GUID: 0x0002c9020020590c System image GUID: 0x0002c9020020590f Port 1: State: Active Physical state: LinkUp Rate: 10 Base lid: 1 LMC: 0 SM lid: 1 Capability mask: 0x02510a6a Port GUID: 0x0002c9020020590d Port 2: State: Active Physical state: LinkUp Rate: 10 Base lid: 4 LMC: 0 SM lid: 1 Capability mask: 0x02510a68 Port GUID: 0x0002c9020020590e Thanks in advance, Jeff From bob.kossey at hp.com Wed Jun 6 11:53:28 2007 From: bob.kossey at hp.com (Bob Kossey) Date: Wed, 06 Jun 2007 14:53:28 -0400 Subject: [ofa-general] Re: ipoib / bonding and OFED In-Reply-To: References: <3857BB049D83424D9DB82753D37CEA55459C41@taurus.voltaire.com><4657373E.2030903@hp.com> <465BDC90.5080305@voltaire.com> Message-ID: <466702A8.5080302@hp.com> Just to follow up on this, using RHEL5 and OFED 1.2 rc4, I was able to do enough rudimentary testing to convince myself that IB bonding was working. I was able to use ib-bond, as well as the use of the openib.conf file to enable bonding on startup, including both(separately) IPOIBBOND_ENABLE and IPOIBHA_ENABLE.
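The message does not say how the failover was tested. One way to confirm which slave a bond is actually using is to read the bonding driver's status file, /proc/net/bonding/bond0. In active-backup mode it contains a "Currently Active Slave:" line naming the active interface, or "None" when no slave is up. The C sketch below assumes that file format. The names bond_active_slave and read_small_file are hypothetical helpers, not part of OFED, ib-bond, or the bonding driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (assumes the /proc/net/bonding/<bond> text format):
 * find the "Currently Active Slave:" line and copy its value, e.g. "ib0",
 * or "None" when no slave is active, into out (truncated to fit).
 * Returns 0 on success, -1 if the line is absent or out has no room. */
int bond_active_slave(const char *status, char *out, size_t outlen)
{
	static const char key[] = "Currently Active Slave: ";
	const char *p;
	size_t n;

	assert(status != NULL && out != NULL);
	if (outlen == 0)
		return -1;
	p = strstr(status, key);
	if (p == NULL)
		return -1;
	p += sizeof(key) - 1;		/* skip the key text, not its NUL */
	n = strcspn(p, "\r\n");		/* the value runs to the end of the line */
	if (n >= outlen)
		n = outlen - 1;		/* truncate so the terminator fits */
	memcpy(out, p, n);
	out[n] = '\0';
	return 0;
}

/* Hypothetical helper: read at most buflen - 1 bytes of a small text file
 * (such as /proc/net/bonding/bond0) into buf and NUL-terminate it.
 * Returns the number of bytes read, or -1 on error. */
long read_small_file(const char *path, char *buf, size_t buflen)
{
	FILE *f;
	size_t n;

	if (buflen == 0)
		return -1;
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	n = fread(buf, 1, buflen - 1, f);
	fclose(f);
	buf[n] = '\0';
	return (long)n;
}
```

A caller would load the file with read_small_file("/proc/net/bonding/bond0", buf, sizeof buf) and pass buf to bond_active_slave. If failover works, the reported name should change from ib0 to ib1 after the ib0 link goes down. Searching for the key line, rather than relying on fixed positions, keeps the helper tolerant of the other lines the driver prints, which may differ between kernel versions.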
One thing I was not able to do, however, was to start IB bonding using the standard bonding modifications to /etc/modprobe.conf and /etc/sysconfig/network-scripts/ifcfg* files. Should this be possible, and are there perhaps some required settings I am missing? I'll include my file modifications and some output below.

modprobe.conf:
alias bond0 bonding
options bond0 mode=active-backup miimon=100

ifcfg-bond0:
DEVICE=bond0
IPADDR="172.22.0.23"
NETMASK="255.255.0.0"
NETWORK="172.22.0.0"
BROADCAST="172.22.255.255"
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
BONDING_SLAVE0=ib0
BONDING_SLAVE0=ib1

ifcfg-ib0:
DEVICE=ib0
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none

ifcfg-ib1:
DEVICE=ib1
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none

[root at njxc6-rhel5 ~]# ifconfig
bond0     Link encap:InfiniBand  HWaddr 80:00:04:05:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:172.22.0.23  Bcast:172.22.255.255  Mask:255.255.0.0
          UP BROADCAST MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:18 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:1352 (1.3 KiB)

dmesg:
...
Ethernet Channel Bonding Driver: v3.1.1 (September 26, 2006)
bonding: MII link monitoring set to 100 ms
ADDRCONF(NETDEV_UP): bond0: link is not ready
bonding: bond0: Adding slave ib0.
bonding: bond0: Warning: enslaved VLAN challenged slave ib0. Adding VLANs will be blocked as long as ib0 is part of bond bond0
bonding: bond0: Warning: The first slave device you specified does not support setting the MAC address. This bond MAC address would be that of the active slave.
ADDRCONF(NETDEV_UP): ib0: link is not ready
bonding: bond0: Warning: failed to get speed and duplex from ib0, assumed to be 100Mb/sec and Full.
bonding: bond0: making interface ib0 the new active one.
bondingbond_send_grat_arp: bond bond0 slave ib0
bonding: bond0: first active interface up!
bonding: bond0: enslaving ib0 as an active interface with an up link.
bonding: bond0: Adding slave ib1.
bonding: bond0: Warning: enslaved VLAN challenged slave ib1. Adding VLANs will be blocked as long as ib1 is part of bond bond0
ADDRCONF(NETDEV_UP): ib1: link is not ready
bonding: bond0: Warning: failed to get speed and duplex from ib1, assumed to be 100Mb/sec and Full.
bonding: bond0: enslaving ib1 as a backup interface with an up link.
ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
...
bonding: bond0: Interface ib0 is already enslaved!
ib0: enabling connected mode will cause multicast packet drops
ib0: mtu > 2044 will cause multicast packet drops.
bonding: bond0: link status definitely down for interface ib0, disabling it
bonding: bond0: making interface ib1 the new active one.
bondingbond_send_grat_arp: bond bond0 slave ib1
bonding: bond0: Interface ib1 is already enslaved!
ib1: enabling connected mode will cause multicast packet drops
ib1: mtu > 2044 will cause multicast packet drops.
bonding: bond0: link status definitely down for interface ib1, disabling it
bondingbond_send_grat_arp: bond bond0 slave NULL
bonding: bond0: now running without any active interface !

Thanks,
Bob

Scott Weitzenkamp (sweitzen) wrote:
> Bob, it is now possible to configure IPoIB bonding in
> /etc/infiniband/openib.conf, this configuration file includes the
> following boilerplate.
>
> # Enable the bonding driver on startup
> IPOIBBOND_ENABLE=no
> # Set bond interface names
> #IPOIB_BONDS=bond0,bond1
> # Set specific bond params; address and slaves
> #bond0_IP=10.10.10.1
> #bond0_SLAVES=ib0,ib1
> #bond1_IP=20.10.10.1
> #bond1_SLAVES=ib2,ib3,ib4
>
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>
>> -----Original Message-----
>> From: general-bounces at lists.openfabrics.org
>> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Or Gerlitz
>> Sent: Tuesday, May 29, 2007 12:56 AM
>> To: Bob Kossey
>> Cc: OpenFabrics General
>> Subject: [ofa-general] Re: ipoib / bonding and OFED
>>
>> Bob Kossey wrote:
>>> I copied OR since I think this is related to his OFED HA work, and
>>> he might have some insights. A few more questions for Or:
>>> I was trying to use ipoib bonding with OFED 1.2 rc2 and a 2.6.9 kernel,
>>> but was not able to get it to work so far. I saw your Sonoma bonding
>>> slides, and you mention kernel bonding driver changes were needed.
>>> 2. Is there a minimum kernel version, with the kernel bonding driver
>>> changes, that is required to use bonding with OFED ipoib?
>>
>> Just to have a baseline here: to get bonding to work with IPoIB, you
>> should use the bonding driver provided with OFED 1.2. This driver is the
>> upstream one (of 2.6.20), patched to support IPoIB and backported
>> to RH5, SLES10 and RH4 U3/4/5; other kernels are not supported.
>>
>> If you were using the OFED bonding on a system that matches the support
>> matrix, it should work. If you do have problems under this config, please
>> either open a bug at the OFED bugzilla @ bugs.openfabrics.org assigned to
>> monis at voltaire.com (Moni Shoua), or send a first report/question to
>> Moni and CC ewg at lists.openfabrics.org.
>>
>> Please note that between RC2 and RC4 (to be released today) some
>> bugs were fixed; you can search the bugzilla to see which.
>>
>>> 3. The bonding driver uses the HWADDR from the underlying ipoib
>>> devices, how does it obtain the HWADDR? Does it use the full 20 bytes,
>>> or some subset?
>>
>> When enslaving IPoIB devices, the bonding driver uses the full hw
>> address of the active slave; it simply looks at the dev_addr field of
>> the slave's struct net_device (see include/linux/netdevice.h).
>>
>>> 4. What use_carrier options for link status detection does OFED ipoib
>>> support: MII, ETHTOOL or netif_carrier_ok?
>>
>> The mii/ethtool local link detection methods of the bonding driver
>> are somewhat deprecated, since nowadays almost every network device
>> supports the netif_carrier_ok call. The --default-- of the upstream
>> bonding driver (e.g. the one we use in OFED and the 2.6.21 one listed
>> below) is to set the use_carrier module param to 1, that is, MII is not
>> used anymore.
>>
>>> author: Thomas Davis, tadavis at lbl.gov and many others
>>> description: Ethernet Channel Bonding Driver, v3.1.2
>>> version: 3.1.2
>>> parm: use_carrier:Use netif_carrier_ok (vs MII ioctls) in miimon; 0 for off, 1 for on (default) (int)
>>> parm: miimon:Link check interval in milliseconds (int)
>>>
>>> If you have any good examples of bonding configuration settings that work
>>> with OFED, I'd appreciate that also.
>>
>> The bonding RPM provided with OFED is made of a driver, a script and some
>> help text containing usage examples; please take a look there and let me
>> know if you have further questions.
>>
>>> $ rpm -ql ib-bonding-0.9.0-2.6.9_42.ELsmp
>>> /lib/modules/2.6.9-42.ELsmp/updates/kernel/drivers/net/bonding/bonding.ko
>>> /usr/bin/ib-bond
>>> /usr/share/doc/ib-bonding-0.9.0/ib-bonding.txt
>>
>> The OFED service (/etc/init.d/openibd) was enhanced to allow for
>> --persistent-- bonding configuration; please see the bonding section in
>> docs/ipoib_release_notes.txt to see how to do it.
>>
>> Or.
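Or's answer to question 3 says the bonding driver takes the slave's full hardware address from dev_addr. For IPoIB that address is 20 octets; per RFC 4391 it consists of one octet of reserved/flag bits, a 24-bit queue pair number (QPN), and the 16-octet port GID (a 64-bit subnet prefix followed by the 64-bit port GUID). The following is a minimal Python sketch that splits such an address into those fields, for example a value read from /sys/class/net/ib0/address. The function name is hypothetical, and the sample address is illustrative only: it is assembled from the 80:00:04:04 prefix in Jeff's node 1 ifconfig output and the port 1 GUID (0x0002c9020020080d) from his ibstat output, not captured from a live host.

```python
import ipaddress


def parse_ipoib_hwaddr(text):
    """Split a colon-separated 20-octet IPoIB hardware address into its fields."""
    octets = bytes(int(part, 16) for part in text.strip().split(":"))
    if len(octets) != 20:
        raise ValueError(f"expected 20 octets, got {len(octets)}")
    gid = octets[4:20]
    return {
        "flags": octets[0],                         # reserved/flag bits
        "qpn": int.from_bytes(octets[1:4], "big"),  # 24-bit queue pair number
        "gid": str(ipaddress.IPv6Address(gid)),     # 128-bit GID, printed in IPv6 notation
        "subnet_prefix": gid[:8].hex(),             # upper 64 bits of the GID
        "port_guid": "0x" + gid[8:].hex(),          # lower 64 bits: the port GUID
    }


if __name__ == "__main__":
    # Illustrative address built from node 1's QPN bytes and port 1 GUID (not read from a host).
    fields = parse_ipoib_hwaddr("80:00:04:04:fe:80:00:00:00:00:00:00:00:02:c9:02:00:20:08:0d")
    print(fields["qpn"], fields["gid"], fields["port_guid"])  # 1028 fe80::2:c902:20:80d 0x0002c9020020080d
```

Treating the GID as an IPv6 address is only a convenient way to print 128 bits; it makes the link-local prefix fe80:: and the port GUID easy to compare with ibstat output.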
From sweitzen at cisco.com Wed Jun 6 11:55:28 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Wed, 6 Jun 2007 11:55:28 -0700
Subject: [ofa-general] Re: ipoib / bonding and OFED
In-Reply-To: <466702A8.5080302@hp.com>
References: <3857BB049D83424D9DB82753D37CEA55459C41@taurus.voltaire.com><4657373E.2030903@hp.com> <465BDC90.5080305@voltaire.com> <466702A8.5080302@hp.com>
Message-ID:

You should use openib.conf, not ifcfg-*, for configuring bonding at boot time.

Scott

From ardavis at ichips.intel.com Wed Jun 6 13:07:31 2007
From: ardavis at ichips.intel.com (Arlin Davis)
Date: Wed, 06 Jun 2007 13:07:31 -0700
Subject: [ofa-general] OpenFabrics DAT/DAPL 1.2.1 library release
Message-ID: <46671403.9050100@ichips.intel.com>

http://www.openfabrics.org/~ardavis/

md5sum ae8cbfc26c7d60d8b51356805fc8a8c5 dapl-1.2-1.tgz

From ardavis at ichips.intel.com Wed Jun 6 13:27:23 2007
From: ardavis at ichips.intel.com (Arlin Davis)
Date: Wed, 06 Jun 2007 13:27:23 -0700
Subject: [ofa-general] [GIT PULL] OFED 1.2 uDAPL release notes
Message-ID: <466718AB.5050507@ichips.intel.com>

Vlad, please pull the latest OFED 1.2 release notes from the uDAPL project (ofed_1_2 branch):

dapl/doc/uDAPL_release_notes.txt

Signed-off-by: Arlin Davis ardavis at ichips.intel.com

From friedman at ucla.edu Wed Jun 6 16:45:38 2007
From: friedman at ucla.edu (Scott A.
Friedman)
Date: Wed, 06 Jun 2007 16:45:38 -0700
Subject: [ofa-general] IB and iWarp HCA in same node
Message-ID: <46674722.6090302@ucla.edu>

I have a working IB cluster where I have added a Chelsio iWarp card to one node. Another node, which has only an identical iWarp card, is connected to it.

I cannot seem to get the iWarp cards to come up. They work through regular Ethernet just fine, and the IB stuff still works as well. But when I modprobe iw_cxgb3 and iw_cm, utilities like ibstat show the following, which explains why nothing is working. The question is, why? Am I missing or forgetting something? I just want to test the two iWarp cards back to back; I am not trying to get some kind of auto bridging or routing working.

# ibstat
iWARP RNIC 'cxgb3_0'
        iWARP RNIC type: cxgb3
        Number of ports: 1
        Firmware version: T 4.0.0
        Hardware version: 1
        Node GUID: 0x0007430506ea0000
        System image GUID: 0x0007430506ea0000
        Port 1:
                State: Active
                Physical state: No state change
                Rate: 20
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x009f0000
                Port GUID: 0x0000000000000000
CA 'mthca0'
        CA type: MT25204
        Number of ports: 1
        Firmware version: 1.1.0
        Hardware version: a0
        Node GUID: 0x0002c9020023b990
        System image GUID: 0x0002c9020023b993
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 10
                Base lid: 1
                LMC: 0
                SM lid: 28
                Capability mask: 0x02510a68
                Port GUID: 0x0002c9020023b991

Any help is appreciated!

Thanks
Scott

From ogerlitz at voltaire.com Thu Jun 7 00:38:37 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 07 Jun 2007 10:38:37 +0300
Subject: [ofa-general] Re: ipoib / bonding and OFED
In-Reply-To: <466702A8.5080302@hp.com>
References: <3857BB049D83424D9DB82753D37CEA55459C41@taurus.voltaire.com><4657373E.2030903@hp.com> <465BDC90.5080305@voltaire.com> <466702A8.5080302@hp.com>
Message-ID: <4667B5FD.4070600@voltaire.com>

Bob Kossey wrote:
> Just to follow up on this, using RHEL5 and OFED 1.2 rc4, I was able
> to do enough rudimentary testing to convince myself that IB
> bonding was working. I was able to use ib-bond, as well
> as the use of the openib.conf file to enable bonding on startup,
> including both(separately) IPOIBBOND_ENABLE and IPOIBHA_ENABLE.

Thanks for the feedback. OFED 1.2 supports both options; however, I don't think that two HA solutions should be deployed in commercial distributions. What is your take (bonding vs. HA daemon) on the correct way to move forward?

> One thing I was not able to do however, was to start IB bonding
> using the standard bonding modifications to /etc/modprobe.conf
> and /etc/sysconfig/network-scripts/ifcfg* files. Should this be possible,
> and are there perhaps some required settings I am missing? I'll
> include my file modifications and some output below.
On some distributions (e.g. RH4 and SLES10), /sbin/ifenslave is used to configure bonding through the distro's /sbin/ifup scheme. The ifenslave program is somewhat obsolete and is not supported by the bonding modifications that make the driver work with IPoIB devices.

Moving forward, the way to go is the bonding sysfs API; see the files under /sys/class/net/$BOND/bonding/ and the bonding documentation.

This is how the ib-bond script works, and also how /sbin/ifup-eth works on RH5! However, for OFED 1.2 we did not manage to examine the RH5 scripts fully, so I can neither say whether you can simply use the OS bonding configuration scheme, nor debug for you right now why it is not working.

It is definitely in our plan, but it is P2 relative to pushing the bonding changes upstream; let me know if you see this differently.

Or.

From vlad at lists.openfabrics.org Thu Jun 7 02:43:36 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Thu, 7 Jun 2007 02:43:36 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070607-0200 daily build status
Message-ID: <20070607094336.79CDAE6083A@openfabrics.org>

This email was generated automatically, please do not reply

Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.12
Passed on powerpc with linux-2.6.19
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.13
Passed on ppc64 with linux-2.6.18
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.14
Passed on powerpc with linux-2.6.18
Passed on powerpc with linux-2.6.17
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.19
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.17
Passed on x86_64 with linux-2.6.16
Passed on ia64 with linux-2.6.16
Passed on powerpc with linux-2.6.12
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.17
Passed on ppc64 with linux-2.6.19
Passed on ppc64 with linux-2.6.17
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.12
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.14
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.13
Passed on powerpc with linux-2.6.15
Passed on ia64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:

From devesh28 at gmail.com Thu Jun 7 02:55:43 2007
From: devesh28 at gmail.com (Devesh Sharma)
Date: Thu, 7 Jun 2007 15:25:43 +0530
Subject: [ofa-general] Re: [Query] ib add path record cache
In-Reply-To:
<1181123111.12997.147451.camel@hal.voltaire.com>
References: <000f01c7a7f0$1067dba0$11c8180a@amr.corp.intel.com> <1181123111.12997.147451.camel@hal.voltaire.com>
Message-ID: <309a667c0706070255x67de7850h209831c2f522dc2c@mail.gmail.com>

Hi all,

Sorry for the late reply; I was not in the office. Could somebody briefly explain the idea of a _distributed SA_? Is it a pre-planned activity that is yet to be implemented in OFED, or is it just an extension of the SA cache pre-loading discussion? And what purpose is a distributed SA meant to serve?

On 06 Jun 2007 05:45:12 -0400, Hal Rosenstock wrote:
> On Wed, 2007-06-06 at 00:06, Sean Hefty wrote:
> > >One could ask the IBTA for this if it is the right thing to do.
> >
> > Checking with the IBTA makes sense. Longer term, adding a distributed SA
> > application class, or expanding the existing SA class may be useful, if the IBTA
> > wants to define SA implementation at this level of detail. However, I was
> > trying to focus on what could be done now. If the IBTA would like to
> > standardize the communication, that'd be great.
>
> > One issue that isn't clear to me is what exactly is meant by the statement:
> > "Vendor-specific classes will never be used to define management operations that
> > are encompassed by the Infiniband Architecture."
>
> I'm not sure of the intent of this, but that is informative rather than
> normative (compliance) text.
>
> > For example, suppose that
> > there were a small number of SA caches available in the subnet. Is it compliant
> > for a node to issue a PR query to one of the caches using a vendor-defined PR
> > query? Or must this be done using an SA PR query with possible redirection?
>
> I think this example would fall "on the line" and seems somewhat
> debatable as to whether there is a management operation for this or not.
> It does go back to the intent of the original statement you cited.
>
> > >Are you saying to make the RMPP header as the first part of Data ?
> >
> > Yes.
> >
> > >Vendor class 1 are not RMPP MADs so I think this is nonconformant.
> >
> > I didn't see any restriction on the vendor class 1 data - at least in section
> > 16.5.
>
> True, but I'm not sure that was the intent, which again was why vendor
> class 2 was created. Also, there is the problem of knowing that this
> vendor class 1 is using RMPP. That sounds proprietary to me (and affects
> the kernel in the OpenIB implementation).
>
> > If I'm mistaken on this, then I agree that vendor class 2 seems to be our
> > only current option.
> >
> > >That's one reason vendor class 2 was added. In addition, there is no way
> > >to detect one "vendor" from another "vendor" (which is why OUI was
> > >added) if the same class is used so these need to be unique across all
> > >vendors.
> >
> > Yes - all vendor class 1 MADs suffer from this issue. In practice, it seems
> > that there can only be a single vendor for a given class on a subnet.
>
> That's one way of putting it but limits the use; in fact, if this were
> done, all subnets would use at least two different vendors. Another way
> is that all vendors who want to use this class range need to coordinate
> such use (e.g. class allocation).
>
> > >The only choice seems to me to be reformatting using vendor class 2 and
> > >dealing with the data copying.
> >
> > From an implementation viewpoint, this just seems less desirable. Adding the
> > offset means that single-segment SA MAD may become our multi-segment vendor MAD,
> > and dealing with two MAD formats will be troublesome. If we're only caching
> > PRs, this may not be a big deal, but if we ever want to create a truly
> > distributed SA, I think it will be.
>
> Are you referring to the performance hit ?
>
> -- Hal
>
> > - Sean
> >

From bob.kossey at hp.com Thu Jun 7 06:32:26 2007
From: bob.kossey at hp.com (Bob Kossey)
Date: Thu, 07 Jun 2007 09:32:26 -0400
Subject: [ofa-general] Re: ipoib / bonding and OFED
In-Reply-To: <4667B5FD.4070600@voltaire.com>
References: <3857BB049D83424D9DB82753D37CEA55459C41@taurus.voltaire.com><4657373E.2030903@hp.com> <465BDC90.5080305@voltaire.com> <466702A8.5080302@hp.com> <4667B5FD.4070600@voltaire.com>
Message-ID: <466808EA.7050302@hp.com>

Or Gerlitz wrote:
> Bob Kossey wrote:
>> Just to follow up on this, using RHEL5 and OFED 1.2 rc4, I was able
>> to do enough rudimentary testing to convince myself that IB
>> bonding was working. I was able to use ib-bond, as well
>> as the use of the openib.conf file to enable bonding on startup,
>> including both(separately) IPOIBBOND_ENABLE and IPOIBHA_ENABLE.
>
> Thanks for the feedback. OFED 1.2 supports both options; however, I
> don't think that two HA solutions should be deployed in commercial
> distributions. What is your take (bonding vs. HA daemon) on the correct
> way to move forward?

I agree we don't need both at the same time. I was wondering myself what the pros and cons of each method were. What are the link monitoring methods used by each? The bonding method would have the advantage of commonality with other bonded interfaces, and may be simpler and more reliable than the user daemon.
> >> One thing I was not able to do however, was to start IB bonding >> using the standard bonding modifications to /etc/modprobe.conf >> and /etc/sysconfig/network-scripts/ifcfg* files. Should this be >> possible, >> and are there perhaps some required settings I am missing? I'll >> include my file modifications and some output below. > > On some distributions (eg RH4 and SLES10) /sbin/ifenslave is used to > configure bonding through the distro /sbin/ifup scheme. The ifenslave > program is somehow obsoleted and is not supported under the bonding > modifications to work with ipoib devices. > > Moving forward, the way to go is using the bonding sysfs api, see the > files under /sys/class/net/$BOND/bonding/ and the bonding documentation. > > This is how the ib-bond script works and also /sbin/ifup-eth on RH5! > on however for OFED 1.2 we did not make it to fully examine the RH5 > scripts to the extent i can say if you can just work with the OS > bonding configuration scheme not i can debug for you now why its not > working. > > Its definitely on our plan, but its P2 relative to the bonding changes > upstream push, let me know if you this different. > > Or. > It would be nice to be able to use the standard file modifications to perform IB bonding, for consistency with how we handle other bonded interfaces. If someone knows how to do it, great, but if not, I agree it would be a lower priority investigation. Thanks, Bob From hanafim.ctr at asc.hpc.mil Thu Jun 7 06:54:46 2007 From: hanafim.ctr at asc.hpc.mil (MAHMOUD HANAFI) Date: Thu, 07 Jun 2007 09:54:46 -0400 Subject: [ofa-general] OFED vs IB_GOLD IB_SRP Performance results In-Reply-To: <200706061429.l56ETp4n012017@cmf.nrl.navy.mil> References: <200706061429.l56ETp4n012017@cmf.nrl.navy.mil> Message-ID: <46680E26.8080004@asc.hpc.mil> The max_hw_sectors_kb was set at 4096 KB. On the DDN I verified that the request sizes were correct. I have tried larger max_hw_sectors_kb values, but they had no effect.
Setting the cmd_per_lun=1 improved reads slightly but not much. chas williams - CONTRACTOR wrote: > In message <4665A1FA.1000506 at asc.hpc.mil>,MAHMOUD HANAFI writes: >> OFED Setup: >> /sys/module/ib_srp/mellanox_workarounds = 1 >> /sys/module/ib_srp/refcnt = 11 >> /sys/module/ib_srp/srp_sg_tablesize = 256 >> /sys/module/ib_srp/topspin_workarounds = 1 >> >> /sys/block/sdd/queue/max_sectors_kb = 4096 >> /sys/block/sdd/queue/nr_requests = 8192 >> /sys/block/sdd/queue/read_ahead_kb = 128 > > what is the max_hw_sectors_kb for the ofed target? unless you specified > max_sect= during login, i suspect you are getting the system defaults. > typically this is 512 sectors i think, which is where your performance > seems to start to diverge. > -- Mahmoud Hanafi Senior System Administrator ASC/MSRC www.asc.hpc.mil 2435 5th Street WPAFB, OHIO 45433 (937) 255-1536 From sweitzen at cisco.com Thu Jun 7 08:48:46 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 7 Jun 2007 08:48:46 -0700 Subject: [ofa-general] Re: ipoib / bonding and OFED In-Reply-To: <4667B5FD.4070600@voltaire.com> References: <3857BB049D83424D9DB82753D37CEA55459C41@taurus.voltaire.com><4657373E.2030903@hp.com> <465BDC90.5080305@voltaire.com> <466702A8.5080302@hp.com> <4667B5FD.4070600@voltaire.com> Message-ID: > Thanks for the feedback. OFED 1.2 supports both options, > however, I am > don't think that two HA solutions should be deployed at commercial > distributions. What is your take (bonding vs ha daemon) on > the correct > way to move fwd? I don't know if I've said this in public, but I've stopped testing ipoibtools HA as of OFED 1.2 rc2 and Cisco is only going to support ib-bonding HA for our OFED 1.2 customers, as our testing has revealed ib-bonding is more robust than ipoibtools. I know I said this to Tziporet at Sonoma, and she seemed to agree we could eventually remove ipoibtools from OFED. 
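Or Gerlitz's reply quoted earlier in this thread recommends configuring IPoIB bonding through the sysfs files under /sys/class/net/$BOND/bonding/ instead of ifenslave. The following is a minimal C sketch of that approach, not the ib-bond script's actual code. It assumes the attribute names from the kernel bonding documentation (`mode`, and `slaves`, where writing `+ifname` enslaves an interface); the helper names and the `root` parameter are hypothetical, and `root` exists only so the code can be pointed somewhere other than /sys/class/net.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<root>/<bond>/bonding/<attr>" into buf.
 * Returns 0 on success, -1 if the path does not fit. */
static int bond_attr_path(char *buf, size_t len, const char *root,
                          const char *bond, const char *attr)
{
    int n = snprintf(buf, len, "%s/%s/bonding/%s", root, bond, attr);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write value to a single bonding attribute file. Returns 0 or -1. */
static int bond_sysfs_write(const char *root, const char *bond,
                            const char *attr, const char *value)
{
    char path[512];
    FILE *f;

    assert(root && bond && attr && value);
    if (bond_attr_path(path, sizeof path, root, bond, attr))
        return -1;
    f = fopen(path, "w");
    if (!f)
        return -1;
    if (fputs(value, f) == EOF) {
        fclose(f);
        return -1;
    }
    /* With stdio buffering the real write(), and any error the kernel
     * returns for it, may only happen here, so check fclose() too. */
    return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical helper: active-backup bond over two IPoIB interfaces. */
static int bond_setup_ipoib(const char *root, const char *bond,
                            const char *slave1, const char *slave2)
{
    const char *slaves[2] = { slave1, slave2 };
    char cmd[64];
    int i, n;

    /* Select the bonding mode before any slave is added. */
    if (bond_sysfs_write(root, bond, "mode", "active-backup"))
        return -1;
    for (i = 0; i < 2; i++) {
        /* "+name" adds a slave; "-name" would remove it. */
        n = snprintf(cmd, sizeof cmd, "+%s", slaves[i]);
        if (n < 0 || n >= (int)sizeof cmd)
            return -1;
        if (bond_sysfs_write(root, bond, "slaves", cmd))
            return -1;
    }
    return 0;
}
```

On a live system this would run as root with `root` set to "/sys/class/net", for example `bond_setup_ipoib("/sys/class/net", "bond0", "ib0", "ib1")`, after the bond itself has been created (for example by writing `+bond0` to /sys/class/net/bonding_masters). Ordering requirements, such as whether the bond must be up before slaves are added, have changed between kernel versions, so the bonding documentation for the running kernel remains the authority.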
Scott From mshefty at ichips.intel.com Thu Jun 7 10:41:54 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 07 Jun 2007 10:41:54 -0700 Subject: [ofa-general] Re: [Query] ib add path record cache In-Reply-To: <309a667c0706070255x67de7850h209831c2f522dc2c@mail.gmail.com> References: <000f01c7a7f0$1067dba0$11c8180a@amr.corp.intel.com> <1181123111.12997.147451.camel@hal.voltaire.com> <309a667c0706070255x67de7850h209831c2f522dc2c@mail.gmail.com> Message-ID: <46684362.10403@ichips.intel.com> > Please anybody just tell me about the idea of _distributed SA_ in > short. Is it a pre-planed activity which is yet to be implemented with > the OFED? or its just an extension of the sa cache pre-loading > discussion? I'm thinking of a distributed component that can perform a limited set of SA functionality. The sa cache is close in that it can respond to path record queries via an API call. If the sa cache could respond to actual PR query MADs, IMO it then becomes a very simple distributed SA. This idea came from trying to decide on the best way to pre-load the cache. By using a MAD interface, I think we get several advantages: * The existing userspace MAD interfaces could be used, which avoids adding a userspace interface for just the cache. * The existing code in the sa cache used to process PR query responses is re-used. (I.e., I anticipate that the kernel changes needed to support pre-loading will be fairly small.) * We have a framework that can be used to load the entire cache, add a specific set of PRs, and remove specific PRs. * The cache becomes accessible from remote systems, both for loading the cache and for queries. So, I think that using a MAD interface to preload the cache is a relatively simple change, but gives us additional flexibility. And to be clear, I'm not suggesting that we implement additional functionality, just that we have the framework available.
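Sean's proposal treats the SA cache as a small store that can be bulk-loaded, have individual path records added or removed, and answer path record queries locally. A minimal C sketch of just those cache operations follows. The record layout, capacity and function names are hypothetical (a real SA PathRecord carries many more fields), and decoding PR query MADs into these calls is left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PR_CACHE_SIZE 64   /* arbitrary capacity for the sketch */

/* Hypothetical, heavily trimmed path record. */
struct pr_entry {
    uint8_t  dgid[16];  /* destination GID, used as the lookup key */
    uint16_t dlid;      /* destination LID */
    uint8_t  sl;        /* service level */
    int      valid;     /* slot in use */
};

struct pr_cache {
    struct pr_entry e[PR_CACHE_SIZE];
};

/* Linear scan for a valid entry whose key matches dgid. */
static struct pr_entry *pr_find(struct pr_cache *c, const uint8_t dgid[16])
{
    for (int i = 0; i < PR_CACHE_SIZE; i++)
        if (c->e[i].valid && memcmp(c->e[i].dgid, dgid, 16) == 0)
            return &c->e[i];
    return NULL;
}

/* Insert a record, or replace an existing one for the same DGID.
 * Returns 0, or -1 when the cache is full. */
static int pr_cache_add(struct pr_cache *c, const uint8_t dgid[16],
                        uint16_t dlid, uint8_t sl)
{
    struct pr_entry *p = pr_find(c, dgid);

    for (int i = 0; !p && i < PR_CACHE_SIZE; i++)
        if (!c->e[i].valid)
            p = &c->e[i];
    if (!p)
        return -1;
    memcpy(p->dgid, dgid, 16);
    p->dlid = dlid;
    p->sl = sl;
    p->valid = 1;
    return 0;
}

/* Remove the record for dgid; returns 0 if one existed, else -1. */
static int pr_cache_remove(struct pr_cache *c, const uint8_t dgid[16])
{
    struct pr_entry *p = pr_find(c, dgid);

    if (!p)
        return -1;
    p->valid = 0;
    return 0;
}

/* Drop every record, e.g. before loading the entire cache again. */
static void pr_cache_flush(struct pr_cache *c)
{
    memset(c, 0, sizeof *c);
}

/* Answer a query locally: copy the cached record out, or return -1 so
 * the caller can fall back to querying the real SA. */
static int pr_cache_lookup(struct pr_cache *c, const uint8_t dgid[16],
                           struct pr_entry *out)
{
    const struct pr_entry *p = pr_find(c, dgid);

    assert(out != NULL);
    if (!p)
        return -1;
    *out = *p;
    return 0;
}
```

In this sketch a "load the entire cache" request maps to `pr_cache_flush()` followed by a series of `pr_cache_add()` calls, which mirrors the three operations listed in the message above.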
- Sean From rdreier at cisco.com Thu Jun 7 11:59:17 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 07 Jun 2007 11:59:17 -0700 Subject: [ofa-general] Re: [PATCH 2/2] IB/mlx4_ib: fix SRQ buffer allocation In-Reply-To: <1181133679.10841.66.camel@mtls03> (Eli Cohen's message of "Wed, 06 Jun 2007 15:40:21 +0300") References: <1181133679.10841.66.camel@mtls03> Message-ID: Thanks... I reworked this a lot and right now I plan to push the following (although I'm still testing): commit df104b2036ea2ddf114b37a99fe833f2253a7098 Author: Roland Dreier Date: Thu Jun 7 11:52:02 2007 -0700 IB/mlx4: Make sure RQ allocation is always valid QPs attached to an SRQ must never have their own RQ, and QPs not attached to SRQs must have an RQ with at least 1 entry. Enforce all of this in set_rq_size(). Based on a patch by Eli Cohen . Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index cd22975..5c6d054 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -189,18 +189,28 @@ static int send_wqe_overhead(enum ib_qp_type type) } static int set_rq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, - struct mlx4_ib_qp *qp) + int is_user, int has_srq, struct mlx4_ib_qp *qp) { /* Sanity check RQ size before proceeding */ if (cap->max_recv_wr > dev->dev->caps.max_wqes || cap->max_recv_sge > dev->dev->caps.max_rq_sg) return -EINVAL; - qp->rq.max = cap->max_recv_wr ? 
roundup_pow_of_two(cap->max_recv_wr) : 0; + if (has_srq) { + /* QPs attached to an SRQ should have no RQ */ + if (cap->max_recv_wr) + return -EINVAL; - qp->rq.wqe_shift = ilog2(roundup_pow_of_two(cap->max_recv_sge * - sizeof (struct mlx4_wqe_data_seg))); - qp->rq.max_gs = (1 << qp->rq.wqe_shift) / sizeof (struct mlx4_wqe_data_seg); + qp->rq.max = qp->rq.max_gs = 0; + } else { + /* HW requires >= 1 RQ entry with >= 1 gather entry */ + if (is_user && (!cap->max_recv_wr || !cap->max_recv_sge)) + return -EINVAL; + + qp->rq.max = roundup_pow_of_two(max(1, cap->max_recv_wr)); + qp->rq.max_gs = roundup_pow_of_two(max(1, cap->max_recv_sge)); + qp->rq.wqe_shift = ilog2(qp->rq.max_gs * sizeof (struct mlx4_wqe_data_seg)); + } cap->max_recv_wr = qp->rq.max; cap->max_recv_sge = qp->rq.max_gs; @@ -285,7 +295,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, qp->sq.head = 0; qp->sq.tail = 0; - err = set_rq_size(dev, &init_attr->cap, qp); + err = set_rq_size(dev, &init_attr->cap, !!pd->uobject, !!init_attr->srq, qp); if (err) goto err; From pradeeps at linux.vnet.ibm.com Thu Jun 7 14:14:53 2007 From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana) Date: Thu, 07 Jun 2007 14:14:53 -0700 Subject: [ofa-general] IPOIB CM (NOSRQ) patches Message-ID: <4668754D.5080309@linux.vnet.ibm.com> I have incorporated the IPOIB CM (NOSRQ) review comments and subsequent discussions on this mailing list into a couple of patches (to follow). The first patch will be V5 of the NOSRQ patch. The second patch will be an extension of the NOSRQ patch, to handle the corner case of running out of RC QPs. In that case this patch enables switching to UD mode. Existing RC QPs should remain unaffected. 
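Roland's reworked set_rq_size() earlier in this archive enforces three rules: a QP attached to an SRQ must request no receive queue of its own, a userspace QP without an SRQ must request at least one receive WR and one scatter/gather entry, and a kernel QP without an SRQ is silently given at least one of each. The sketch below restates those rules as standalone userspace C so they can be checked in isolation. The helper names are hypothetical stand-ins for the kernel's roundup_pow_of_two() and ilog2(), and the 16-byte data segment size is an assumption about sizeof(struct mlx4_wqe_data_seg).

```c
#include <assert.h>
#include <errno.h>

#define WQE_DATA_SEG_SIZE 16  /* assumed sizeof(struct mlx4_wqe_data_seg) */

/* Stand-in for the kernel's roundup_pow_of_two(); v must be >= 1. */
static unsigned roundup_pow2(unsigned v)
{
    unsigned r = 1;

    assert(v >= 1);
    while (r < v)
        r <<= 1;
    return r;
}

/* Stand-in for the kernel's ilog2(): floor(log2(v)) for v >= 1. */
static int ilog2_u(unsigned v)
{
    int l = -1;

    while (v) {
        v >>= 1;
        l++;
    }
    return l;
}

struct rq_caps { unsigned max_recv_wr, max_recv_sge; };  /* in/out */
struct rq_size { unsigned max, max_gs; int wqe_shift; };

static int rq_size_sketch(unsigned hw_max_wqes, unsigned hw_max_sg,
                          int is_user, int has_srq,
                          struct rq_caps *cap, struct rq_size *rq)
{
    /* Sanity check against the device limits first. */
    if (cap->max_recv_wr > hw_max_wqes || cap->max_recv_sge > hw_max_sg)
        return -EINVAL;

    if (has_srq) {
        /* Receives go to the SRQ, so the QP must not ask for an RQ. */
        if (cap->max_recv_wr)
            return -EINVAL;
        rq->max = rq->max_gs = 0;
        rq->wqe_shift = 0;  /* unused without an RQ; zeroed here */
    } else {
        /* HW requires >= 1 RQ entry with >= 1 gather entry. */
        if (is_user && (!cap->max_recv_wr || !cap->max_recv_sge))
            return -EINVAL;
        rq->max = roundup_pow2(cap->max_recv_wr ? cap->max_recv_wr : 1);
        rq->max_gs = roundup_pow2(cap->max_recv_sge ? cap->max_recv_sge : 1);
        rq->wqe_shift = ilog2_u(rq->max_gs * WQE_DATA_SEG_SIZE);
    }

    /* Report the actual (rounded) sizes back to the caller. */
    cap->max_recv_wr = rq->max;
    cap->max_recv_sge = rq->max_gs;
    return 0;
}
```

For example, a kernel QP requesting 3 receive WRs with 3 SGEs is rounded up to 4 and 4, which gives a receive WQE stride of 4 * 16 = 64 bytes (wqe_shift 6), and the caller sees 4 and 4 reported back in its capabilities.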
Pradeep From pradeeps at linux.vnet.ibm.com Thu Jun 7 14:18:46 2007 From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana) Date: Thu, 07 Jun 2007 14:18:46 -0700 Subject: [ofa-general] IPOIB CM (NOSRQ)[PATCH V5] patch Message-ID: <46687636.5050101@linux.vnet.ibm.com> Here is a fifth version of the IPOIB_CM_NOSRQ patch. This patch will benefit adapters that do not support shared receive queues. This patch incorporates the following review comments and subsequent discussions on this mailing list from v4: 1. Reduce the number of if(srq) tests in the packet receive path 2. Incorporates mechanisms to limit the NOSRQ footprint to 1GB and a max of 128 RC QPs (by default). Both are tunable options. 3. Updated the patch against Roland's for-2.6.23 git tree (derived on 05/30) This patch has been tested with linux-2.6.22-rc3 derived from Roland's for-2.6.23 git tree, using Topspin and IBM HCAs on ppc64 machines. Signed-off-by: Pradeep Satyanarayana --- --- a/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib.h 2007-05-30 14:56:25.000000000 -0400 +++ b/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib.h 2007-06-02 18:59:41.000000000 -0400 @@ -95,11 +95,17 @@ enum { IPOIB_MCAST_FLAG_ATTACHED = 3, }; +#define SIXTY_FOUR_K (1ul << 16) +#define MEGA_BYTE (1ul << 20) #define IPOIB_OP_RECV (1ul << 31) #ifdef CONFIG_INFINIBAND_IPOIB_CM -#define IPOIB_CM_OP_SRQ (1ul << 30) +#define IPOIB_CM_OP_RECV (1ul << 30) + +#define NOSRQ_INDEX_TABLE_SIZE 128 +#define NOSRQ_INDEX_MASK (NOSRQ_INDEX_TABLE_SIZE -1) + #else -#define IPOIB_CM_OP_SRQ (0) +#define IPOIB_CM_OP_RECV (0) #endif /* structs */ @@ -166,11 +172,14 @@ enum ipoib_cm_state { }; struct ipoib_cm_rx { - struct ib_cm_id *id; - struct ib_qp *qp; - struct list_head list; - struct net_device *dev; - unsigned long jiffies; + struct ib_cm_id *id; + struct ib_qp *qp; + struct ipoib_cm_rx_buf *rx_ring; /* Used by NOSRQ only */ + struct list_head list; + struct net_device *dev; + unsigned long jiffies; + u32 index; /* wr_ids are 
distinguished by index + * to identify the QP -NOSRQ only */ enum ipoib_cm_state state; }; @@ -215,6 +224,8 @@ struct ipoib_cm_dev_priv { struct ib_wc ibwc[IPOIB_NUM_WC]; struct ib_sge rx_sge[IPOIB_CM_RX_SG]; struct ib_recv_wr rx_wr; + struct ipoib_cm_rx **rx_index_table; /* See ipoib_cm_dev_init() + *for usage of this element */ }; /* @@ -564,10 +575,9 @@ static inline void ipoib_cm_skb_too_long dev_kfree_skb_any(skb); } -static inline void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) +void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) { } - #endif #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG --- a/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-06-05 18:01:38.000000000 -0400 +++ b/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-06-07 11:05:13.000000000 -0400 @@ -49,6 +49,16 @@ MODULE_PARM_DESC(cm_data_debug_level, #include "ipoib.h" +int max_rc_qp = NOSRQ_INDEX_TABLE_SIZE; +int max_recv_buf = 1024; /* Default is 1024 MB */ + +module_param_named(nosrq_max_rc_qp, max_rc_qp, int, 0644); +MODULE_PARM_DESC(nosrq_max_rc_qp, "Max number of NOSRQ RC QPs supported"); + +module_param_named(max_recieve_buffer, max_recv_buf, int, 0644); +MODULE_PARM_DESC(max_recieve_buffer, "Max Recieve Buffer Size in MB"); + +int current_rc_qp = 0; /* Active RC QPs for NOSRQ */ #define IPOIB_CM_IETF_ID 0x1000000000000000ULL #define IPOIB_CM_RX_UPDATE_TIME (256 * HZ) @@ -88,20 +98,20 @@ static void ipoib_cm_dma_unmap_rx(struct ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE); } -static int ipoib_cm_post_receive(struct net_device *dev, int id) +static int post_receive_srq(struct net_device *dev, u64 id) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ib_recv_wr *bad_wr; int i, ret; - priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_SRQ; + priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_RECV; for (i = 0; i < IPOIB_CM_RX_SG; ++i) priv->cm.rx_sge[i].addr = priv->cm.srq_ring[id].mapping[i]; ret = 
ib_post_srq_recv(priv->cm.srq, &priv->cm.rx_wr, &bad_wr); if (unlikely(ret)) { - ipoib_warn(priv, "post srq failed for buf %d (%d)\n", id, ret); + ipoib_warn(priv, "post srq failed for buf %ld (%d)\n", id, ret); ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1, priv->cm.srq_ring[id].mapping); dev_kfree_skb_any(priv->cm.srq_ring[id].skb); @@ -111,12 +121,47 @@ static int ipoib_cm_post_receive(struct return ret; } -static struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev, int id, int frags, +static int post_receive_nosrq(struct net_device *dev, u64 id) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ib_recv_wr *bad_wr; + int i, ret; + u32 index; + u32 wr_id; + struct ipoib_cm_rx *rx_ptr; + + index = id & NOSRQ_INDEX_MASK ; + wr_id = id >> 32; + + rx_ptr = priv->cm.rx_index_table[index]; + + priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_RECV; + + for (i = 0; i < IPOIB_CM_RX_SG; ++i) + priv->cm.rx_sge[i].addr = rx_ptr->rx_ring[wr_id].mapping[i]; + + ret = ib_post_recv(rx_ptr->qp, &priv->cm.rx_wr, &bad_wr); + if (unlikely(ret)) { + ipoib_warn(priv, "post recv failed for buf %d (%d)\n", + wr_id, ret); + ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1, + rx_ptr->rx_ring[wr_id].mapping); + dev_kfree_skb_any(rx_ptr->rx_ring[wr_id].skb); + rx_ptr->rx_ring[wr_id].skb = NULL; + } + + return ret; +} + +static struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev, u64 id, + int frags, u64 mapping[IPOIB_CM_RX_SG]) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct sk_buff *skb; int i; + struct ipoib_cm_rx *rx_ptr; + u32 index, wr_id; skb = dev_alloc_skb(IPOIB_CM_HEAD_SIZE + 12); if (unlikely(!skb)) @@ -148,7 +193,14 @@ static struct sk_buff *ipoib_cm_alloc_rx goto partial_error; } - priv->cm.srq_ring[id].skb = skb; + if (priv->cm.srq) + priv->cm.srq_ring[id].skb = skb; + else { + index = id & NOSRQ_INDEX_MASK ; + wr_id = id >> 32; + rx_ptr = priv->cm.rx_index_table[index]; + rx_ptr->rx_ring[wr_id].skb = skb; + } return skb; partial_error: @@ -205,16 
+257,21 @@ static struct ib_qp *ipoib_cm_create_rx_ { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ib_qp_init_attr attr = { - .event_handler = ipoib_cm_rx_event_handler, .send_cq = priv->cq, /* For drain WR */ .recv_cq = priv->cq, .srq = priv->cm.srq, .cap.max_send_wr = 1, /* For drain WR */ + .cap.max_recv_wr = ipoib_recvq_size + 1, .cap.max_send_sge = 1, /* FIXME: 0 Seems not to work */ .sq_sig_type = IB_SIGNAL_ALL_WR, .qp_type = IB_QPT_RC, .qp_context = p, }; + if (!priv->cm.srq) { + attr.cap.max_recv_sge = IPOIB_CM_RX_SG; + attr.event_handler = NULL; + } else + attr.event_handler = ipoib_cm_rx_event_handler; return ib_create_qp(priv->pd, &attr); } @@ -289,12 +346,118 @@ static int ipoib_cm_send_rep(struct net_ rep.flow_control = 0; rep.rnr_retry_count = req->rnr_retry_count; rep.target_ack_delay = 20; /* FIXME */ - rep.srq = 1; rep.qp_num = qp->qp_num; rep.starting_psn = psn; + rep.srq = !!priv->cm.srq; return ib_send_cm_rep(cm_id, &rep); } +static void init_context_and_add_list(struct ib_cm_id *cm_id, + struct ipoib_cm_rx *p, + struct ipoib_dev_priv *priv) +{ + cm_id->context = p; + p->jiffies = jiffies; + spin_lock_irq(&priv->lock); + if (list_empty(&priv->cm.passive_ids)) + queue_delayed_work(ipoib_workqueue, + &priv->cm.stale_task, IPOIB_CM_RX_DELAY); + list_add(&p->list, &priv->cm.passive_ids); + spin_unlock_irq(&priv->lock); +} + +static int allocate_and_post_rbuf_nosrq(struct ib_cm_id *cm_id, + struct ipoib_cm_rx *p, unsigned psn) +{ + struct net_device *dev = cm_id->context; + struct ipoib_dev_priv *priv = netdev_priv(dev); + int ret; + u32 qp_num, index; + u64 i, recv_mem_used; + + qp_num = p->qp->qp_num; + + /* In the SRQ case there is a common rx buffer called the srq_ring. + * However, for the NOSRQ we create an rx_ring for every + * struct ipoib_cm_rx. 
+ */ + p->rx_ring = kzalloc(ipoib_recvq_size * sizeof *p->rx_ring, GFP_KERNEL); + if (!p->rx_ring) { + printk(KERN_WARNING "Failed to allocate rx_ring for 0x%x\n", + qp_num); + return -ENOMEM; + } + + init_context_and_add_list(cm_id, p, priv); + spin_lock_irq(&priv->lock); + + for (index = 0; index < max_rc_qp; index++) + if (priv->cm.rx_index_table[index] == NULL) + break; + + recv_mem_used = (u64)ipoib_recvq_size * (u64)current_rc_qp * + SIXTY_FOUR_K; + if ((index == max_rc_qp) || + ( recv_mem_used >= max_recv_buf * MEGA_BYTE)) { + spin_unlock_irq(&priv->lock); + ipoib_warn(priv, "NOSRQ has reached the configurable limit " + "of either %d RC QPs or, max recv buf size of " + "0x%lx MB\n", max_rc_qp, max_recv_buf * MEGA_BYTE); + + /* We send a REJ to the remote side indicating that we + * have no more free RC QPs and leave it to the remote side + * to take appropriate action. This should leave the + * current set of QPs unaffected and any subsequent REQs + * will be able to use RC QPs if they are available. 
+ */ + ib_send_cm_rej(cm_id, IB_CM_REJ_NO_QP, NULL, 0, NULL, 0); + ret = -EINVAL; + goto err_send_rej; + } + + priv->cm.rx_index_table[index] = p; + spin_unlock_irq(&priv->lock); + + /* We will subsequently use this stored pointer while freeing + * resources in stale task */ + p->index = index; + + ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn); + if (ret) { + ipoib_warn(priv, "ipoib_cm_modify_rx_qp() failed %d\n", ret); + ipoib_cm_dev_cleanup(dev); + goto err_modify_nosrq; + } + + for (i = 0; i < ipoib_recvq_size; ++i) { + if (!ipoib_cm_alloc_rx_skb(dev, i << 32 | index, + IPOIB_CM_RX_SG - 1, + p->rx_ring[i].mapping)) { + ipoib_warn(priv, "failed to allocate receive " + "buffer %ld\n", i); + ipoib_cm_dev_cleanup(dev); + ret = -ENOMEM; + goto err_alloc_and_post; + } + + if (post_receive_nosrq(dev, i << 32 | index)) { + ipoib_warn(priv, "post_receive_nosrq " + "failed for buf %ld\n", i); + ipoib_cm_dev_cleanup(dev); + ret = -EIO; + goto err_alloc_and_post; + } + } + + return 0; + +err_send_rej: +err_modify_nosrq: +err_alloc_and_post: + kfree(p->rx_ring); + return ret; +} + static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event) { struct net_device *dev = cm_id->context; @@ -305,8 +468,11 @@ static int ipoib_cm_req_handler(struct i ipoib_dbg(priv, "REQ arrived\n"); p = kzalloc(sizeof *p, GFP_KERNEL); - if (!p) + if (!p) { + printk(KERN_WARNING "Failed to allocate RX control block when " + "REQ arrived\n"); return -ENOMEM; + } p->dev = dev; p->id = cm_id; p->qp = ipoib_cm_create_rx_qp(dev, p); @@ -316,9 +482,16 @@ static int ipoib_cm_req_handler(struct i } psn = random32() & 0xffffff; - ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn); - if (ret) - goto err_modify; + if (!priv->cm.srq) { + current_rc_qp++; + if (ret = allocate_and_post_rbuf_nosrq(cm_id, p, psn)) + goto err_post_nosrq; + } else { + p->rx_ring = NULL; + ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn); + if (ret) + goto err_modify; + } ret = ipoib_cm_send_rep(dev, 
cm_id, p->qp, &event->param.req_rcvd, psn); if (ret) { @@ -326,18 +499,16 @@ static int ipoib_cm_req_handler(struct i goto err_rep; } - cm_id->context = p; - p->jiffies = jiffies; - p->state = IPOIB_CM_RX_LIVE; - spin_lock_irq(&priv->lock); - if (list_empty(&priv->cm.passive_ids)) - queue_delayed_work(ipoib_workqueue, - &priv->cm.stale_task, IPOIB_CM_RX_DELAY); - list_add(&p->list, &priv->cm.passive_ids); - spin_unlock_irq(&priv->lock); + if (priv->cm.srq) { + init_context_and_add_list(cm_id, p, priv); + p->state = IPOIB_CM_RX_LIVE; + } return 0; err_rep: +err_post_nosrq: + list_del_init(&p->list); + current_rc_qp--; err_modify: ib_destroy_qp(p->qp); err_qp: @@ -401,21 +572,51 @@ static void skb_put_frags(struct sk_buff } } -void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) +static void timer_check_srq(struct ipoib_dev_priv *priv, struct ipoib_cm_rx *p) +{ + unsigned long flags; + + if (p && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) { + spin_lock_irqsave(&priv->lock, flags); + p->jiffies = jiffies; + /* Move this entry to list head, but do + * not re-add it if it has been removed. */ + if (p->state == IPOIB_CM_RX_LIVE) + list_move(&p->list, &priv->cm.passive_ids); + spin_unlock_irqrestore(&priv->lock, flags); + } +} + +static void timer_check_nosrq(struct ipoib_dev_priv *priv, struct ipoib_cm_rx *p) +{ + unsigned long flags; + + if (p && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) { + spin_lock_irqsave(&priv->lock, flags); + p->jiffies = jiffies; + /* Move this entry to list head, but do + * not re-add it if it has been removed. 
*/ + if (!list_empty(&p->list)) + list_move(&p->list, &priv->cm.passive_ids); + spin_unlock_irqrestore(&priv->lock, flags); + } +} + +void handle_rx_wc_srq(struct net_device *dev, struct ib_wc *wc) { struct ipoib_dev_priv *priv = netdev_priv(dev); - unsigned int wr_id = wc->wr_id & ~IPOIB_CM_OP_SRQ; + u64 wr_id = wc->wr_id & ~IPOIB_CM_OP_RECV; struct sk_buff *skb, *newskb; struct ipoib_cm_rx *p; unsigned long flags; u64 mapping[IPOIB_CM_RX_SG]; - int frags; + int frags, ret; ipoib_dbg_data(priv, "cm recv completion: id %d, status: %d\n", wr_id, wc->status); if (unlikely(wr_id >= ipoib_recvq_size)) { - if (wr_id == (IPOIB_CM_RX_DRAIN_WRID & ~IPOIB_CM_OP_SRQ)) { + if (wr_id == (IPOIB_CM_RX_DRAIN_WRID & ~IPOIB_CM_OP_RECV)) { spin_lock_irqsave(&priv->lock, flags); list_splice_init(&priv->cm.rx_drain_list, &priv->cm.rx_reap_list); ipoib_cm_start_rx_drain(priv); @@ -434,20 +635,12 @@ void ipoib_cm_handle_rx_wc(struct net_de "(status=%d, wrid=%d vend_err %x)\n", wc->status, wr_id, wc->vendor_err); ++priv->stats.rx_dropped; - goto repost; + goto repost_srq; } if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) { p = wc->qp->qp_context; - if (p && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) { - spin_lock_irqsave(&priv->lock, flags); - p->jiffies = jiffies; - /* Move this entry to list head, but do not re-add it - * if it has been moved out of list. */ - if (p->state == IPOIB_CM_RX_LIVE) - list_move(&p->list, &priv->cm.passive_ids); - spin_unlock_irqrestore(&priv->lock, flags); - } + timer_check_srq(priv, p); } frags = PAGE_ALIGN(wc->byte_len - min(wc->byte_len, @@ -459,13 +652,113 @@ void ipoib_cm_handle_rx_wc(struct net_de * If we can't allocate a new RX buffer, dump * this packet and reuse the old buffer. 
*/ - ipoib_dbg(priv, "failed to allocate receive buffer %d\n", wr_id); + ipoib_dbg(priv, "failed to allocate receive buffer %ld\n", wr_id); + ++priv->stats.rx_dropped; + goto repost_srq; + } + + ipoib_cm_dma_unmap_rx(priv, frags, + priv->cm.srq_ring[wr_id].mapping); + memcpy(priv->cm.srq_ring[wr_id].mapping, mapping, + (frags + 1) * sizeof *mapping); + ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n", + wc->byte_len, wc->slid); + + skb_put_frags(skb, IPOIB_CM_HEAD_SIZE, wc->byte_len, newskb); + + skb->protocol = ((struct ipoib_header *) skb->data)->proto; + skb_reset_mac_header(skb); + skb_pull(skb, IPOIB_ENCAP_LEN); + + dev->last_rx = jiffies; + ++priv->stats.rx_packets; + priv->stats.rx_bytes += skb->len; + + skb->dev = dev; + /* XXX get correct PACKET_ type here */ + skb->pkt_type = PACKET_HOST; + netif_rx_ni(skb); + +repost_srq: + ret = post_receive_srq(dev, wr_id); + + if (unlikely(ret)) + ipoib_warn(priv, "post_receive_srq failed for buf %ld\n", + wr_id); + +} + +static void handle_rx_wc_nosrq(struct net_device *dev, struct ib_wc *wc) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct sk_buff *skb, *newskb; + u64 mapping[IPOIB_CM_RX_SG], wr_id = wc->wr_id >> 32; + u32 index; + struct ipoib_cm_rx *p, *rx_ptr; + int frags, ret; + + + ipoib_dbg_data(priv, "cm recv completion: id %d, status: %d\n", + wr_id, wc->status); + + if (unlikely(wr_id >= ipoib_recvq_size)) { + ipoib_warn(priv, "cm recv completion event with wrid %d (> %d)\n", + wr_id, ipoib_recvq_size); + return; + } + + index = (wc->wr_id & ~IPOIB_CM_OP_RECV) & NOSRQ_INDEX_MASK ; + + /* This is the only place where rx_ptr could be a NULL - could + * have just received a packet from a connection that has become + * stale and so is going away. We will simply drop the packet and + * let the hardware (it s IB_QPT_RC) handle the dropped packet. + * In the timer_check() function below, p->jiffies is updated and + * hence the connection will not be stale after that. 
+ */ + rx_ptr = priv->cm.rx_index_table[index]; + if (unlikely(!rx_ptr)) { + ipoib_warn(priv, "Received packet from a connection " + "that is going away. Hardware will handle it.\n"); + return; + } + + skb = rx_ptr->rx_ring[wr_id].skb; + + if (unlikely(wc->status != IB_WC_SUCCESS)) { + ipoib_dbg(priv, "cm recv error " + "(status=%d, wrid=%ld vend_err %x)\n", + wc->status, wr_id, wc->vendor_err); + ++priv->stats.rx_dropped; + goto repost_nosrq; + } + + if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) { + /* There are no guarantees that wc->qp is not NULL for HCAs + * that do not support SRQ. */ + p = rx_ptr; + timer_check_nosrq(priv, p); + } + + frags = PAGE_ALIGN(wc->byte_len - min(wc->byte_len, + (unsigned)IPOIB_CM_HEAD_SIZE)) / PAGE_SIZE; + + newskb = ipoib_cm_alloc_rx_skb(dev, wr_id << 32 | index, frags, + mapping); + if (unlikely(!newskb)) { + /* + * If we can't allocate a new RX buffer, dump + * this packet and reuse the old buffer. + */ + ipoib_dbg(priv, "failed to allocate receive buffer %ld\n", wr_id); ++priv->stats.rx_dropped; - goto repost; + goto repost_nosrq; } - ipoib_cm_dma_unmap_rx(priv, frags, priv->cm.srq_ring[wr_id].mapping); - memcpy(priv->cm.srq_ring[wr_id].mapping, mapping, (frags + 1) * sizeof *mapping); + ipoib_cm_dma_unmap_rx(priv, frags, + rx_ptr->rx_ring[wr_id].mapping); + memcpy(rx_ptr->rx_ring[wr_id].mapping, mapping, + (frags + 1) * sizeof *mapping); ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n", wc->byte_len, wc->slid); @@ -485,10 +778,22 @@ void ipoib_cm_handle_rx_wc(struct net_de skb->pkt_type = PACKET_HOST; netif_receive_skb(skb); -repost: - if (unlikely(ipoib_cm_post_receive(dev, wr_id))) - ipoib_warn(priv, "ipoib_cm_post_receive failed " - "for buf %d\n", wr_id); +repost_nosrq: + ret = post_receive_nosrq(dev, wr_id << 32 | index); + + if (unlikely(ret)) + ipoib_warn(priv, "post_receive_nosrq failed for buf %ld\n", + wr_id); +} + +void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) +{ + struct ipoib_dev_priv 
*priv = netdev_priv(dev); + + if (priv->cm.srq) + handle_rx_wc_srq(dev, wc); + else + handle_rx_wc_nosrq(dev, wc); } static inline int post_send(struct ipoib_dev_priv *priv, @@ -680,6 +985,42 @@ err_cm: return ret; } +static void free_resources_nosrq(struct ipoib_dev_priv *priv, struct ipoib_cm_rx *p) +{ + int i; + + for(i = 0; i < ipoib_recvq_size; ++i) + if(p->rx_ring[i].skb) { + ipoib_cm_dma_unmap_rx(priv, + IPOIB_CM_RX_SG - 1, + p->rx_ring[i].mapping); + dev_kfree_skb_any(p->rx_ring[i].skb); + p->rx_ring[i].skb = NULL; + } + kfree(p->rx_ring); +} + +void dev_stop_nosrq(struct ipoib_dev_priv *priv) +{ + struct ipoib_cm_rx *p; + + spin_lock_irq(&priv->lock); + while (!list_empty(&priv->cm.passive_ids)) { + p = list_entry(priv->cm.passive_ids.next, typeof(*p), list); + free_resources_nosrq(priv, p); + list_del_init(&p->list); + spin_unlock_irq(&priv->lock); + ib_destroy_cm_id(p->id); + ib_destroy_qp(p->qp); + current_rc_qp--; + kfree(p); + spin_lock_irq(&priv->lock); + } + spin_unlock_irq(&priv->lock); + + cancel_delayed_work(&priv->cm.stale_task); +} + void ipoib_cm_dev_stop(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); @@ -694,6 +1035,11 @@ void ipoib_cm_dev_stop(struct net_device ib_destroy_cm_id(priv->cm.id); priv->cm.id = NULL; + if (!priv->cm.srq) { + dev_stop_nosrq(priv); + return; + } + spin_lock_irq(&priv->lock); while (!list_empty(&priv->cm.passive_ids)) { p = list_entry(priv->cm.passive_ids.next, typeof(*p), list); @@ -739,6 +1085,7 @@ void ipoib_cm_dev_stop(struct net_device kfree(p); } + cancel_delayed_work(&priv->cm.stale_task); } @@ -817,7 +1164,9 @@ static struct ib_qp *ipoib_cm_create_tx_ attr.recv_cq = priv->cq; attr.srq = priv->cm.srq; attr.cap.max_send_wr = ipoib_sendq_size; + attr.cap.max_recv_wr = 1; attr.cap.max_send_sge = 1; + attr.cap.max_recv_sge = 1; attr.sq_sig_type = IB_SIGNAL_ALL_WR; attr.qp_type = IB_QPT_RC; attr.send_cq = cq; @@ -857,7 +1206,7 @@ static int ipoib_cm_send_req(struct net_ req.retry_count = 
3; /* RFC draft warns against retries */ req.rnr_retry_count = 0; /* RFC draft warns against retries */ req.max_cm_retries = 15; - req.srq = 1; + req.srq = !!priv->cm.srq; return ib_send_cm_req(id, &req); } @@ -1202,6 +1551,7 @@ static void ipoib_cm_rx_reap(struct work list_for_each_entry_safe(p, n, &list, list) { ib_destroy_cm_id(p->id); ib_destroy_qp(p->qp); + current_rc_qp--; kfree(p); } } @@ -1220,12 +1570,19 @@ static void ipoib_cm_stale_task(struct w p = list_entry(priv->cm.passive_ids.prev, typeof(*p), list); if (time_before_eq(jiffies, p->jiffies + IPOIB_CM_RX_TIMEOUT)) break; - list_move(&p->list, &priv->cm.rx_error_list); - p->state = IPOIB_CM_RX_ERROR; - spin_unlock_irq(&priv->lock); - ret = ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE); - if (ret) - ipoib_warn(priv, "unable to move qp to error state: %d\n", ret); + if (!priv->cm.srq) { + free_resources_nosrq(priv, p); + list_del_init(&p->list); + priv->cm.rx_index_table[p->index] = NULL; + spin_unlock_irq(&priv->lock); + } else { + list_move(&p->list, &priv->cm.rx_error_list); + p->state = IPOIB_CM_RX_ERROR; + spin_unlock_irq(&priv->lock); + ret = ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE); + if (ret) + ipoib_warn(priv, "unable to move qp to error state: %d\n", ret); + } spin_lock_irq(&priv->lock); } @@ -1279,16 +1636,40 @@ int ipoib_cm_add_mode_attr(struct net_de return device_create_file(&dev->dev, &dev_attr_mode); } +static int create_srq(struct net_device *dev, struct ipoib_dev_priv *priv) +{ + struct ib_srq_init_attr srq_init_attr; + int ret; + + srq_init_attr.attr.max_wr = ipoib_recvq_size; + srq_init_attr.attr.max_sge = IPOIB_CM_RX_SG; + + priv->cm.srq = ib_create_srq(priv->pd, &srq_init_attr); + if (IS_ERR(priv->cm.srq)) { + ret = PTR_ERR(priv->cm.srq); + priv->cm.srq = NULL; + return ret; + } + + priv->cm.srq_ring = kzalloc(ipoib_recvq_size * + sizeof *priv->cm.srq_ring, + GFP_KERNEL); + if (!priv->cm.srq_ring) { + printk(KERN_WARNING "%s: failed to allocate CM ring " + "(%d 
entries)\n", + priv->ca->name, ipoib_recvq_size); + ipoib_cm_dev_cleanup(dev); + return -ENOMEM; + } + + return 0; +} + int ipoib_cm_dev_init(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); - struct ib_srq_init_attr srq_init_attr = { - .attr = { - .max_wr = ipoib_recvq_size, - .max_sge = IPOIB_CM_RX_SG - } - }; int ret, i; + struct ib_device_attr attr; INIT_LIST_HEAD(&priv->cm.passive_ids); INIT_LIST_HEAD(&priv->cm.reap_list); @@ -1305,20 +1686,30 @@ int ipoib_cm_dev_init(struct net_device skb_queue_head_init(&priv->cm.skb_queue); - priv->cm.srq = ib_create_srq(priv->pd, &srq_init_attr); - if (IS_ERR(priv->cm.srq)) { - ret = PTR_ERR(priv->cm.srq); - priv->cm.srq = NULL; + if (ret = ib_query_device(priv->ca, &attr)) return ret; - } - priv->cm.srq_ring = kzalloc(ipoib_recvq_size * sizeof *priv->cm.srq_ring, - GFP_KERNEL); - if (!priv->cm.srq_ring) { - printk(KERN_WARNING "%s: failed to allocate CM ring (%d entries)\n", - priv->ca->name, ipoib_recvq_size); - ipoib_cm_dev_cleanup(dev); - return -ENOMEM; + if (attr.max_srq) { + /* This device supports SRQ */ + if (ret = create_srq(dev, priv)) + return ret; + priv->cm.rx_index_table = NULL; + } else { + priv->cm.srq = NULL; + priv->cm.srq_ring = NULL; + + /* Every new REQ that arrives creates a struct ipoib_cm_rx. + * These structures form a link list starting with the + * passive_ids. 
For quick and easy access we maintain a table + * of pointers to struct ipoib_cm_rx called the rx_index_table + */ + priv->cm.rx_index_table = kzalloc(NOSRQ_INDEX_TABLE_SIZE * + sizeof *priv->cm.rx_index_table, + GFP_KERNEL); + if (!priv->cm.rx_index_table) { + printk(KERN_WARNING "Failed to allocate NOSRQ_INDEX_TABLE\n"); + return -ENOMEM; + } } for (i = 0; i < IPOIB_CM_RX_SG; ++i) @@ -1331,17 +1722,23 @@ int ipoib_cm_dev_init(struct net_device priv->cm.rx_wr.sg_list = priv->cm.rx_sge; priv->cm.rx_wr.num_sge = IPOIB_CM_RX_SG; - for (i = 0; i < ipoib_recvq_size; ++i) { - if (!ipoib_cm_alloc_rx_skb(dev, i, IPOIB_CM_RX_SG - 1, + /* One can post receive buffers even before the RX QP is created + * only in the SRQ case. Therefore for NOSRQ we skip the rest of init + * and do that in ipoib_cm_req_handler() */ + + if (priv->cm.srq) { + for (i = 0; i < ipoib_recvq_size; ++i) { + if (!ipoib_cm_alloc_rx_skb(dev, i, IPOIB_CM_RX_SG - 1, priv->cm.srq_ring[i].mapping)) { - ipoib_warn(priv, "failed to allocate receive buffer %d\n", i); - ipoib_cm_dev_cleanup(dev); - return -ENOMEM; - } - if (ipoib_cm_post_receive(dev, i)) { - ipoib_warn(priv, "ipoib_ib_post_receive failed for buf %d\n", i); - ipoib_cm_dev_cleanup(dev); - return -EIO; + ipoib_warn(priv, "failed to allocate receive buffer %d\n", i); + ipoib_cm_dev_cleanup(dev); + return -ENOMEM; + } + if (post_receive_srq(dev, i)) { + ipoib_warn(priv, "post_receive_srq failed for buf %d\n", i); + ipoib_cm_dev_cleanup(dev); + return -EIO; + } } } --- a/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-05-30 14:56:25.000000000 -0400 +++ b/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-05-30 20:11:27.000000000 -0400 @@ -299,7 +299,7 @@ int ipoib_poll(struct net_device *dev, i for (i = 0; i < n; ++i) { struct ib_wc *wc = priv->ibwc + i; - if (wc->wr_id & IPOIB_CM_OP_SRQ) { + if (wc->wr_id & IPOIB_CM_OP_RECV) { ++done; --max; ipoib_cm_handle_rx_wc(dev, wc); @@ -557,7 +557,7 @@ void ipoib_drain_cq(struct 
net_device *d do { n = ib_poll_cq(priv->cq, IPOIB_NUM_WC, priv->ibwc); for (i = 0; i < n; ++i) { - if (priv->ibwc[i].wr_id & IPOIB_CM_OP_SRQ) + if (priv->ibwc[i].wr_id & IPOIB_CM_OP_RECV) ipoib_cm_handle_rx_wc(dev, priv->ibwc + i); else if (priv->ibwc[i].wr_id & IPOIB_OP_RECV) ipoib_ib_handle_rx_wc(dev, priv->ibwc + i); --- a/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2007-05-30 14:56:25.000000000 -0400 +++ b/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2007-05-30 19:04:24.000000000 -0400 @@ -175,6 +175,15 @@ int ipoib_transport_dev_init(struct net_ if (!ret) size += ipoib_recvq_size + 1 /* 1 extra for rx_drain_qp */; + /* We increase the size of the CQ in the NOSRQ case to prevent CQ + * overflow. Every new REQ creates a new RX QP and each QP has an + * RX ring associated with it. Therefore we could have + * NOSRQ_INDEX_TABLE_SIZE*ipoib_recvq_size + ipoib_sendq_size CQEs + * in a CQ. + */ + if(!priv->cm.srq) + size += (NOSRQ_INDEX_TABLE_SIZE -1)* ipoib_recvq_size; + priv->cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev, size, 0); if (IS_ERR(priv->cq)) { printk(KERN_WARNING "%s: failed to create CQ\n", ca->name); From pradeeps at linux.vnet.ibm.com Thu Jun 7 14:18:58 2007 From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana) Date: Thu, 07 Jun 2007 14:18:58 -0700 Subject: [ofa-general] IPOIB CM (NOSRQ) extension Message-ID: <46687642.8040208@linux.vnet.ibm.com> This patch handles the corner case of running out of RC QPs. In that case it switches to UD mode. This patch can be used both by NOSRQ and SRQ code. 
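The transmit-path decision this patch changes (use the RC connection only while it is usable, otherwise fall back to UD) can be modeled on its own. The sketch below is a hypothetical userspace model, not IPoIB code. `struct neigh_state`, `choose_tx_path()` and the `TX_*` values are invented names. They stand in for `ipoib_cm_get(neigh)`, `ipoib_cm_up(neigh)`, the `IPOIB_FLAG_OPER_UP` bit that the patch's REJ handler clears, and `neigh->ah`. The model follows the combined condition in the patched `ipoib_start_xmit()` hunk. What the driver does when neither branch is taken lies outside the hunk, so the model only reports that case.

```c
#include <stdbool.h>

/* Hypothetical model of the per-neighbour state consulted by the patched
 * ipoib_start_xmit(); these are NOT real IPoIB types. */
struct neigh_state {
    bool has_cm_tx;  /* ipoib_cm_get(neigh) != NULL                  */
    bool cm_up;      /* ipoib_cm_up(neigh)                           */
    bool oper_up;    /* IPOIB_FLAG_OPER_UP bit, cleared on CM REJ    */
    bool has_ah;     /* neigh->ah != NULL (UD address handle exists) */
};

enum tx_path {
    TX_CM,           /* send over the connected-mode RC QP             */
    TX_UD,           /* enter the "else if (neigh->ah)" (UD) branch    */
    TX_FALLTHROUGH   /* neither branch: handled after the if/else      */
};

static enum tx_path choose_tx_path(const struct neigh_state *n)
{
    /* Mirrors: if (ipoib_cm_get(neigh) && ipoib_cm_up(neigh) &&
     *              test_bit(IPOIB_FLAG_OPER_UP, &neigh->cm->flags))
     * In the driver the && short-circuit matters: neigh->cm is only
     * dereferenced after ipoib_cm_get() has returned non-NULL. */
    if (n->has_cm_tx && n->cm_up && n->oper_up)
        return TX_CM;

    /* Mirrors: } else if (neigh->ah) { ... }  (the GID check and the
     * UD send inside that branch are not modeled here) */
    if (n->has_ah)
        return TX_UD;

    return TX_FALLTHROUGH;
}
```

Reading the hunk, merging the two conditions appears to change one more case. Before the patch, a neighbour with a CM context that was not yet up entered the outer `if` and skipped the `else if (neigh->ah)` branch. With the combined condition, that neighbour now reaches the UD branch, so traffic can go out over UD while the RC connection is still being set up.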
Signed-off-by: Pradeep Satyanarayana --- --- c/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-06-07 11:13:55.000000000 -0400 +++ b/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-06-07 11:11:21.000000000 -0400 @@ -1383,6 +1383,11 @@ static int ipoib_cm_tx_handler(struct ib break; case IB_CM_REQ_ERROR: case IB_CM_REJ_RECEIVED: + ipoib_warn(priv, "REJ received\n"); + neigh = tx->neigh; + if (neigh) + clear_bit(IPOIB_FLAG_OPER_UP, &neigh->cm->flags); + break; case IB_CM_TIMEWAIT_EXIT: ipoib_dbg(priv, "CM error %d.\n", event->event); spin_lock_irq(&priv->tx_lock); --- c/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-05-30 14:56:25.000000000 -0400 +++ b/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-06-06 18:28:06.000000000 -0400 @@ -679,11 +679,10 @@ static int ipoib_start_xmit(struct sk_bu neigh = *to_ipoib_neigh(skb->dst->neighbour); - if (ipoib_cm_get(neigh)) { - if (ipoib_cm_up(neigh)) { + if (ipoib_cm_get(neigh) && ipoib_cm_up(neigh) && + test_bit(IPOIB_FLAG_OPER_UP, &neigh->cm->flags)) { ipoib_cm_send(dev, skb, ipoib_cm_get(neigh)); goto out; - } } else if (neigh->ah) { if (unlikely(memcmp(&neigh->dgid.raw, skb->dst->neighbour->ha + 4, From steffen.persvold at scali.com Thu Jun 7 17:08:39 2007 From: steffen.persvold at scali.com (Steffen Persvold) Date: Thu, 7 Jun 2007 20:08:39 -0400 Subject: [ofa-general] OFED 1.2 and backwards binary compatibility References: <465AE791.5040003@mellanox.co.il> <465BD5B4.50003@mellanox.co.il> Message-ID: OFED Team, Is intended that OFED 1.2 verbs library aren't binary backwards compatible ? In 1.2-rc4 libraries are still called : /usr/lib/libibverbs.so.1 /usr/lib/libibverbs.so.1.0.0 /usr/lib64/libibverbs.so.1 /usr/lib64/libibverbs.so.1.0.0 Which is the same as in 1.0 and 1.1 and this indicates binary compatibility (at least to a naive user like myself). 
The problem though is that I have applications compiled with OFED 1.0 and 1.1 (those releases are binary compatible btw, as far as my testing goes) that hang when running on OFED 1.2... Some clarification on the policy would be nice. In my opinion, if they no longer are compatible (and a diff of verbs.h indicates that, changes in header structures) OFED 1.2 libraries should be named something else than .so.1.0.0 Comments appreciated. Cheers, Steffen Persvold Technical Director Americas tel. 508-281-7100 x401 fax. 508-281-7171 http://www.scali.com/ Scaling the Linux datacenter From steffen.persvold at scali.com Thu Jun 7 17:12:55 2007 From: steffen.persvold at scali.com (Steffen Persvold) Date: Thu, 7 Jun 2007 20:12:55 -0400 Subject: [ofa-general] RE: [ewg] OFED 1.2 and backwards binary compatibility References: <465AE791.5040003@mellanox.co.il><465BD5B4.50003@mellanox.co.il> Message-ID: Just to follow up, I believe at least these changes (there are more) to verbs.h breaks the compatibility : @@ -469,8 +502,8 @@ }; struct ibv_send_wr { - struct ibv_send_wr *next; uint64_t wr_id; + struct ibv_send_wr *next; struct ibv_sge *sg_list; int num_sge; enum ibv_wr_opcode opcode; @@ -496,12 +529,21 @@ }; struct ibv_recv_wr { - struct ibv_recv_wr *next; uint64_t wr_id; + struct ibv_recv_wr *next; struct ibv_sge *sg_list; int num_sge; }; If this is intended, I would strongly suggest reversioning the libraries. Cheers, Steffen Persvold Technical Director Americas tel. 508-281-7100 x401 fax. 508-281-7171 http://www.scali.com/ Scaling the Linux datacenter
From rdreier at cisco.com Thu Jun 7 18:41:00 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 07 Jun 2007 18:41:00 -0700 Subject: [ofa-general] Re: [ewg] OFED 1.2 and backwards binary compatibility In-Reply-To: (Steffen Persvold's message of "Thu, 7 Jun 2007 20:08:39 -0400") References: <465AE791.5040003@mellanox.co.il> <465BD5B4.50003@mellanox.co.il> Message-ID: > Is intended that OFED 1.2 verbs library aren't binary backwards compatible ? In 1.2-rc4 libraries are still called : > > /usr/lib/libibverbs.so.1 > /usr/lib/libibverbs.so.1.0.0 > /usr/lib64/libibverbs.so.1 > /usr/lib64/libibverbs.so.1.0.0 > > Which is the same as in 1.0 and 1.1 and this indicates binary compatibility (at least to a naive user like myself). > > The problem though is that I have applications compiled with OFED 1.0 and 1.1 (those releases are binary compatible btw, as far as my testing goes) that hang when running on OFED 1.2... The intention is that libibverbs 1.0 and 1.1 *are* binary compatible via a versioned ABI.
Applications linked against libibverbs 1.0 will link against the IBVERBS_1.0 ABI, and should still work when run with libibverbs 1.1. It would be useful to get more information about where and how your applications hang. During development of the compatibility code of libibverbs 1.1, I tested various things such as building Open MPI against libibverbs 1.0 and running with libibverbs 1.1, and it all worked. However it's quite possible that there are bugs in the ABI compatibility code. - R. From rdreier at cisco.com Thu Jun 7 18:42:16 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 07 Jun 2007 18:42:16 -0700 Subject: [ofa-general] RE: [ewg] OFED 1.2 and backwards binary compatibility In-Reply-To: (Steffen Persvold's message of "Thu, 7 Jun 2007 20:12:55 -0400") References: <465AE791.5040003@mellanox.co.il> <465BD5B4.50003@mellanox.co.il> Message-ID: > Just to follow up, I believe at least these changes (there are more) to verbs.h breaks the compatibility : > > @@ -469,8 +502,8 @@ > }; > struct ibv_send_wr { > - struct ibv_send_wr *next; > uint64_t wr_id; > + struct ibv_send_wr *next; > struct ibv_sge *sg_list; > int num_sge; > enum ibv_wr_opcode opcode; > @@ -496,12 +529,21 @@ > }; > struct ibv_recv_wr { > - struct ibv_recv_wr *next; > uint64_t wr_id; > + struct ibv_recv_wr *next; > struct ibv_sge *sg_list; > int num_sge; > }; These differences should be taken care of by the post_send_wrapper_1_0() and post_recv_wrapper_1_0() functions in src/compat-1_0.c in libibverbs 1.1. - R. 
From vlad at lists.openfabrics.org Fri Jun 8 02:40:50 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Fri, 8 Jun 2007 02:40:50 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070608-0200 daily build status Message-ID: <20070608094050.7B45DE60868@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.12 Passed on x86_64 with linux-2.6.20 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.12 Passed on ppc64 with linux-2.6.18 Passed on ia64 with linux-2.6.12 Passed on x86_64 with linux-2.6.17 Passed on ia64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.13 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.14 Passed on x86_64 with linux-2.6.16 Passed on ia64 with linux-2.6.13 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.12 Passed on x86_64 with linux-2.6.19 Passed on powerpc with linux-2.6.13 Passed on ia64 with linux-2.6.16 Passed on x86_64 with linux-2.6.14 Passed on ia64 with linux-2.6.17 Passed on x86_64 with linux-2.6.21.1 Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.14 Passed on x86_64 with linux-2.6.15 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.16 Passed on powerpc with 
linux-2.6.14 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.13 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on ppc64 with linux-2.6.17 Passed on ia64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-34.ELsmp Failed: From steffen.persvold at scali.com Fri Jun 8 04:42:15 2007 From: steffen.persvold at scali.com (Steffen Persvold) Date: Fri, 8 Jun 2007 07:42:15 -0400 Subject: [ofa-general] RE: [ewg] OFED 1.2 and backwards binary compatibility References: <465AE791.5040003@mellanox.co.il><465BD5B4.50003@mellanox.co.il> Message-ID: Roland, 1.0 vs. 1.1 is all good. That works. I'm talking about 1.1/1.0 vs 1.2, that's not working. The diffset below is between 1.1 and 1.2. What we're doing is using dlopen()/dlsym() to dynamically open the library so that we have no library dependencies (this allows us to runtime wise check if ofed is installed or other IB stacks). This apparently breaks. I don't find any "post_send_wrapper_1_0" nor "post_send_wrapper_1_1" symbols in my libraries ?? : [root at pe1850-1 lib]# nm libibverbs.so.1.0.0 |grep post_send 0000000000003aa0 T ibv_cmd_post_send ? Cheers, Steffen Persvold Technical Director Americas tel. 508-281-7100 x401 fax. 
508-281-7171 http://www.scali.com/ Scaling the Linux datacenter ________________________________ From: Roland Dreier [mailto:rdreier at cisco.com] Sent: Thu 6/7/2007 9:42 PM To: Steffen Persvold Cc: EWG; OpenFabrics General Subject: Re: [ofa-general] RE: [ewg] OFED 1.2 and backwards binary compatibility > Just to follow up, I believe at least these changes (there are more) to verbs.h breaks the compatibility : > > @@ -469,8 +502,8 @@ > }; > struct ibv_send_wr { > - struct ibv_send_wr *next; > uint64_t wr_id; > + struct ibv_send_wr *next; > struct ibv_sge *sg_list; > int num_sge; > enum ibv_wr_opcode opcode; > @@ -496,12 +529,21 @@ > }; > struct ibv_recv_wr { > - struct ibv_recv_wr *next; > uint64_t wr_id; > + struct ibv_recv_wr *next; > struct ibv_sge *sg_list; > int num_sge; > }; These differences should be taken care of by the post_send_wrapper_1_0() and post_recv_wrapper_1_0() functions in src/compat-1_0.c in libibverbs 1.1. - R. From rdreier at cisco.com Fri Jun 8 06:59:24 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 08 Jun 2007 06:59:24 -0700 Subject: [ofa-general] RE: [ewg] OFED 1.2 and backwards binary compatibility In-Reply-To: (Steffen Persvold's message of "Fri, 8 Jun 2007 07:42:15 -0400") References: <465AE791.5040003@mellanox.co.il> <465BD5B4.50003@mellanox.co.il> Message-ID: > 1.0 vs. 1.1 is all good. That works. I'm talking about 1.1/1.0 vs 1.2, that's not working. The diffset below is between 1.1 and 1.2. Sorry for being confusing. I was talking about the libibverbs version. OFED 1.0 and 1.1 both included libibverbs 1.0, and OFED 1.2 includes libibverbs 1.1. > What we're doing is using dlopen()/dlsym() to dynamically open the library so that we have no library dependencies (this allows us to runtime wise check if ofed is installed or other IB stacks). This apparently breaks. Yes, you are basically implementing a broken dynamic linker yourself.
For this to work you will need to use dlvsym() and request all symbols with version IBVERBS_1.0. There may be a slight performance penalty on libibverbs 1.1 (OFED 1.2) because you will be going through compatibility wrappers. > I don't find any "post_send_wrapper_1_0" nor "post_send_wrapper_1_1" symbols in my libraries ?? : Right, they're internal symbols. Take a look at the libibverbs source if you're curious about how it works. - R. From rdreier at cisco.com Fri Jun 8 07:22:24 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 08 Jun 2007 07:22:24 -0700 Subject: [ofa-general] [GIT PULL] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This will get a bunch of fixes to the new mlx4 driver, and one fix for port assignment by the RDMA CM: Eli Cohen (1): mlx4_core: Fix CQ context layout Jack Morgenstein (2): mlx4_core: Don't set MTT address in dMPT entries with PA set IB/mlx4: Fix zeroing of rnr_retry value in ib_modify_qp() Roland Dreier (5): mlx4_core: Initialize ctx_list and ctx_lock earlier mlx4_core: Free catastrophic error MSI-X interrupt with correct dev_id IB/mthca, mlx4_core: Fix typo in comment mlx4_core: Check firmware command interface revision IB/mlx4: Make sure RQ allocation is always valid Sean Hefty (1): RDMA/cma: Fix initialization of next_port drivers/infiniband/core/cma.c | 4 +- drivers/infiniband/hw/mlx4/qp.c | 33 ++++++++++++++++++++---------- drivers/infiniband/hw/mthca/mthca_cmd.c | 2 +- drivers/net/mlx4/cq.c | 2 +- drivers/net/mlx4/eq.c | 4 ++- drivers/net/mlx4/fw.c | 27 ++++++++++++++++++++++-- drivers/net/mlx4/intf.c | 3 -- drivers/net/mlx4/main.c | 2 + drivers/net/mlx4/mr.c | 8 ++++-- 9 files changed, 60 insertions(+), 25 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c 
index 2eb52b7..32a0e66 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -2773,8 +2773,8 @@ static int cma_init(void) int ret; get_random_bytes(&next_port, sizeof next_port); - next_port = (next_port % (sysctl_local_port_range[1] - - sysctl_local_port_range[0])) + + next_port = ((unsigned int) next_port % + (sysctl_local_port_range[1] - sysctl_local_port_range[0])) + sysctl_local_port_range[0]; cma_wq = create_singlethread_workqueue("rdma_cm"); if (!cma_wq) diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index dc137de..5c6d054 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -189,18 +189,28 @@ static int send_wqe_overhead(enum ib_qp_type type) } static int set_rq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, - struct mlx4_ib_qp *qp) + int is_user, int has_srq, struct mlx4_ib_qp *qp) { /* Sanity check RQ size before proceeding */ if (cap->max_recv_wr > dev->dev->caps.max_wqes || cap->max_recv_sge > dev->dev->caps.max_rq_sg) return -EINVAL; - qp->rq.max = cap->max_recv_wr ? 
roundup_pow_of_two(cap->max_recv_wr) : 0; + if (has_srq) { + /* QPs attached to an SRQ should have no RQ */ + if (cap->max_recv_wr) + return -EINVAL; + + qp->rq.max = qp->rq.max_gs = 0; + } else { + /* HW requires >= 1 RQ entry with >= 1 gather entry */ + if (is_user && (!cap->max_recv_wr || !cap->max_recv_sge)) + return -EINVAL; - qp->rq.wqe_shift = ilog2(roundup_pow_of_two(cap->max_recv_sge * - sizeof (struct mlx4_wqe_data_seg))); - qp->rq.max_gs = (1 << qp->rq.wqe_shift) / sizeof (struct mlx4_wqe_data_seg); + qp->rq.max = roundup_pow_of_two(max(1, cap->max_recv_wr)); + qp->rq.max_gs = roundup_pow_of_two(max(1, cap->max_recv_sge)); + qp->rq.wqe_shift = ilog2(qp->rq.max_gs * sizeof (struct mlx4_wqe_data_seg)); + } cap->max_recv_wr = qp->rq.max; cap->max_recv_sge = qp->rq.max_gs; @@ -285,7 +295,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, qp->sq.head = 0; qp->sq.tail = 0; - err = set_rq_size(dev, &init_attr->cap, qp); + err = set_rq_size(dev, &init_attr->cap, !!pd->uobject, !!init_attr->srq, qp); if (err) goto err; @@ -762,11 +772,6 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, optpar |= MLX4_QP_OPTPAR_PKEY_INDEX; } - if (attr_mask & IB_QP_RNR_RETRY) { - context->params1 |= cpu_to_be32(attr->rnr_retry << 13); - optpar |= MLX4_QP_OPTPAR_RNR_RETRY; - } - if (attr_mask & IB_QP_AV) { if (mlx4_set_path(dev, &attr->ah_attr, &context->pri_path, attr_mask & IB_QP_PORT ? 
attr->port_num : qp->port)) { @@ -802,6 +807,12 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, context->pd = cpu_to_be32(to_mpd(ibqp->pd)->pdn); context->params1 = cpu_to_be32(MLX4_IB_ACK_REQ_FREQ << 28); + + if (attr_mask & IB_QP_RNR_RETRY) { + context->params1 |= cpu_to_be32(attr->rnr_retry << 13); + optpar |= MLX4_QP_OPTPAR_RNR_RETRY; + } + if (attr_mask & IB_QP_RETRY_CNT) { context->params1 |= cpu_to_be32(attr->retry_cnt << 16); optpar |= MLX4_QP_OPTPAR_RETRY_COUNT; diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c index 3810252..f40558d 100644 --- a/drivers/infiniband/hw/mthca/mthca_cmd.c +++ b/drivers/infiniband/hw/mthca/mthca_cmd.c @@ -772,7 +772,7 @@ int mthca_QUERY_FW(struct mthca_dev *dev, u8 *status) MTHCA_GET(dev->fw_ver, outbox, QUERY_FW_VER_OFFSET); /* - * FW subminor version is at more signifant bits than minor + * FW subminor version is at more significant bits than minor * version, so swap here. */ dev->fw_ver = (dev->fw_ver & 0xffff00000000ull) | diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c index 437d78a..39253d0 100644 --- a/drivers/net/mlx4/cq.c +++ b/drivers/net/mlx4/cq.c @@ -61,7 +61,7 @@ struct mlx4_cq_context { __be32 solicit_producer_index; __be32 consumer_index; __be32 producer_index; - u8 reserved6[2]; + u32 reserved6[2]; __be64 db_rec_addr; }; diff --git a/drivers/net/mlx4/eq.c b/drivers/net/mlx4/eq.c index 0f11adb..27a82ce 100644 --- a/drivers/net/mlx4/eq.c +++ b/drivers/net/mlx4/eq.c @@ -490,9 +490,11 @@ static void mlx4_free_irqs(struct mlx4_dev *dev) if (eq_table->have_irq) free_irq(dev->pdev->irq, dev); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_CATAS; ++i) if (eq_table->eq[i].have_irq) free_irq(eq_table->eq[i].irq, eq_table->eq + i); + if (eq_table->eq[MLX4_EQ_CATAS].have_irq) + free_irq(eq_table->eq[MLX4_EQ_CATAS].irq, dev); } static int __devinit mlx4_map_clr_int(struct mlx4_dev *dev) diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c 
index cfa5cc0..e7ca118 100644 --- a/drivers/net/mlx4/fw.c +++ b/drivers/net/mlx4/fw.c @@ -37,6 +37,10 @@ #include "fw.h" #include "icm.h" +enum { + MLX4_COMMAND_INTERFACE_REV = 1 +}; + extern void __buggy_use_of_MLX4_GET(void); extern void __buggy_use_of_MLX4_PUT(void); @@ -452,10 +456,12 @@ int mlx4_QUERY_FW(struct mlx4_dev *dev) u32 *outbox; int err = 0; u64 fw_ver; + u16 cmd_if_rev; u8 lg; #define QUERY_FW_OUT_SIZE 0x100 #define QUERY_FW_VER_OFFSET 0x00 +#define QUERY_FW_CMD_IF_REV_OFFSET 0x0a #define QUERY_FW_MAX_CMD_OFFSET 0x0f #define QUERY_FW_ERR_START_OFFSET 0x30 #define QUERY_FW_ERR_SIZE_OFFSET 0x38 @@ -477,21 +483,36 @@ int mlx4_QUERY_FW(struct mlx4_dev *dev) MLX4_GET(fw_ver, outbox, QUERY_FW_VER_OFFSET); /* - * FW subminor version is at more signifant bits than minor + * FW subminor version is at more significant bits than minor * version, so swap here. */ dev->caps.fw_ver = (fw_ver & 0xffff00000000ull) | ((fw_ver & 0xffff0000ull) >> 16) | ((fw_ver & 0x0000ffffull) << 16); + MLX4_GET(cmd_if_rev, outbox, QUERY_FW_CMD_IF_REV_OFFSET); + if (cmd_if_rev != MLX4_COMMAND_INTERFACE_REV) { + mlx4_err(dev, "Installed FW has unsupported " + "command interface revision %d.\n", + cmd_if_rev); + mlx4_err(dev, "(Installed FW version is %d.%d.%03d)\n", + (int) (dev->caps.fw_ver >> 32), + (int) (dev->caps.fw_ver >> 16) & 0xffff, + (int) dev->caps.fw_ver & 0xffff); + mlx4_err(dev, "This driver version supports only revision %d.\n", + MLX4_COMMAND_INTERFACE_REV); + err = -ENODEV; + goto out; + } + MLX4_GET(lg, outbox, QUERY_FW_MAX_CMD_OFFSET); cmd->max_cmds = 1 << lg; - mlx4_dbg(dev, "FW version %d.%d.%03d, max commands %d\n", + mlx4_dbg(dev, "FW version %d.%d.%03d (cmd intf rev %d), max commands %d\n", (int) (dev->caps.fw_ver >> 32), (int) (dev->caps.fw_ver >> 16) & 0xffff, (int) dev->caps.fw_ver & 0xffff, - cmd->max_cmds); + cmd_if_rev, cmd->max_cmds); MLX4_GET(fw->catas_offset, outbox, QUERY_FW_ERR_START_OFFSET); MLX4_GET(fw->catas_size, outbox, 
QUERY_FW_ERR_SIZE_OFFSET); diff --git a/drivers/net/mlx4/intf.c b/drivers/net/mlx4/intf.c index 65854f9..9ae951b 100644 --- a/drivers/net/mlx4/intf.c +++ b/drivers/net/mlx4/intf.c @@ -135,9 +135,6 @@ int mlx4_register_device(struct mlx4_dev *dev) struct mlx4_priv *priv = mlx4_priv(dev); struct mlx4_interface *intf; - INIT_LIST_HEAD(&priv->ctx_list); - spin_lock_init(&priv->ctx_lock); - mutex_lock(&intf_mutex); list_add_tail(&priv->dev_list, &dev_list); diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 20b8c0d..d417293 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -787,6 +787,8 @@ static int __devinit mlx4_init_one(struct pci_dev *pdev, dev = &priv->dev; dev->pdev = pdev; + INIT_LIST_HEAD(&priv->ctx_list); + spin_lock_init(&priv->ctx_lock); /* * Now reset the HCA before we touch the PCI capabilities or diff --git a/drivers/net/mlx4/mr.c b/drivers/net/mlx4/mr.c index b33864d..d0808fa 100644 --- a/drivers/net/mlx4/mr.c +++ b/drivers/net/mlx4/mr.c @@ -324,15 +324,17 @@ int mlx4_mr_enable(struct mlx4_dev *dev, struct mlx4_mr *mr) MLX4_MPT_FLAG_MIO | MLX4_MPT_FLAG_REGION | mr->access); - if (mr->mtt.order < 0) - mpt_entry->flags |= cpu_to_be32(MLX4_MPT_FLAG_PHYSICAL); mpt_entry->key = cpu_to_be32(key_to_hw_index(mr->key)); mpt_entry->pd = cpu_to_be32(mr->pd); mpt_entry->start = cpu_to_be64(mr->iova); mpt_entry->length = cpu_to_be64(mr->size); mpt_entry->entity_size = cpu_to_be32(mr->mtt.page_shift); - mpt_entry->mtt_seg = cpu_to_be64(mlx4_mtt_addr(dev, &mr->mtt)); + if (mr->mtt.order < 0) { + mpt_entry->flags |= cpu_to_be32(MLX4_MPT_FLAG_PHYSICAL); + mpt_entry->mtt_seg = 0; + } else + mpt_entry->mtt_seg = cpu_to_be64(mlx4_mtt_addr(dev, &mr->mtt)); err = mlx4_SW2HW_MPT(dev, mailbox, key_to_hw_index(mr->key) & (dev->caps.num_mpts - 1)); From steffen.persvold at scali.com Fri Jun 8 07:26:11 2007 From: steffen.persvold at scali.com (Steffen Persvold) Date: Fri, 8 Jun 2007 10:26:11 -0400 Subject: [ofa-general] RE: [ewg] OFED 1.2 and 
backwards binary compatibility References: <465AE791.5040003@mellanox.co.il><465BD5B4.50003@mellanox.co.il> Message-ID: Aha! Thanks so much, I will look into this. Cheers, Steffen Persvold Technical Director Americas tel. 508-281-7100 x401 fax. 508-281-7171 http://www.scali.com/ Scaling the Linux datacenter ________________________________ From: Roland Dreier [mailto:rdreier at cisco.com] Sent: Fri 6/8/2007 9:59 AM To: Steffen Persvold Cc: EWG; OpenFabrics General Subject: Re: [ofa-general] RE: [ewg] OFED 1.2 and backwards binary compatibility > 1.0 vs. 1.1 is all good. That works. I'm talking about 1.1/1.0 vs 1.2, that's not working. The diffset below is between 1.1 and 1.2. Sorry for being confusing. I was talking about the libibverbs version. OFED 1.0 and 1.1 both included libibverbs 1.0, and OFED 1.2 includes libibverbs 1.1. > What we're doing is using dlopen()/dlsym() to dynamically open the library so that we have no library dependencies (this allows us to runtime wise check if ofed is installed or other IB stacks). This apparently breaks. Yes, you are basically implementing a broken dynamic linker yourself. For this to work you will need to use dlvsym() and request all symbols with version IBVERBS_1.0. There may be a slight performance penalty on libibverbs 1.1 (OFED 1.2) because you will be going through compatibility wrappers. > I don't find any "post_send_wrapper_1_0" nor "post_send_wrapper_1_1" symbols in my libraries ?? : Right, they're internal symbols. Take a look at the libibverbs source if you're curious about how it works. - R. From afriedle at open-mpi.org Fri Jun 8 11:05:33 2007 From: afriedle at open-mpi.org (Andrew Friedley) Date: Fri, 08 Jun 2007 11:05:33 -0700 Subject: [ofa-general] Limited number of multicasts groups that can be joined?
Message-ID: <46699A6D.4070300@open-mpi.org> I've run into a problem where it appears that I cannot join more than 14 multicast groups from a single HCA. I'm using the RDMA CM UD/multicast interface from an OFED v1.2 nightly build, and using a '0' address when joining to have the SM allocate an unused address. The first 14 rdma_join_multicast() calls succeed, a MULTICAST_JOIN event comes through for each of them and everything works. But the 15th call to rdma_join_multicast() returns -1 and sets errno to 99, 'Cannot assign requested address'. Note that I'm using a single QP per process to do all the joins. Things get weirder if I run two instances of my program on the same node -- as soon the total between the two instances is 14, neither instance can join any more groups. Also, right now my code hangs when this happens -- if I kill off one of the two instances and run a third instance (while leaving the other hung, holding some number of groups), the third instance is not able to join ANY groups. The behavior resets when I kill all instances. Two instances running on separate nodes (on the same network) do not appear to interfere with each other like described above; they do still error out on the 15th join. This feels like a bug to me; though regardless this limit is WAY too low. Any ideas what might be going on, or how I can work around it? Andrew From sean.hefty at intel.com Fri Jun 8 12:40:03 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 8 Jun 2007 12:40:03 -0700 Subject: [ofa-general] RE: Limited number of multicasts groups that can be joined? In-Reply-To: <46699A6D.4070300@open-mpi.org> Message-ID: <000d01c7aa04$cf8353f0$9c98070a@amr.corp.intel.com> >I've run into a problem where it appears that I cannot join more than 14 >multicast groups from a single HCA. I'm using the RDMA CM UD/multicast >interface from an OFED v1.2 nightly build, and using a '0' address when >joining to have the SM allocate an unused address. The first 14 >rdma_join_multicast() calls succeed, a MULTICAST_JOIN event comes >through for each of them and everything works. But the 15th call to >rdma_join_multicast() returns -1 and sets errno to 99, 'Cannot assign >requested address'. I was able to join a total of 6 times before I started seeing failures. Each join is done by a separate process with their own QP.
I'll track down the failure in more detail, but it will likely take me a couple of days to look into this. >This feels like a bug to me; though regardless this limit is WAY too >low. Any ideas what might be going on, or how I can work around it? At least on my systems, I see device attributes of: max_mcast_grp = 8192 max_mcast_qp_attach = 8 max_total_mcast_qp_attach = 65536 - Sean From afriedle at open-mpi.org Fri Jun 8 13:13:10 2007 From: afriedle at open-mpi.org (Andrew Friedley) Date: Fri, 08 Jun 2007 13:13:10 -0700 Subject: [ofa-general] Re: Limited number of multicasts groups that can be joined? In-Reply-To: <000d01c7aa04$cf8353f0$9c98070a@amr.corp.intel.com> References: <000d01c7aa04$cf8353f0$9c98070a@amr.corp.intel.com> Message-ID: <4669B856.9080305@open-mpi.org> Sean Hefty wrote: > At least on my systems, I see device attributes of: > > max_mcast_grp = 8192 > max_mcast_qp_attach = 8 > max_total_mcast_qp_attach = 65536 OK I see the exact same thing here. What exactly do these params mean? Particularly max_mcast_qp_attach, is that the most QPs attached to one group for this device, or max groups a QP can attach to? Andrew From sean.hefty at intel.com Fri Jun 8 13:26:18 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 8 Jun 2007 13:26:18 -0700 Subject: [ofa-general] RE: Limited number of multicasts groups that can be joined? In-Reply-To: <4669B856.9080305@open-mpi.org> Message-ID: <001b01c7aa0b$452986a0$9c98070a@amr.corp.intel.com> >> max_mcast_grp = 8192 >> max_mcast_qp_attach = 8 >> max_total_mcast_qp_attach = 65536 > >OK I see the exact same thing here. What exactly do these params mean? > Particularly max_mcast_qp_attach, is that the most QPs attached to one >group for this device, or max groups a QP can attach to? max_mcast_grp: Maximum number of multicast groups supported by this HCA. Shall be zero if this HCA does not support IBA unreliable multicast. max_total_mcast_qp_attach: Maximum number of QPs which can be attached to multicast groups for this HCA. Shall be zero if this HCA does not support IBA unreliable multicast. max_mcast_qp_attach: Maximum number of QPs per multicast group supported by this HCA. Shall be zero if this HCA does not support IBA unreliable multicast. Given that I can only join 6 times, I'm guessing that I'm running into the max_mcast_qp_attach = 8 limit (IPoIB has joined multicast groups as well). - Sean From or.gerlitz at gmail.com Fri Jun 8 14:20:27 2007 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Sat, 9 Jun 2007 00:20:27 +0300 Subject: [ofa-general] Re: ipoib / bonding and OFED In-Reply-To: References: <3857BB049D83424D9DB82753D37CEA55459C41@taurus.voltaire.com> <4657373E.2030903@hp.com> <465BDC90.5080305@voltaire.com> <466702A8.5080302@hp.com> <4667B5FD.4070600@voltaire.com> Message-ID: <15ddcffd0706081420r79984701u4e385e28857cb68b@mail.gmail.com> On 6/7/07, Scott Weitzenkamp (sweitzen) wrote: > I don't know if I've said this in public, but I've stopped testing > ipoibtools HA as of OFED 1.2 rc2 and Cisco is only going to support > ib-bonding HA for our OFED 1.2 customers, as our testing has revealed > ib-bonding is more robust than ipoibtools. I know I said this to > Tziporet at Sonoma, and she seemed to agree we could eventually remove > ipoibtools from OFED. Scott, Thanks for the feedback, just to be clear, we also don't test the ipoibtools HA solution, and Voltaire will support only the ib-bonding solution for OFED 1.2 customers. Or. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From vlad at lists.openfabrics.org Sat Jun 9 02:41:47 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Sat, 9 Jun 2007 02:41:47 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070609-0200 daily build status Message-ID: <20070609094147.D985FE60844@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.15 Passed on x86_64 with linux-2.6.20 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.16 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.17 Passed on ppc64 with linux-2.6.12 Passed on powerpc with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.15 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on x86_64 with linux-2.6.16 Passed on ppc64 with linux-2.6.13 Passed on powerpc with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.18 Passed on ia64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.13 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.15 Passed on powerpc with linux-2.6.16 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed 
on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.14 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.16 Passed on x86_64 with linux-2.6.19 Passed on powerpc with linux-2.6.12 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.9-34.ELsmp Failed: From mst at dev.mellanox.co.il Sat Jun 9 21:42:00 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Sun, 10 Jun 2007 07:42:00 +0300 Subject: [ofa-general] Re: IPOIB CM (NOSRQ) extension In-Reply-To: <46687642.8040208@linux.vnet.ibm.com> References: <46687642.8040208@linux.vnet.ibm.com> Message-ID: <20070610044146.GA4959@mellanox.co.il> > Quoting Pradeep Satyanarayana : > Subject: IPOIB CM (NOSRQ) extension > > This patch handles the corner case of running out of RC QPs. In that > case it switches to UD mode. This patch can be used both by NOSRQ and > SRQ code.
> > Signed-off-by: Pradeep Satyanarayana You don't provide any way to retry going back to connected mode, after a failure, which is really intermittent by nature. That's pretty bad. > --- > > --- c/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_cm.c > 2007-06-07 11:13:55.000000000 -0400 > +++ b/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_cm.c > 2007-06-07 11:11:21.000000000 -0400 > @@ -1383,6 +1383,11 @@ static int ipoib_cm_tx_handler(struct ib > break; > case IB_CM_REQ_ERROR: > case IB_CM_REJ_RECEIVED: > + ipoib_warn(priv, "REJ received\n"); > + neigh = tx->neigh; > + if (neigh) > + clear_bit(IPOIB_FLAG_OPER_UP, &neigh->cm->flags); > + break; > case IB_CM_TIMEWAIT_EXIT: > ipoib_dbg(priv, "CM error %d.\n", event->event); > spin_lock_irq(&priv->tx_lock); This has an effect of dropping down to datagram mode on errors such as CM timeout, or a reject due to stale connection. I think this is a wrong thing to do. > --- c/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_main.c > 2007-05-30 14:56:25.000000000 -0400 > +++ b/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_main.c > 2007-06-06 18:28:06.000000000 -0400 > @@ -679,11 +679,10 @@ static int ipoib_start_xmit(struct sk_bu > > neigh = *to_ipoib_neigh(skb->dst->neighbour); > > - if (ipoib_cm_get(neigh)) { > - if (ipoib_cm_up(neigh)) { > + if (ipoib_cm_get(neigh) && ipoib_cm_up(neigh) && > + test_bit(IPOIB_FLAG_OPER_UP, &neigh->cm->flags)) { > ipoib_cm_send(dev, skb, ipoib_cm_get(neigh)); > goto out; > - } > } else if (neigh->ah) { > if (unlikely(memcmp(&neigh->dgid.raw, > skb->dst->neighbour->ha + 4, This adds overhead on xmit datapath (and it's atomics!), which doesn't make me happy at all. -- MST From mst at dev.mellanox.co.il Sat Jun 9 21:49:45 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Sun, 10 Jun 2007 07:49:45 +0300 Subject: [ofa-general] Re: IPOIB CM (NOSRQ)[PATCH V5] patch In-Reply-To: <46687636.5050101@linux.vnet.ibm.com> References: <46687636.5050101@linux.vnet.ibm.com> Message-ID: <20070610044945.GB4959@mellanox.co.il> > Quoting Pradeep Satyanarayana : > Subject: IPOIB CM (NOSRQ)[PATCH V5] patch > > Here is a fifth version of the IPOIB_CM_NOSRQ patch. This patch will > benefit adapters that do not support shared receive queues. > > This patch incorporates the following review comments and subsequent > discussions on this mailing list from v4: > > 1. Reduce the number of if(srq) tests in the packet receive path I could still count at least 2 of these, and I don't see why there can't be just 1, or even 0 if the QP pool is hidden under the SRQ interface. > +int current_rc_qp = 0; /* Active RC QPs for NOSRQ */ > #define IPOIB_CM_IETF_ID 0x1000000000000000ULL > > #define IPOIB_CM_RX_UPDATE_TIME (256 * HZ) I don't see any locking for current_rc_qp, which looks wrong. -- MST From mst at dev.mellanox.co.il Sat Jun 9 23:37:13 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Sun, 10 Jun 2007 09:37:13 +0300 Subject: [ofa-general] patch for OFED 1.2 Message-ID: <20070610063713.GB8249@mellanox.co.il> Sean, the following commit commit bf2944bd56c7a48cc3962a860dbc4ceee6b1ace8 Author: Sean Hefty Date: Tue Jun 5 09:57:31 2007 -0700 RDMA/cma: Fix initialization of next_port next_port should be between sysctl_local_port_range[0] and [1]. However, it is initially set to a random value with get_random_bytes(). If the value is negative when treated as a signed integer, next_port can end up outside the expected range because of the result of the % operator being negative. Signed-off-by: Sean Hefty Signed-off-by: Roland Dreier looks like something we want included in OFED 1.2 as well. What do you think?
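The next_port commit quoted above turns on a C detail: since C99, integer division truncates toward zero, so `a % b` takes the sign of `a`, and a negative random seed gives a negative remainder. A minimal sketch of the bug and of one way to avoid it (reducing the seed as unsigned) follows. The port bounds and function names are hypothetical stand-ins for sysctl_local_port_range[]; this is an illustration, not the RDMA CM source.

```c
/*
 * Sketch only: PORT_RANGE_LOW/HIGH are hypothetical stand-ins for
 * sysctl_local_port_range[0] and [1]; this is not the RDMA CM source.
 */
#define PORT_RANGE_LOW  32768
#define PORT_RANGE_HIGH 61000

/* Buggy: '%' truncates toward zero, so a negative seed yields a
 * negative remainder and the result can drop below PORT_RANGE_LOW. */
static int next_port_buggy(int seed)
{
	return seed % (PORT_RANGE_HIGH - PORT_RANGE_LOW) + PORT_RANGE_LOW;
}

/* Fixed: reducing the seed as unsigned keeps the remainder in
 * [0, HIGH - LOW), so the result always lies in [LOW, HIGH). */
static int next_port_fixed(int seed)
{
	unsigned int span = PORT_RANGE_HIGH - PORT_RANGE_LOW;

	return (int)((unsigned int)seed % span) + PORT_RANGE_LOW;
}
```

With a seed of -5, next_port_buggy() returns 32763, below the range, while next_port_fixed() stays within [32768, 61000) for every seed.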
-- MST From dotanb at dev.mellanox.co.il Sun Jun 10 00:44:29 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Sun, 10 Jun 2007 10:44:29 +0300 Subject: [ofa-general] Having trouble pingpong between two nodes. In-Reply-To: References: Message-ID: <466BABDD.20808@dev.mellanox.co.il> Jeffrey Wong wrote: > > Hello, > > I am trying to run a ibv_ud_pingpong between two nodes but I can’t > seem to get them to communicate. I have used the ping command between > the ib interfaces and that works fine, but when I try to use the > ibv_ud_ping pong it says the following: > > ________________________________________________________________________________ > > root at centos5:node1 ~]# ibv_ud_pingpong 193.168.10.254 > > local address: LID 0x0002, QPN 0x0f0406, PSN 0xb067dc > > Couldn't connect to 193.168.10.254:18515 > > ____________________________________________________________________________ > This is trivial, but did you run ibv_ud_pingpong as the server on 193.168.10.254? (Because you didn't give the client any test parameters, the server should be started with no parameters either: ibv_ud_pingpong).
Dotan From vlad at lists.openfabrics.org Sun Jun 10 02:40:53 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Sun, 10 Jun 2007 02:40:53 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070610-0200 daily build status Message-ID: <20070610094053.835E3E60831@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.12 Passed on powerpc with linux-2.6.18 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.15 Passed on x86_64 with linux-2.6.16 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.12 Passed on x86_64 with linux-2.6.20 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.18 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.18 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.14 Passed on ia64 with linux-2.6.19 Passed on x86_64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on powerpc with linux-2.6.15 Passed on powerpc with linux-2.6.14 Passed on powerpc with linux-2.6.12 Passed on ia64 with linux-2.6.17 Passed on ppc64 with linux-2.6.14 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.12 Passed on 
powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.13 Passed on ppc64 with linux-2.6.17 Passed on ia64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.9-34.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Failed: From eli at mellanox.co.il Sun Jun 10 04:00:33 2007 From: eli at mellanox.co.il (Eli Cohen) Date: Sun, 10 Jun 2007 14:00:33 +0300 Subject: [ofa-general] Re: [PATCH 2/2] IB/mlx4_ib: fix SRQ buffer allocation In-Reply-To: References: <1181133679.10841.66.camel@mtls03> Message-ID: <1181473233.11593.7.camel@mtls03> On Thu, 2007-06-07 at 11:59 -0700, Roland Dreier wrote: > Thanks... I reworked this a lot and right now I plan to push the > following (although I'm still testing): > > > static int set_rq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, > - struct mlx4_ib_qp *qp) > + int is_user, int has_srq, struct mlx4_ib_qp *qp) > { > /* Sanity check RQ size before proceeding */ > if (cap->max_recv_wr > dev->dev->caps.max_wqes || > cap->max_recv_sge > dev->dev->caps.max_rq_sg) > return -EINVAL; > > - qp->rq.max = cap->max_recv_wr ? 
roundup_pow_of_two(cap->max_recv_wr) : 0; > + if (has_srq) { > + /* QPs attached to an SRQ should have no RQ */ > + if (cap->max_recv_wr) > + return -EINVAL; > > - qp->rq.wqe_shift = ilog2(roundup_pow_of_two(cap->max_recv_sge * > - sizeof (struct mlx4_wqe_data_seg))); > - qp->rq.max_gs = (1 << qp->rq.wqe_shift) / sizeof (struct mlx4_wqe_data_seg); > + qp->rq.max = qp->rq.max_gs = 0; > + } else { > + /* HW requires >= 1 RQ entry with >= 1 gather entry */ > + if (is_user && (!cap->max_recv_wr || !cap->max_recv_sge)) > + return -EINVAL; I think we may have a problem here: if a user who is not aware of the HW requirement for a non-zero-length receive queue creates a QP with zero in cap->max_recv_sge, the above kernel code will cause a failure, since libmlx4 does not fix the value in this field. So I think this should be taken care of in libmlx4. Moreover, I see you did not take the following: @@ -302,6 +315,10 @@ static int set_kernel_sq_size(struct mlx static int set_user_sq_size(struct mlx4_ib_qp *qp, struct mlx4_ib_create_qp *ucmd) { + /* Sanity check for SQ size */ + if (ucmd->log_sq_bb_count > 15 || ucmd->log_sq_stride > 11) + return -EINVAL; + Shouldn't we use a condition like this to prevent misconfiguration of the QP if libmlx4 passes improper values?
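Eli's two points (userspace should not hand the kernel max_recv_sge = 0 for a QP without an SRQ, and the kernel should validate the SQ geometry that libmlx4 passes in) can be sketched in plain C. This is a minimal sketch under stated assumptions: every struct, function and constant name below is hypothetical, and the limits 15 and 11 are copied from Eli's patch and given names only because, as Roland points out in his reply, bare values are hard to review. It is not libmlx4 or mlx4_ib code.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical names for the limits in Eli's proposed check. */
#define SKETCH_MAX_LOG_SQ_BB_COUNT 15 /* reject more than 2^15 SQ basic blocks */
#define SKETCH_MAX_LOG_SQ_STRIDE   11 /* reject an SQ stride above 2^11 bytes */

/* Hypothetical mirror of the SQ fields userspace passes at QP creation. */
struct sketch_create_qp_cmd {
	uint8_t log_sq_bb_count; /* log2 of the number of SQ basic blocks */
	uint8_t log_sq_stride;   /* log2 of the SQ WQE stride in bytes */
};

/* Kernel side: reject out-of-range SQ geometry instead of trusting it. */
static int sketch_check_user_sq(const struct sketch_create_qp_cmd *ucmd)
{
	if (ucmd->log_sq_bb_count > SKETCH_MAX_LOG_SQ_BB_COUNT ||
	    ucmd->log_sq_stride > SKETCH_MAX_LOG_SQ_STRIDE)
		return -EINVAL;
	return 0;
}

/*
 * Library side: a QP on an SRQ must request no RQ entries, while a QP
 * without one must request at least one entry with one gather element,
 * which is what the quoted kernel checks expect.
 */
static void sketch_fixup_rq_caps(uint32_t *max_recv_wr,
				 uint32_t *max_recv_sge, int has_srq)
{
	if (has_srq) {
		*max_recv_wr = 0;
		return;
	}
	if (*max_recv_wr < 1)
		*max_recv_wr = 1;
	if (*max_recv_sge < 1)
		*max_recv_sge = 1;
}
```

Doing the fix-up in the library keeps the kernel strict (it still returns -EINVAL for bad input) while sparing applications from having to know about the hardware's minimum RQ size.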
From rdreier at cisco.com Sun Jun 10 08:52:06 2007 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 10 Jun 2007 08:52:06 -0700 Subject: [ofa-general] Re: [PATCH 2/2] IB/mlx4_ib: fix SRQ buffer allocation In-Reply-To: <1181473233.11593.7.camel@mtls03> (Eli Cohen's message of "Sun, 10 Jun 2007 14:00:33 +0300") References: <1181133679.10841.66.camel@mtls03> <1181473233.11593.7.camel@mtls03> Message-ID: > > + /* HW requires >= 1 RQ entry with >= 1 gather entry */ > > + if (is_user && (!cap->max_recv_wr || !cap->max_recv_sge)) > > + return -EINVAL; > > I think we may have a problem here: if a user, not being aware of the HW > requirement of none zero length receive queue, creates a QP with zero in > cap->max_recv_sge, the above kernel code will cause a failure since > libmlx4 does not fix the value in this field. So I think this should be > taken care of in libmlx4. OK, I'll add something to make sure max_recv_sge >= 1 to libmlx4. > + /* Sanity check for SQ size */ > + if (ucmd->log_sq_bb_count > 15 || ucmd->log_sq_stride > 11) > + return -EINVAL; > + > > Shouldn't we use a condition like this to prevent misconfiguration of > the QP if libmlx4 passes improper values? Yeah, I guess so. I dropped that chunk because I didn't like the hard-coded and unexplained values, but I left the checking on my to do list. Thanks... From rdreier at cisco.com Sun Jun 10 08:54:19 2007 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 10 Jun 2007 08:54:19 -0700 Subject: [ofa-general] Re: IPOIB CM (NOSRQ) extension In-Reply-To: <20070610044146.GA4959@mellanox.co.il> (Michael S. 
Tsirkin's message of "Sun, 10 Jun 2007 07:42:00 +0300") References: <46687642.8040208@linux.vnet.ibm.com> <20070610044146.GA4959@mellanox.co.il> Message-ID: > > - if (ipoib_cm_get(neigh)) { > > - if (ipoib_cm_up(neigh)) { > > + if (ipoib_cm_get(neigh) && ipoib_cm_up(neigh) && > > + test_bit(IPOIB_FLAG_OPER_UP, &neigh->cm->flags)) { > > This adds overhead on xmit datapath (and it's atomics!), > which doesn't make me happy at all. I don't see anything atomic here. But if (ipoib_cm_get(neigh)) { if (ipoib_cm_up(neigh)) { .... } } else... is different from if (ipoib_cm_get(neigh) && if (ipoib_cm_up(neigh)) { .... } else.. so there is a change in semantics here... From rdreier at cisco.com Sun Jun 10 08:57:02 2007 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 10 Jun 2007 08:57:02 -0700 Subject: [ofa-general] IPOIB CM (NOSRQ)[PATCH V5] patch In-Reply-To: <46687636.5050101@linux.vnet.ibm.com> (Pradeep Satyanarayana's message of "Thu, 07 Jun 2007 14:18:46 -0700") References: <46687636.5050101@linux.vnet.ibm.com> Message-ID: Haven't read very far, but... > +#define SIXTY_FOUR_K (1ul << 16) > +#define MEGA_BYTE (1ul << 20) this is really horrible. There's no point in this type of defines -- a constant should have a name that describes what it's *for*, not what the value is. The code above is pretty close to #define SIXTY_FOUR 64 and I hope it's obvious why that's pointless. And also > + ipoib_warn(priv, "NOSRQ has reached the configurable limit " > + "of either %d RC QPs or, max recv buf size of " > + "0x%lx MB\n", max_rc_qp, max_recv_buf * MEGA_BYTE); this is buggy -- you print the value as being in MB but then also multiply by MEGA_BYTE before printing it. 
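Roland's reply to the IPOIB CM (NOSRQ) extension review points out that folding the nested ifs in ipoib_start_xmit() into one `&&` chain changes which branch runs. A minimal sketch makes the difference concrete. It assumes hypothetical boolean stand-ins (has_cm for ipoib_cm_get(neigh) != NULL, cm_up for ipoib_cm_up(neigh) plus the new flag test, has_ah for neigh->ah) and is not IPoIB code. With a CM context that is not yet up, the nested form skips the UD branch, while the combined form takes it.

```c
/*
 * Sketch only: has_cm, cm_up and has_ah are hypothetical booleans
 * standing in for ipoib_cm_get(neigh) != NULL, ipoib_cm_up(neigh) and
 * neigh->ah != NULL. PATH_FALLTHROUGH means "neither send branch ran".
 */
enum xmit_path { PATH_CM, PATH_UD, PATH_FALLTHROUGH };

/* Original shape: a neighbour with a CM context never uses the UD path. */
static enum xmit_path choose_nested(int has_cm, int cm_up, int has_ah)
{
	if (has_cm) {
		if (cm_up)
			return PATH_CM;
	} else if (has_ah) {
		return PATH_UD;
	}
	return PATH_FALLTHROUGH;
}

/* Patched shape: whenever the CM test fails, control reaches the UD branch. */
static enum xmit_path choose_combined(int has_cm, int cm_up, int has_ah)
{
	if (has_cm && cm_up)
		return PATH_CM;
	else if (has_ah)
		return PATH_UD;
	return PATH_FALLTHROUGH;
}
```

Whether the UD fallback is wanted (the patch's stated goal is to switch to UD mode when RC QPs run out) is a design question; the sketch only shows that the two shapes diverge when a CM context exists but is not up.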
From sagis at voltaire.com Sun Jun 10 08:59:48 2007 From: sagis at voltaire.com (Sagi Schlanger) Date: Sun, 10 Jun 2007 18:59:48 +0300 Subject: [ofa-general] OpenSM Up-Down algorithm Message-ID: <39C75744D164D948A170E9792AF8E7CA0D2914@exil.voltaire.com> Hi, I'm looking for some answers on Up-Down routing at OpenSM . Is anybody familiar with a utility/procedure to find credit loops given a topology and routing settings? Is there a handy spec describing the OpenSM Up-Down algorithm? What is the scheme through which roots are defined on clos and non clos/fat tree topologies? Is this algorithm always credit loop free? How efficient is using this algorithm on non clos/fat tree topologies? Thanks for your cooperation, Sagi ____________________________________________________________ Sagi Schlanger | +972-9-9717651 (o) | +972-52-2385154 (m) Software Engineer, IB Switch Voltaire - The Grid Backbone www.voltaire.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.hefty at intel.com Sun Jun 10 10:28:25 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Sun, 10 Jun 2007 10:28:25 -0700 Subject: [ofa-general] RE: patch for OFED 1.2 In-Reply-To: <20070610063713.GB8249@mellanox.co.il> Message-ID: <000401c7ab84$c0fbf900$eacc180a@amr.corp.intel.com> >looks like something we want included in OFED 1.2 is well. >What do you think? This should have been pulled in for OFED. - Sean From sashak at voltaire.com Sun Jun 10 15:31:59 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 11 Jun 2007 01:31:59 +0300 Subject: [ofa-general] [PATCH] opensm: remove unused state_step_mode Message-ID: <20070610223159.GB23029@sashak.voltaire.com> This removes unused state_step_mode and associated flow from osm_state_mgr_process(). Signed-off-by: Sasha Khapyorsky --- opensm/include/opensm/osm_base.h | 29 ------------ opensm/include/opensm/osm_state_mgr.h | 2 - opensm/opensm/osm_state_mgr.c | 81 +++------------------------------ 3 files changed, 6 insertions(+), 106 deletions(-) diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h index ee280d3..6bdea24 100644 --- a/opensm/include/opensm/osm_base.h +++ b/opensm/include/opensm/osm_base.h @@ -768,35 +768,6 @@ typedef enum _osm_sm_state typedef uintn_t osm_signal_t; /***********/ -/****d* OpenSM: Base/osm_state_mgr_mode_t -* NAME -* osm_state_mgr_mode_t -* -* DESCRIPTION -* Enumerates the possible state progressing codes used by the OSM -* state manager. -* -* SYNOPSIS -*/ -typedef enum _osm_state_mgr_mode -{ - OSM_STATE_STEP_CONTINUOUS = 0, - OSM_STATE_STEP_TAKE_ONE, - OSM_STATE_STEP_BREAK -} osm_state_mgr_mode_t; -/* -* OSM_STATE_STEP_CONTINUOUS -* normal automatic progress mode -* -* OSM_STATE_STEP_TAKE_ONE -* Do one step -* -* OSM_STATE_STEP_BREAK -* Stop before taking next step (the while loop in the state -* manager automatically change to this state).
-* -**********/ - /****d* OpenSM: Base/osm_sm_signal_t * NAME * osm_sm_signal_t diff --git a/opensm/include/opensm/osm_state_mgr.h b/opensm/include/opensm/osm_state_mgr.h index 427b156..6975d18 100644 --- a/opensm/include/opensm/osm_state_mgr.h +++ b/opensm/include/opensm/osm_state_mgr.h @@ -118,8 +118,6 @@ typedef struct _osm_state_mgr cl_plock_t *p_lock; cl_event_t *p_subnet_up_event; osm_sm_state_t state; - osm_state_mgr_mode_t state_step_mode; - osm_signal_t next_stage_signal; } osm_state_mgr_t; /* * FIELDS diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c index bcf68f2..893a423 100644 --- a/opensm/opensm/osm_state_mgr.c +++ b/opensm/opensm/osm_state_mgr.c @@ -153,8 +153,6 @@ osm_state_mgr_init( p_mgr->state = OSM_SM_STATE_IDLE; p_mgr->p_lock = p_lock; p_mgr->p_subnet_up_event = p_subnet_up_event; - p_mgr->state_step_mode = OSM_STATE_STEP_CONTINUOUS; - p_mgr->next_stage_signal = OSM_SIGNAL_NONE; status = cl_spinlock_init( &p_mgr->state_lock ); if( status != CL_SUCCESS ) @@ -2332,21 +2330,8 @@ Idle: { case OSM_SIGNAL_NO_PENDING_TRANSACTIONS: case OSM_SIGNAL_DONE: - /* If we run single step we have already done this */ - if( p_mgr->state_step_mode != OSM_STATE_STEP_TAKE_ONE ) - { - __osm_state_mgr_set_sm_lid_done_msg( p_mgr ); - __osm_state_mgr_notify_lid_change( p_mgr ); - } - - /* Break on single step mode - if not continuous */ - if( p_mgr->state_step_mode == OSM_STATE_STEP_BREAK ) - { - p_mgr->next_stage_signal = signal; - signal = OSM_SIGNAL_NONE; - break; - } - + __osm_state_mgr_set_sm_lid_done_msg( p_mgr ); + __osm_state_mgr_notify_lid_change( p_mgr ); p_mgr->state = OSM_SM_STATE_SET_SUBNET_UCAST_LIDS; signal = osm_lid_mgr_process_subnet( p_mgr->p_lid_mgr ); break; @@ -2422,17 +2407,7 @@ Idle: * their destination. 
*/ __osm_state_mgr_check_tbl_consistency( p_mgr ); - /* If we run single step we have already done this */ - if( p_mgr->state_step_mode != OSM_STATE_STEP_TAKE_ONE ) - __osm_state_mgr_lid_assign_msg( p_mgr ); - - /* Break on single step mode - just before taking next step */ - if( p_mgr->state_step_mode == OSM_STATE_STEP_BREAK ) - { - p_mgr->next_stage_signal = signal; - signal = OSM_SIGNAL_NONE; - break; - } + __osm_state_mgr_lid_assign_msg( p_mgr ); /* * OK, the wire is clear, so proceed with @@ -2444,12 +2419,6 @@ Idle: p_mgr->state = OSM_SM_STATE_SET_UCAST_TABLES; signal = osm_ucast_mgr_process( p_mgr->p_ucast_mgr ); - /* Break on single step mode */ - if( p_mgr->state_step_mode != OSM_STATE_STEP_CONTINUOUS ) - { - p_mgr->next_stage_signal = signal; - signal = OSM_SIGNAL_NONE; - } break; default: @@ -2507,17 +2476,7 @@ Idle: * take into account these lfts. */ p_mgr->p_subn->ignore_existing_lfts = FALSE; - /* If we run single step we have already done this */ - if( p_mgr->state_step_mode != OSM_STATE_STEP_TAKE_ONE ) - __osm_state_mgr_switch_config_msg( p_mgr ); - - /* Break on single step mode - just before taking next step */ - if( p_mgr->state_step_mode == OSM_STATE_STEP_BREAK ) - { - p_mgr->next_stage_signal = signal; - signal = OSM_SIGNAL_NONE; - break; - } + __osm_state_mgr_switch_config_msg( p_mgr ); if( !p_mgr->p_subn->opt.disable_multicast ) { @@ -2582,17 +2541,7 @@ Idle: { case OSM_SIGNAL_NO_PENDING_TRANSACTIONS: case OSM_SIGNAL_DONE: - /* If we run single step we have already done this */ - if( p_mgr->state_step_mode != OSM_STATE_STEP_TAKE_ONE ) - __osm_state_mgr_multicast_config_msg( p_mgr ); - - /* Break on single step mode - just before taking next step */ - if( p_mgr->state_step_mode == OSM_STATE_STEP_BREAK ) - { - p_mgr->next_stage_signal = signal; - signal = OSM_SIGNAL_NONE; - break; - } + __osm_state_mgr_multicast_config_msg( p_mgr ); p_mgr->state = OSM_SM_STATE_SET_LINK_PORTS; signal = osm_link_mgr_process( p_mgr->p_link_mgr, @@ -2714,17 +2663,7 
@@ Idle: case OSM_SIGNAL_NO_PENDING_TRANSACTIONS: case OSM_SIGNAL_DONE: - /* If we run single step we have already done this */ - if( p_mgr->state_step_mode != OSM_STATE_STEP_TAKE_ONE ) - __osm_state_mgr_links_armed_msg( p_mgr ); - - /* Break on single step mode - just before taking next step */ - if( p_mgr->state_step_mode == OSM_STATE_STEP_BREAK ) - { - p_mgr->next_stage_signal = signal; - signal = OSM_SIGNAL_NONE; - break; - } + __osm_state_mgr_links_armed_msg( p_mgr ); p_mgr->state = OSM_SM_STATE_SET_ACTIVE; signal = osm_link_mgr_process( p_mgr->p_link_mgr, @@ -2925,14 +2864,6 @@ Idle: signal = OSM_SIGNAL_SWEEP; } - /* - * for single step mode - some stages need to break only - * after evaluating a single step. - * For those we track the fact we have already performed - * a single loop - */ - if( p_mgr->state_step_mode == OSM_STATE_STEP_TAKE_ONE ) - p_mgr->state_step_mode = OSM_STATE_STEP_BREAK; } cl_spinlock_release( &p_mgr->state_lock ); -- 1.5.2.1.137.g426c From sashak at voltaire.com Sun Jun 10 15:33:01 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 11 Jun 2007 01:33:01 +0300 Subject: [ofa-general] [PATCH] opensm: clean unused OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED Message-ID: <20070610223301.GC23029@sashak.voltaire.com> This removes unused OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED sm signal enum value. 
Signed-off-by: Sasha Khapyorsky --- opensm/include/opensm/osm_base.h | 1 - opensm/opensm/osm_helper.c | 7 +++---- opensm/opensm/osm_sm_state_mgr.c | 8 -------- 3 files changed, 3 insertions(+), 13 deletions(-) diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h index 6bdea24..9a50d7d 100644 --- a/opensm/include/opensm/osm_base.h +++ b/opensm/include/opensm/osm_base.h @@ -788,7 +788,6 @@ typedef enum _osm_sm_signal OSM_SM_SIGNAL_HANDOVER_SENT, OSM_SM_SIGNAL_ACKNOWLEDGE, OSM_SM_SIGNAL_STANDBY, - OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED, OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED_DONE, OSM_SM_SIGNAL_WAIT_FOR_HANDOVER, OSM_SM_SIGNAL_MAX diff --git a/opensm/opensm/osm_helper.c b/opensm/opensm/osm_helper.c index 3745b55..724ecdf 100644 --- a/opensm/opensm/osm_helper.c +++ b/opensm/opensm/osm_helper.c @@ -2501,10 +2501,9 @@ const char* const __osm_sm_mgr_signal_str[] = "OSM_SM_SIGNAL_HANDOVER_SENT", /* 7 */ "OSM_SM_SIGNAL_ACKNOWLEDGE", /* 8 */ "OSM_SM_SIGNAL_STANDBY", /* 9 */ - "OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED", /* 10 */ - "OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED_DONE", /* 11 */ - "OSM_SM_SIGNAL_WAIT_FOR_HANDOVER", /* 12 */ - "UNKNOWN STATE!!" /* 13 */ + "OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED_DONE", /* 10 */ + "OSM_SM_SIGNAL_WAIT_FOR_HANDOVER", /* 11 */ + "UNKNOWN STATE!!" 
/* 12 */ }; diff --git a/opensm/opensm/osm_sm_state_mgr.c b/opensm/opensm/osm_sm_state_mgr.c index 07c2af3..ccfb8b0 100644 --- a/opensm/opensm/osm_sm_state_mgr.c +++ b/opensm/opensm/osm_sm_state_mgr.c @@ -575,13 +575,6 @@ osm_sm_state_mgr_process( */ p_sm_mgr->p_subn->master_sm_base_lid = p_sm_mgr->p_subn->sm_base_lid; break; - case OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED: - /* - * Stop the discovering - */ - osm_state_mgr_process( p_sm_mgr->p_state_mgr, - OSM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED ); - break; case OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED_DONE: /* * Finished all discovery actions - move to STANDBY @@ -813,7 +806,6 @@ osm_sm_state_mgr_check_legality( switch ( signal ) { case OSM_SM_SIGNAL_DISCOVERY_COMPLETED: - case OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED: case OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED_DONE: case OSM_SM_SIGNAL_HANDOVER: status = IB_SUCCESS; -- 1.5.2.1.137.g426c From jwong at datallegro.com Sun Jun 10 18:03:23 2007 From: jwong at datallegro.com (Jeffrey Wong) Date: Sun, 10 Jun 2007 21:03:23 -0400 Subject: [ofa-general] Having trouble pingpong between two nodes. References: <466BABDD.20808@dev.mellanox.co.il> Message-ID: Well now it seems to be working after not doing anything at all. Thanks though for the info. Jeff -----Original Message----- From: Dotan Barak [mailto:dotanb at dev.mellanox.co.il] Sent: Sun 6/10/2007 3:44 AM To: Jeffrey Wong Cc: general at lists.openfabrics.org Subject: Re: [ofa-general] Having trouble pingpong between two nodes. Jeffrey Wong wrote: > > Hello, > > I am trying to run a ibv_ud_pingpong between two nodes but I can't > seem to get them to communicate. 
I have used the ping command between > the ib interfaces and that works fine, but when I try to use the > ibv_ud_ping pong it says the following: > > ________________________________________________________________________________ > > root at centos5:node1 ~]# ibv_ud_pingpong 193.168.10.254 > > local address: LID 0x0002, QPN 0x0f0406, PSN 0xb067dc > > Couldn't connect to 193.168.10.254:18515 > > ____________________________________________________________________________ > This is trivial, but did you run ibv_ud_pingpong as the server on 193.168.10.254? (Since you didn't give the client any test parameters, the server should be started with no arguments at all: ibv_ud_pingpong.) Dotan From mst at dev.mellanox.co.il Sun Jun 10 22:56:10 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Mon, 11 Jun 2007 08:56:10 +0300 Subject: [ofa-general] [PATCH trivial] IB/mlx4: include linux/mutex.h from mlx4_ib.h Message-ID: <20070611055610.GC1454@mellanox.co.il> mlx4_ib.h uses struct mutex, so while linux/mutex.h seems to be pulled in indirectly by one of the headers it includes, the right thing is to include linux/mutex.h directly. Signed-off-by: Michael S. Tsirkin --- Roland, can you queue this for upstream pls? Index: ofed_kernel/drivers/infiniband/hw/mlx4/mlx4_ib.h =================================================================== --- ofed_kernel.orig/drivers/infiniband/hw/mlx4/mlx4_ib.h 2007-05-29 14:50:06.000000000 +0300 +++ ofed_kernel/drivers/infiniband/hw/mlx4/mlx4_ib.h 2007-06-11 08:49:13.000000000 +0300 @@ -35,6 +35,7 @@ #include #include +#include <linux/mutex.h> #include #include -- MST From mst at dev.mellanox.co.il Sun Jun 10 23:09:42 2007 From: mst at dev.mellanox.co.il (Michael S.
Tsirkin) Date: Mon, 11 Jun 2007 09:09:42 +0300 Subject: [ofa-general] [PATCH] net/mlx4: include linux/mutex.h from mlx4.h Message-ID: <20070611060942.GE1454@mellanox.co.il> mlx4.h uses struct mutex, so while linux/mutex.h seems to be pulled in indirectly by one of the headers it includes, the right thing to do is to include linux/mutex.h directly. Signed-off-by: Michael S. Tsirkin --- Roland, can you queue this for upstream pls? Index: ofed_kernel/drivers/net/mlx4/mlx4.h =================================================================== --- ofed_kernel.orig/drivers/net/mlx4/mlx4.h 2007-05-29 14:50:26.000000000 +0300 +++ ofed_kernel/drivers/net/mlx4/mlx4.h 2007-06-11 09:07:36.000000000 +0300 @@ -37,6 +37,7 @@ #ifndef MLX4_H #define MLX4_H +#include <linux/mutex.h> #include #include -- MST From eli at mellanox.co.il Mon Jun 11 02:26:59 2007 From: eli at mellanox.co.il (Eli Cohen) Date: Mon, 11 Jun 2007 12:26:59 +0300 Subject: [ofa-general] [PATCH} libmlx4: poll cq tail pointer Message-ID: <1181554019.12020.3.camel@mtls03> A cast to uint16_t is required before assigning. Consider the following example: wqe_index = 0, wq->tail = 0x1ffff. You'd expect wq->tail to become 0x20000, but it actually becomes 0x10000. The reason is integer promotion. wqe_index - (uint16_t) wq->tail is evaluated as an int, not as a 16-bit value, so it yields -0xffff rather than 1. When that negative value is added to the unsigned tail, it is converted to a large unsigned number. The addition then wraps around, and tail ends up as 0x10000.
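The tail arithmetic can be reproduced in isolation. This is a sketch, not libmlx4 code, and the two helper names are hypothetical. It assumes tail is a free-running 32-bit unsigned int and wqe_index a uint16_t, as the 0x1ffff example implies.

```c
#include <stdint.h>

/* Hypothetical helpers mirroring the send-queue tail update in
 * mlx4_poll_one(); tail is assumed to be a free-running 32-bit unsigned
 * counter and wqe_index the 16-bit WQE index reported by the CQE. */

/* Original expression: the uint16_t operands are promoted to int, so the
 * difference can be negative and then wraps when added to tail. */
static unsigned int tail_update_old(unsigned int tail, uint16_t wqe_index)
{
	tail += wqe_index - (uint16_t) tail;
	return tail;
}

/* Patched expression: truncating the difference to 16 bits gives the
 * forward distance modulo 2^16, so tail only ever moves forward. */
static unsigned int tail_update_new(unsigned int tail, uint16_t wqe_index)
{
	tail += (uint16_t)(wqe_index - (uint16_t) tail);
	return tail;
}
```

With tail = 0x1ffff and wqe_index = 0, tail_update_old() returns 0x10000 and tail_update_new() returns 0x20000, matching the values in the description above. When the 16-bit index does not wrap (for example tail = 0x10003, wqe_index = 5), both return 0x10005.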
Signed-off-by: Eli Cohen --- diff --git a/src/cq.c b/src/cq.c index c4a3ca4..7597a5a 100644 --- a/src/cq.c +++ b/src/cq.c @@ -238,7 +238,7 @@ static int mlx4_poll_one(struct mlx4_cq *cq, if (is_send) { wq = &(*cur_qp)->sq; wqe_index = ntohs(cqe->wqe_index); - wq->tail += wqe_index - (uint16_t) wq->tail; + wq->tail += (uint16_t)(wqe_index - (uint16_t) wq->tail); wc->wr_id = wq->wrid[wq->tail & (wq->max - 1)]; ++wq->tail; } else if ((*cur_qp)->ibv_qp.srq) { From kliteyn at dev.mellanox.co.il Mon Jun 11 02:33:08 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 11 Jun 2007 12:33:08 +0300 Subject: [ofa-general] [PATCH] osm: reading guids file in ucast_mgr Message-ID: <466D16D4.8000605@dev.mellanox.co.il> Hi Hal, This patch removes the code in osm_ucast_updn.c that reads the root guids file and replaces it with a more general function in osm_ucast_mgr.c. This function will also be used by fat-tree routing. -- Yevgeny Signed-off-by: Yevgeny Kliteynik >From a8d32db1beacf6b42240357ab3e71584daadc791 Mon Sep 17 00:00:00 2001 From: Yevgeny Kliteynik Date: Mon, 11 Jun 2007 12:24:12 +0300 Subject: [PATCH 1/1] DELETE: make read_guid_file global no changes added to commit (use "git add" and/or "git commit -a") Signed-off-by: Yevgeny Kliteynik --- opensm/include/opensm/osm_base.h | 8 ++-- opensm/include/opensm/osm_ucast_mgr.h | 36 ++++++++++++++++ opensm/opensm/osm_ucast_mgr.c | 74 +++++++++++++++++++++++++++++++++ opensm/opensm/osm_ucast_updn.c | 48 +++------------------ 4 files changed, 120 insertions(+), 46 deletions(-) diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h index ee280d3..7f043a0 100644 --- a/opensm/include/opensm/osm_base.h +++ b/opensm/include/opensm/osm_base.h @@ -844,16 +844,16 @@ typedef enum _osm_mcast_req_type } osm_mcast_req_type_t; /***********/ -/****s* OpenSM: Base/MAX_UPDN_GUID_FILE_LINE_LENGTH +/****s* OpenSM: Base/MAX_GUID_FILE_LINE_LENGTH * NAME -* MAX_UPDN_GUID_FILE_LINE_LENGTH +*
MAX_GUID_FILE_LINE_LENGTH * * DESCRIPTION -* The maximum line number when reading updn guid file +* The maximum line number when reading guid file * * SYNOPSIS */ -#define MAX_UPDN_GUID_FILE_LINE_LENGTH 120 +#define MAX_GUID_FILE_LINE_LENGTH 120 /**********/ /****s* OpenSM: Base/VendorOUIs diff --git a/opensm/include/opensm/osm_ucast_mgr.h b/opensm/include/opensm/osm_ucast_mgr.h index 39bf45a..e003f31 100644 --- a/opensm/include/opensm/osm_ucast_mgr.h +++ b/opensm/include/opensm/osm_ucast_mgr.h @@ -293,6 +293,42 @@ osm_ucast_mgr_build_lid_matrices( * Unicast Manager *********/ +/****f* OpenSM: Unicast Manager/osm_ucast_mgr_read_guid_file +* NAME +* osm_ucast_mgr_read_guid_file +* +* DESCRIPTION +* Read guid list from file. +* +* SYNOPSIS +*/ +cl_status_t +osm_ucast_mgr_read_guid_file( + IN osm_ucast_mgr_t * const p_mgr, + IN const char * guid_file_name, + IN cl_list_t * p_list ); +/* +* PARAMETERS +* p_mgr +* [in] Pointer to an osm_ucast_mgr_t object. +* +* guid_file_name +* [in] Name of the file to read. +* +* p_list +* [in] Pointer to the list that will be filled with guids. +* +* RETURN VALUES +* IB_SUCCESS if the file was read successfully. +* +* NOTES +* This function reads guids from a file and inserts them +* into a list. 
+* +* SEE ALSO +* Unicast Manager +*********/ + /****f* OpenSM: Unicast Manager/osm_ucast_mgr_process * NAME * osm_ucast_mgr_process diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c index 9f40242..5182718 100644 --- a/opensm/opensm/osm_ucast_mgr.c +++ b/opensm/opensm/osm_ucast_mgr.c @@ -1044,6 +1044,80 @@ ucast_mgr_setup_all_switches(osm_subn_t *p_subn) /********************************************************************** **********************************************************************/ + +cl_status_t +osm_ucast_mgr_read_guid_file( + IN osm_ucast_mgr_t * const p_mgr, + IN const char * guid_file_name, + IN cl_list_t * p_list ) +{ + cl_status_t status = IB_SUCCESS; + FILE * guid_file; + char line[MAX_GUID_FILE_LINE_LENGTH]; + char * endptr; + uint64_t * p_guid; + + OSM_LOG_ENTER(p_mgr->p_log, osm_ucast_mgr_read_guid_file); + + guid_file = fopen(guid_file_name, "r"); + if (guid_file == NULL) + { + osm_log( p_mgr->p_log, OSM_LOG_ERROR, + "osm_ucast_mgr_read_guid_file: ERR 3A13: " + "Failed to open guid list file (%s)\n", + guid_file_name ); + status = IB_NOT_FOUND; + goto Exit; + } + + while ( fgets(line, MAX_GUID_FILE_LINE_LENGTH, guid_file) ) + { + if (strcspn(line, " ,;.") != strlen(line)) + { + osm_log( p_mgr->p_log, OSM_LOG_ERROR, + "osm_ucast_mgr_read_guid_file: ERR 3A14: " + "Bad formatted guid in file (%s): %s\n", + guid_file_name, line ); + status = IB_NOT_FOUND; + break; + } + + /* Skip empty lines anywhere in the file - only one + char means the null termination */ + if (strlen(line) <= 1) + continue; + + p_guid = malloc(sizeof(uint64_t)); + if (!p_guid) + { + status = IB_ERROR; + goto Exit; + } + + *p_guid = strtoull(line, &endptr, 16); + + /* check that the string is a number */ + if (!(*p_guid) && (*endptr != '\0')) + { + osm_log( p_mgr->p_log, OSM_LOG_ERROR, + "osm_ucast_mgr_read_guid_file: ERR 3A15: " + "Bad formatted guid in file (%s): %s\n", + guid_file_name, line ); + status = IB_NOT_FOUND; + break; + } + + /* store 
the parsed guid */ + cl_list_insert_tail(p_list, p_guid); + } + +Exit : + OSM_LOG_EXIT( p_mgr->p_log ); + return (status); +} + +/********************************************************************** + **********************************************************************/ osm_signal_t osm_ucast_mgr_process( IN osm_ucast_mgr_t* const p_mgr ) diff --git a/opensm/opensm/osm_ucast_updn.c b/opensm/opensm/osm_ucast_updn.c index 95a0622..23a9db5 100644 --- a/opensm/opensm/osm_ucast_updn.c +++ b/opensm/opensm/osm_ucast_updn.c @@ -53,6 +53,7 @@ #include #include #include +#include /* //////////////////////////// */ /* Local types */ @@ -303,9 +304,6 @@ updn_init( IN osm_opensm_t *p_osm ) { cl_list_t * p_list; - FILE* p_updn_guid_file; - char line[MAX_UPDN_GUID_FILE_LINE_LENGTH]; - uint64_t * p_tmp; cl_list_iterator_t guid_iterator; ib_api_status_t status = IB_SUCCESS; @@ -332,45 +330,11 @@ updn_init( */ if (p_osm->subn.opt.updn_guid_file) { - /* Now parse guid from file */ - p_updn_guid_file = fopen(p_osm->subn.opt.updn_guid_file, "r"); - if (p_updn_guid_file == NULL) - { - osm_log( &p_osm->log, OSM_LOG_ERROR, - "updn_init: ERR AA02: " - "Failed to open guid list file (%s)\n", - p_osm->subn.opt.updn_guid_file ); - status = IB_NOT_FOUND; - goto Exit; - } - - while ( fgets(line, MAX_UPDN_GUID_FILE_LINE_LENGTH, p_updn_guid_file) ) - { - if (strcspn(line, " ,;.") == strlen(line)) - { - /* Skip empty lines anywhere in the file - only one char means the Null termination */ - if (strlen(line) > 1) - { - p_tmp = malloc(sizeof(uint64_t)); - if (!p_tmp) - { - status = IB_ERROR; - goto Exit; - } - *p_tmp = strtoull(line, NULL, 16); - cl_list_insert_tail(p_updn->p_root_nodes, p_tmp); - } - } - else - { - osm_log( &p_osm->log, OSM_LOG_ERROR, - "updn_init: ERR AA03: " - "Bad formatted guid in file (%s): %s\n", - p_osm->subn.opt.updn_guid_file, line ); - status = IB_NOT_FOUND; - break; - } - } + status = osm_ucast_mgr_read_guid_file( &p_osm->sm.ucast_mgr, + 
p_osm->subn.opt.updn_guid_file, + p_updn->p_root_nodes ); + if (status != IB_SUCCESS) + goto Exit; /* For Debug Purposes ... */ osm_log( &p_osm->log, OSM_LOG_DEBUG, -- 1.5.1.4 From vlad at lists.openfabrics.org Mon Jun 11 02:43:49 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Mon, 11 Jun 2007 02:43:49 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070611-0200 daily build status Message-ID: <20070611094349.70A2FE6083E@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.15 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.12 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.16 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.15 Passed on x86_64 with linux-2.6.19 Passed on ppc64 with linux-2.6.19 Passed on ia64 with linux-2.6.16 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.16 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.12 Passed on powerpc with linux-2.6.17 Passed on ppc64 with linux-2.6.17 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.15 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.13 Passed on ia64 with linux-2.6.17 Passed on powerpc with linux-2.6.14 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.12 Passed on powerpc with 
linux-2.6.16 Passed on x86_64 with linux-2.6.14 Passed on ppc64 with linux-2.6.14 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ppc64 with linux-2.6.13 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.9-34.ELsmp Failed: From mst at dev.mellanox.co.il Mon Jun 11 02:51:45 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Mon, 11 Jun 2007 12:51:45 +0300 Subject: [ofa-general] Re: [PATCH} libmlx4: poll cq tail pointer In-Reply-To: <1181554019.12020.3.camel@mtls03> References: <1181554019.12020.3.camel@mtls03> Message-ID: <20070611095145.GB13815@mellanox.co.il> > Quoting Eli Cohen : > Subject: [PATCH} libmlx4: poll cq tail pointer > > cast to uint16_t is required before assigning. > Consider the following example: > wqe_index = 0, wq->tail = 0x1ffff. You'd expect wq->tail to be 0x20000 > but it will actually be 0x10000. The reason for this is that compiler > upcasts the result of wqe_index - (uint16_t) wq->tail to unsigned which > yields a large number and when added to the original value of tail it > overflows and actually becomes 0x10000. > > Signed-off-by: Eli Cohen And a similar patch would be needed for the kernel, would it not? mthca does not seem to be affected: it does all math on 32-bit integers.
-- MST From halr at voltaire.com Mon Jun 11 03:28:15 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 11 Jun 2007 06:28:15 -0400 Subject: [ofa-general] OpenSM Up-Down algorithm In-Reply-To: <39C75744D164D948A170E9792AF8E7CA0D2914@exil.voltaire.com> References: <39C75744D164D948A170E9792AF8E7CA0D2914@exil.voltaire.com> Message-ID: <1181557691.8896.64610.camel@hal.voltaire.com> Hi Sagi, On Sun, 2007-06-10 at 11:59, Sagi Schlanger wrote: > Hi, > > I'm looking for some answers on Up-Down routing at OpenSM . > > Is anybody familiar with a utility/procedure to find credit loops > given a topology and routing settings? I know there was at least talk of ibdiagnet (in ibutils) checking this. Not sure if it is implemented (yet) or if it is routing algorithm independent. Eitan ? > Is there a handy spec describing the OpenSM Up-Down algorithm? The OpenSM up/down routing is based on the following paper: "Effective Strategy to Compute Forwarding Tables for InfiniBand Networks" Jose Carlos Sancho, Universidad Politécnica de Valencia Antonio Robles, Universidad Politécnica de Valencia Jose Duato, Universidad Politécnica de Valencia http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/proceedings/&toc=comp/proceedings/icpp/2001/1257/00/1257toc.xml&DOI=10.1109/ICPP.2001.952046 > What is the scheme through which roots are defined on clos and non > clos/fat tree topologies? The admin can supply the roots via -a option when invoking OpenSM. Auto-detect root nodes - based on the CA hop length from any switch in the subnet, a statistical histogram is built for each switch (hop num vs number of occurrences). If the histogram reflects a specific column (higher than others) for a certain node, then it is marked as a root node. Since the algorithm is statistical, it may not find any root nodes. The list of the root nodes found by this auto-detect stage is used by the ranking process stage. Note 1: The user can override the node list manually. 
Note 2: If this stage cannot find any root nodes, and the user did not specify a guid list file, OpenSM defaults back to the Min Hop routing algorithm. > Is this algorithm always credit loop free? It's supposed to be. > How efficient is using this algorithm on non clos/fat tree topologies? What do you mean by efficiency ? Also, are you asking about pure fat tree or non pure fat tree (or both) ? -- Hal > Thanks for your cooperation, > Sagi > > ____________________________________________________________ > Sagi Schlanger | +972-9-9717651 (o) | +972-52-2385154 (m) > Software Engineer, IB Switch > Voltaire – The Grid Backbone > > www.voltaire.com > > > > ______________________________________________________________________ > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From eli at mellanox.co.il Mon Jun 11 03:52:25 2007 From: eli at mellanox.co.il (Eli Cohen) Date: Mon, 11 Jun 2007 13:52:25 +0300 Subject: [ofa-general] Re: [PATCH} libmlx4: poll cq tail pointer In-Reply-To: <20070611095145.GB13815@mellanox.co.il> References: <1181554019.12020.3.camel@mtls03> <20070611095145.GB13815@mellanox.co.il> Message-ID: <1181559145.16174.0.camel@mtls03> On Mon, 2007-06-11 at 12:51 +0300, Michael S. Tsirkin wrote: > And a similiar patch would be needed for kernel, would it not? Yes, looks like. From kliteyn at dev.mellanox.co.il Mon Jun 11 04:02:23 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 11 Jun 2007 14:02:23 +0300 Subject: [ofa-general] [PATCHv2] osm: reading guids file in ucast_mgr Message-ID: <466D2BBF.60406@dev.mellanox.co.il> Hi Hal, | [V2] Nothing was changed in the patch, but the previous | mail had some garbage in the explanation text. 
This patch removes the code in osm_ucast_updn.c that reads the root guids file and replaces it with a more general function in osm_ucast_mgr.c. This function will also be used by fat-tree routing. -- Yevgeny Signed-off-by: Yevgeny Kliteynik --- opensm/include/opensm/osm_base.h | 8 ++-- opensm/include/opensm/osm_ucast_mgr.h | 36 ++++++++++++++++ opensm/opensm/osm_ucast_mgr.c | 74 +++++++++++++++++++++++++++++++++ opensm/opensm/osm_ucast_updn.c | 48 +++------------------ 4 files changed, 120 insertions(+), 46 deletions(-) diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h index ee280d3..7f043a0 100644 --- a/opensm/include/opensm/osm_base.h +++ b/opensm/include/opensm/osm_base.h @@ -844,16 +844,16 @@ typedef enum _osm_mcast_req_type } osm_mcast_req_type_t; /***********/ -/****s* OpenSM: Base/MAX_UPDN_GUID_FILE_LINE_LENGTH +/****s* OpenSM: Base/MAX_GUID_FILE_LINE_LENGTH * NAME -* MAX_UPDN_GUID_FILE_LINE_LENGTH +* MAX_GUID_FILE_LINE_LENGTH * * DESCRIPTION -* The maximum line number when reading updn guid file +* The maximum line number when reading guid file * * SYNOPSIS */ -#define MAX_UPDN_GUID_FILE_LINE_LENGTH 120 +#define MAX_GUID_FILE_LINE_LENGTH 120 /**********/ /****s* OpenSM: Base/VendorOUIs diff --git a/opensm/include/opensm/osm_ucast_mgr.h b/opensm/include/opensm/osm_ucast_mgr.h index 39bf45a..e003f31 100644 --- a/opensm/include/opensm/osm_ucast_mgr.h +++ b/opensm/include/opensm/osm_ucast_mgr.h @@ -293,6 +293,42 @@ osm_ucast_mgr_build_lid_matrices( * Unicast Manager *********/ +/****f* OpenSM: Unicast Manager/osm_ucast_mgr_read_guid_file +* NAME +* osm_ucast_mgr_read_guid_file +* +* DESCRIPTION +* Read guid list from file. +* +* SYNOPSIS +*/ +cl_status_t +osm_ucast_mgr_read_guid_file( + IN osm_ucast_mgr_t * const p_mgr, + IN const char * guid_file_name, + IN cl_list_t * p_list ); +/* +* PARAMETERS +* p_mgr +* [in] Pointer to an osm_ucast_mgr_t object. +* +* guid_file_name +* [in] Name of the file to read.
+* +* p_list +* [in] Pointer to the list that will be filled with guids. +* +* RETURN VALUES +* IB_SUCCESS if the file was read successfully. +* +* NOTES +* This function reads guids from a file and inserts them +* into a list. +* +* SEE ALSO +* Unicast Manager +*********/ + /****f* OpenSM: Unicast Manager/osm_ucast_mgr_process * NAME * osm_ucast_mgr_process diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c index 9f40242..5182718 100644 --- a/opensm/opensm/osm_ucast_mgr.c +++ b/opensm/opensm/osm_ucast_mgr.c @@ -1044,6 +1044,80 @@ ucast_mgr_setup_all_switches(osm_subn_t *p_subn) /********************************************************************** **********************************************************************/ + +cl_status_t +osm_ucast_mgr_read_guid_file( + IN osm_ucast_mgr_t * const p_mgr, + IN const char * guid_file_name, + IN cl_list_t * p_list ) +{ + cl_status_t status = IB_SUCCESS; + FILE * guid_file; + char line[MAX_GUID_FILE_LINE_LENGTH]; + char * endptr; + uint64_t * p_guid; + + OSM_LOG_ENTER(p_mgr->p_log, osm_ucast_mgr_read_guid_file); + + guid_file = fopen(guid_file_name, "r"); + if (guid_file == NULL) + { + osm_log( p_mgr->p_log, OSM_LOG_ERROR, + "osm_ucast_mgr_read_guid_file: ERR 3A13: " + "Failed to open guid list file (%s)\n", + guid_file_name ); + status = IB_NOT_FOUND; + goto Exit; + } + + while ( fgets(line, MAX_GUID_FILE_LINE_LENGTH, guid_file) ) + { + if (strcspn(line, " ,;.") != strlen(line)) + { + osm_log( p_mgr->p_log, OSM_LOG_ERROR, + "osm_ucast_mgr_read_guid_file: ERR 3A14: " + "Bad formatted guid in file (%s): %s\n", + guid_file_name, line ); + status = IB_NOT_FOUND; + break; + } + + /* Skip empty lines anywhere in the file - only one + char means the null termination */ + if (strlen(line) <= 1) + continue; + + p_guid = malloc(sizeof(uint64_t)); + if (!p_guid) + { + status = IB_ERROR; + goto Exit; + } + + *p_guid = strtoull(line, &endptr, 16); + + /* check that the string is a number */ + if (!(*p_guid) && 
(*endptr != '\0')) + { + osm_log( p_mgr->p_log, OSM_LOG_ERROR, + "osm_ucast_mgr_read_guid_file: ERR 3A15: " + "Bad formatted guid in file (%s): %s\n", + guid_file_name, line ); + status = IB_NOT_FOUND; + break; + } + + /* store the parsed guid */ + cl_list_insert_tail(p_list, p_guid); + } + +Exit : + OSM_LOG_EXIT( p_mgr->p_log ); + return (status); +} + +/********************************************************************** + **********************************************************************/ osm_signal_t osm_ucast_mgr_process( IN osm_ucast_mgr_t* const p_mgr ) diff --git a/opensm/opensm/osm_ucast_updn.c b/opensm/opensm/osm_ucast_updn.c index 95a0622..23a9db5 100644 --- a/opensm/opensm/osm_ucast_updn.c +++ b/opensm/opensm/osm_ucast_updn.c @@ -53,6 +53,7 @@ #include #include #include +#include /* //////////////////////////// */ /* Local types */ @@ -303,9 +304,6 @@ updn_init( IN osm_opensm_t *p_osm ) { cl_list_t * p_list; - FILE* p_updn_guid_file; - char line[MAX_UPDN_GUID_FILE_LINE_LENGTH]; - uint64_t * p_tmp; cl_list_iterator_t guid_iterator; ib_api_status_t status = IB_SUCCESS; @@ -332,45 +330,11 @@ updn_init( */ if (p_osm->subn.opt.updn_guid_file) { - /* Now parse guid from file */ - p_updn_guid_file = fopen(p_osm->subn.opt.updn_guid_file, "r"); - if (p_updn_guid_file == NULL) - { - osm_log( &p_osm->log, OSM_LOG_ERROR, - "updn_init: ERR AA02: " - "Failed to open guid list file (%s)\n", - p_osm->subn.opt.updn_guid_file ); - status = IB_NOT_FOUND; - goto Exit; - } - - while ( fgets(line, MAX_UPDN_GUID_FILE_LINE_LENGTH, p_updn_guid_file) ) - { - if (strcspn(line, " ,;.") == strlen(line)) - { - /* Skip empty lines anywhere in the file - only one char means the Null termination */ - if (strlen(line) > 1) - { - p_tmp = malloc(sizeof(uint64_t)); - if (!p_tmp) - { - status = IB_ERROR; - goto Exit; - } - *p_tmp = strtoull(line, NULL, 16); - cl_list_insert_tail(p_updn->p_root_nodes, p_tmp); - } - } - else - { - osm_log( &p_osm->log, OSM_LOG_ERROR, - "updn_init: ERR 
AA03: " - "Bad formatted guid in file (%s): %s\n", - p_osm->subn.opt.updn_guid_file, line ); - status = IB_NOT_FOUND; - break; - } - } + status = osm_ucast_mgr_read_guid_file( &p_osm->sm.ucast_mgr, + p_osm->subn.opt.updn_guid_file, + p_updn->p_root_nodes ); + if (status != IB_SUCCESS) + goto Exit; /* For Debug Purposes ... */ osm_log( &p_osm->log, OSM_LOG_DEBUG, -- 1.5.1.4 From kliteyn at dev.mellanox.co.il Mon Jun 11 04:04:47 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 11 Jun 2007 14:04:47 +0300 Subject: [ofa-general] [PATCH] osm: adding 4 options to ftree routing Message-ID: <466D2C4F.4050108@dev.mellanox.co.il> Hi Hal, Adding four options for fat-tree routing: ftree_root_guid_file Name of the file that contains list of root guids that will be used by fat-tree routing (provided by User) ftree_cn_guid_file Name of the file that contains list of compute node guids that will be used by fat-tree routing (provided by User) ftree_include_guid_file Name of the file that contains list of node guids that will be included when performing fat-tree routing (provided by User) ftree_exclude_guid_file Name of the file that contains list of node guids that will be excluded when performing fat-tree routing (provided by User) For now, these options are exposed through options file only. 
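Each of these options names a file with one GUID per line. That is the format described in the comments osm_subn_write_conf_file emits, and the format osm_ucast_mgr_read_guid_file parses (the earlier patch says fat-tree routing will also use that function). Below is a standalone sketch of such a reader, not OpenSM code. parse_guid_line() and read_guid_list() are hypothetical names. GUIDs go into a caller-supplied array rather than a cl_list_t, and the caller opens and closes the FILE. Unlike the posted function, the sketch tolerates surrounding whitespace and rejects any other trailing characters.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define GUID_LINE_LEN 120	/* mirrors MAX_GUID_FILE_LINE_LENGTH */

/* Parse one line of a guid list file.
 * Returns 1 and stores the value in *guid for a valid hex guid,
 * 0 for a blank line, and -1 for a malformed line. */
static int parse_guid_line(const char *line, uint64_t *guid)
{
	char *end;
	unsigned long long val;

	while (isspace((unsigned char)*line))
		line++;
	if (*line == '\0')
		return 0;			/* blank line: skip it */
	if (!isxdigit((unsigned char)*line))
		return -1;			/* rejects signs and non-hex text */

	errno = 0;
	val = strtoull(line, &end, 16);	/* accepts an optional 0x prefix */
	if (errno == ERANGE)
		return -1;			/* value does not fit */
	while (isspace((unsigned char)*end))
		end++;
	if (*end != '\0')
		return -1;			/* trailing garbage, e.g. "0x12g4" */

	*guid = (uint64_t)val;
	return 1;
}

/* Read guids from fp into out[0..max-1].
 * Returns the number of guids read, or -1 on a malformed line or overflow. */
static int read_guid_list(FILE *fp, uint64_t *out, size_t max)
{
	char line[GUID_LINE_LEN];
	size_t n = 0;

	while (fgets(line, sizeof(line), fp) != NULL) {
		uint64_t guid;
		int rc = parse_guid_line(line, &guid);

		if (rc < 0)
			return -1;
		if (rc == 0)
			continue;
		if (n == max)
			return -1;		/* more guids than the caller allowed */
		out[n++] = guid;
	}
	return (int)n;
}
```

The sketch checks where strtoull stopped instead of only whether the result is zero. As a result, a line such as 0x12g4 is rejected rather than read as 0x12.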
-- Yevgeny Signed-off-by: Yevgeny Kliteynik --- opensm/include/opensm/osm_subnet.h | 20 +++++++++++++++ opensm/opensm/osm_subnet.c | 46 ++++++++++++++++++++++++++++++++++++ 2 files changed, 66 insertions(+), 0 deletions(-) diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h index c62128b..39eed2b 100644 --- a/opensm/include/opensm/osm_subnet.h +++ b/opensm/include/opensm/osm_subnet.h @@ -279,6 +279,10 @@ typedef struct _osm_subn_opt char * lid_matrix_dump_file; char * ucast_dump_file; char * updn_guid_file; + char * ftree_root_guid_file; + char * ftree_cn_guid_file; + char * ftree_include_guid_file; + char * ftree_exclude_guid_file; char * sa_db_file; boolean_t exit_on_fatal; boolean_t honor_guid2lid_file; @@ -455,6 +459,22 @@ typedef struct _osm_subn_opt * updn_guid_file * Pointer to name of the UPDN guid file given by User * +* ftree_root_guid_file +* Name of the file that contains list of root guids that +* will be used by fat-tree routing (provided by User) +* +* ftree_cn_guid_file +* Name of the file that contains list of compute node guids that +* will be used by fat-tree routing (provided by User) +* +* ftree_include_guid_file +* Name of the file that contains list of node guids that +* will be included when performing fat-tree routing (provided by User) +* +* ftree_exclude_guid_file +* Name of the file that contains list of node guids that +* will be excluded when performing fat-tree routing (provided by User) +* * sa_db_file * Name of the SA database file. 
* diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index 736f49a..7219876 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -501,6 +501,10 @@ osm_subn_set_default_opt( p_opt->lid_matrix_dump_file = NULL; p_opt->ucast_dump_file = NULL; p_opt->updn_guid_file = NULL; + p_opt->ftree_root_guid_file = NULL; + p_opt->ftree_cn_guid_file = NULL; + p_opt->ftree_include_guid_file = NULL; + p_opt->ftree_exclude_guid_file = NULL; p_opt->sa_db_file = NULL; p_opt->exit_on_fatal = TRUE; p_opt->enable_quirks = FALSE; @@ -1326,6 +1330,22 @@ osm_subn_parse_conf_file( "updn_guid_file", p_key, p_val, &p_opts->updn_guid_file); + __osm_subn_opts_unpack_charp( + "ftree_root_guid_file", + p_key, p_val, &p_opts->ftree_root_guid_file); + + __osm_subn_opts_unpack_charp( + "ftree_cn_guid_file", + p_key, p_val, &p_opts->ftree_cn_guid_file); + + __osm_subn_opts_unpack_charp( + "ftree_include_guid_file", + p_key, p_val, &p_opts->ftree_include_guid_file); + + __osm_subn_opts_unpack_charp( + "ftree_exclude_guid_file", + p_key, p_val, &p_opts->ftree_exclude_guid_file); + __osm_subn_opts_unpack_charp( "sa_db_file", p_key, p_val, &p_opts->sa_db_file); @@ -1554,6 +1574,32 @@ osm_subn_write_conf_file( "# One guid in each line\n" "updn_guid_file %s\n\n", p_opts->updn_guid_file); + if (p_opts->ftree_root_guid_file) + fprintf( opts_file, + "# The file holding the fat-tree root node guids\n" + "# One guid in each line\n" + "ftree_root_guid_file %s\n\n", + p_opts->ftree_root_guid_file); + if (p_opts->ftree_cn_guid_file) + fprintf( opts_file, + "# The file holding the fat-tree compute node guids\n" + "# One guid in each line\n" + "ftree_cn_guid_file %s\n\n", + p_opts->ftree_cn_guid_file); + if (p_opts->ftree_include_guid_file) + fprintf( opts_file, + "# The file holding the node guids that should be included\n" + "# in fat-tree routing balancing\n" + "# One guid in each line\n" + "ftree_include_guid_file %s\n\n", + p_opts->ftree_include_guid_file); + if (p_opts->ftree_exclude_guid_file) +
fprintf( opts_file, + "# The file holding the node guids that should be excluded\n" + "# from fat-tree routing balancing\n" + "# One guid in each line\n" + "ftree_exclude_guid_file %s\n\n", + p_opts->ftree_exclude_guid_file); if (p_opts->sa_db_file) fprintf( opts_file, "# SA database file name\n" -- 1.5.1.4
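The osm_subnet.c hunks above register each new ftree_* option by calling __osm_subn_opts_unpack_charp() with a requested key, the key and value read from the options file, and the destination field. That helper's body is not part of this thread, so the following is only a minimal standalone sketch, assuming it copies the value when the requested key equals the parsed key. struct opts, unpack_charp() and parse_line() are hypothetical names used for illustration, not OpenSM code.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() under strict C modes */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of an options structure (not osm_subn_opt_t). */
struct opts {
	char *updn_guid_file;
	char *ftree_root_guid_file;
	char *ftree_cn_guid_file;
};

/* Assumed semantics: copy val into *p_field only if key equals req_key. */
static void unpack_charp(const char *req_key, const char *key,
			 const char *val, char **p_field)
{
	assert(p_field != NULL);
	if (strcmp(req_key, key) != 0)
		return;
	free(*p_field);		/* drop any earlier value for this key */
	*p_field = strdup(val);
}

/* Parse one "key value" line; comment and malformed lines are ignored. */
static void parse_line(struct opts *o, const char *line)
{
	char key[256], val[768];

	if (line[0] == '#' || sscanf(line, "%255s %767s", key, val) != 2)
		return;
	/* Each call names its own option, so one key updates one field. */
	unpack_charp("updn_guid_file", key, val, &o->updn_guid_file);
	unpack_charp("ftree_root_guid_file", key, val, &o->ftree_root_guid_file);
	unpack_charp("ftree_cn_guid_file", key, val, &o->ftree_cn_guid_file);
}
```

Because the match is made on the requested key, each call has to name its own option: if every call passed "updn_guid_file", an updn_guid_file line would fill all three fields and a ftree_root_guid_file line would be ignored.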
From halr at voltaire.com Mon Jun 11 06:22:44 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 11 Jun 2007 09:22:44 -0400 Subject: [ofa-general] Re: [PATCHv2] osm: reading guids file in ucast_mgr In-Reply-To: <466D2BBF.60406@dev.mellanox.co.il> References: <466D2BBF.60406@dev.mellanox.co.il> Message-ID: <1181568159.8896.75500.camel@hal.voltaire.com> Hi Yevgeny, On Mon, 2007-06-11 at 07:02, Yevgeny Kliteynik wrote: > Hi Hal, > > | [V2] Nothing was changed in the patch, but the previous > | mail had some garbage in the explanation text. This patch version causes: File to patch: include/opensm/osm_base.h patching file include/opensm/osm_base.h patch: **** malformed patch at line 65: } osm_mcast_req_type_t; So I used the original patch with the comments from here. > This patch removes a code that was reading root guids file in > osm_ucast_updn.c and replaces it with a more general function > in osm_ucast_mgr.c > > This function will also be used by fat-tree routing. > > -- Yevgeny > > Signed-off-by: Yevgeny Kliteynik > --- > opensm/include/opensm/osm_base.h | 8 ++-- > opensm/include/opensm/osm_ucast_mgr.h | 36 ++++++++++++++++ > opensm/opensm/osm_ucast_mgr.c | 74 +++++++++++++++++++++++++++++++++ > opensm/opensm/osm_ucast_updn.c | 48 +++------------------ > 4 files changed, 120 insertions(+), 46 deletions(-) Thanks. Applied. -- Hal From eitan at mellanox.co.il Mon Jun 11 06:32:47 2007 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 11 Jun 2007 16:32:47 +0300 Subject: [ofa-general] OpenSM Up-Down algorithm In-Reply-To: <1181557691.8896.64610.camel@hal.voltaire.com> References: <39C75744D164D948A170E9792AF8E7CA0D2914@exil.voltaire.com> <1181557691.8896.64610.camel@hal.voltaire.com> Message-ID: <6C2C79E72C305246B504CBA17B5500C901AAF78C@mtlexch01.mtl.com> Hi Hal, Sagi, > > On Sun, 2007-06-10 at 11:59, Sagi Schlanger wrote: > > Hi, > > > > I'm looking for some answers on Up-Down routing at OpenSM . 
> > > > Is anybody familiar with a utility/procedure to find credit loops > > given a topology and routing settings? > > I know there was at least talk of ibdiagnet (in ibutils) > checking this. > Not sure if it is implemented (yet) or if it is routing > algorithm independent. Eitan ? > > > Is there a handy spec describing the OpenSM Up-Down algorithm? > > The OpenSM up/down routing is based on the following paper: > > "Effective Strategy to Compute Forwarding Tables for > InfiniBand Networks" > Jose Carlos Sancho, Universidad Politécnica de Valencia > Antonio Robles, Universidad Politécnica de Valencia Jose > Duato, Universidad Politécnica de Valencia > > http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/proceedings/&toc=comp/proceedings/icpp/2001/1257/00/1257toc.xml&DOI=10.1109/ICPP.2001.952046 > > > What is the scheme through which roots are defined on clos and non > > clos/fat tree topologies? > > The admin can supply the roots via -a > option when invoking OpenSM. > > Auto-detect root nodes - based on the CA hop length > from any switch > in the subnet, a statistical histogram is built for > each switch (hop > num vs number of occurrences). If the histogram > reflects a specific > column (higher than others) for a certain node, then > it is marked as a > root node. Since the algorithm is statistical, it may > not find any root > nodes. The list of the root nodes found by this > auto-detect stage is > used by the ranking process stage. > > Note 1: The user can override the node list manually. > Note 2: If this stage cannot find any root nodes, > and the user did > not specify a guid list file, OpenSM > defaults back to the > Min Hop routing algorithm. > > > Is this algorithm always credit loop free? > YES IT IS > It's supposed to be. > > > How efficient is using this algorithm on non clos/fat tree > topologies? > > What do you mean by efficiency ? Also, are you asking about > pure fat tree or non pure fat tree (or both) ?
> > -- Hal > > > Thanks for your cooperation, > > Sagi > > > > ____________________________________________________________ > > Sagi Schlanger | +972-9-9717651 (o) | +972-52-2385154 (m) > > Software Engineer, IB Switch > > Voltaire - The Grid Backbone > > > > www.voltaire.com > > > > > > > > > ______________________________________________________________________ > > > > _______________________________________________ > > general mailing list > > general at lists.openfabrics.org > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general From kliteyn at dev.mellanox.co.il Mon Jun 11 06:55:10 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 11 Jun 2007 16:55:10 +0300 Subject: [ofa-general] [PATCH] osm: adding 2 options to ftree routing Message-ID: <466D543E.2020209@dev.mellanox.co.il> Hi Hal, [this patch replaces the "adding 4 options to ftree routing" patch] Adding two options for fat-tree routing: ftree_root_guid_file Name of the file that contains list of root guids that will be used by fat-tree routing (provided by User) ftree_cn_guid_file Name of the file that contains list of compute node guids that will be used by fat-tree routing (provided by User) For now, these options are exposed through options file only.
-- Yevgeny Signed-off-by: Yevgeny Kliteynik --- opensm/include/opensm/osm_subnet.h | 10 ++++++++++ opensm/opensm/osm_subnet.c | 22 ++++++++++++++++++++++ 2 files changed, 32 insertions(+), 0 deletions(-) diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h index c62128b..46d90d6 100644 --- a/opensm/include/opensm/osm_subnet.h +++ b/opensm/include/opensm/osm_subnet.h @@ -279,6 +279,8 @@ typedef struct _osm_subn_opt char * lid_matrix_dump_file; char * ucast_dump_file; char * updn_guid_file; + char * ftree_root_guid_file; + char * ftree_cn_guid_file; char * sa_db_file; boolean_t exit_on_fatal; boolean_t honor_guid2lid_file; @@ -455,6 +457,14 @@ typedef struct _osm_subn_opt * updn_guid_file * Pointer to name of the UPDN guid file given by User * +* ftree_root_guid_file +* Name of the file that contains list of root guids that +* will be used by fat-tree routing (provided by User) +* +* ftree_cn_guid_file +* Name of the file that contains list of compute node guids that +* will be used by fat-tree routing (provided by User) +* * sa_db_file * Name of the SA database file. 
* diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index 736f49a..a39ada6 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -501,6 +501,8 @@ osm_subn_set_default_opt( p_opt->lid_matrix_dump_file = NULL; p_opt->ucast_dump_file = NULL; p_opt->updn_guid_file = NULL; + p_opt->ftree_root_guid_file = NULL; + p_opt->ftree_cn_guid_file = NULL; p_opt->sa_db_file = NULL; p_opt->exit_on_fatal = TRUE; p_opt->enable_quirks = FALSE; @@ -1326,6 +1328,14 @@ osm_subn_parse_conf_file( "updn_guid_file", p_key, p_val, &p_opts->updn_guid_file); + __osm_subn_opts_unpack_charp( + "ftree_root_guid_file", + p_key, p_val, &p_opts->ftree_root_guid_file); + + __osm_subn_opts_unpack_charp( + "ftree_cn_guid_file", + p_key, p_val, &p_opts->ftree_cn_guid_file); + __osm_subn_opts_unpack_charp( "sa_db_file", p_key, p_val, &p_opts->sa_db_file); @@ -1554,6 +1564,18 @@ osm_subn_write_conf_file( "# One guid in each line\n" "updn_guid_file %s\n\n", p_opts->updn_guid_file); + if (p_opts->ftree_root_guid_file) + fprintf( opts_file, + "# The file holding the fat-tree root node guids\n" + "# One guid in each line\n" + "ftree_root_guid_file %s\n\n", + p_opts->ftree_root_guid_file); + if (p_opts->ftree_cn_guid_file) + fprintf( opts_file, + "# The file holding the fat-tree compute node guids\n" + "# One guid in each line\n" + "ftree_cn_guid_file %s\n\n", + p_opts->ftree_cn_guid_file); if (p_opts->sa_db_file) fprintf( opts_file, "# SA database file name\n" -- 1.5.1.4 From kliteyn at dev.mellanox.co.il Mon Jun 11 07:15:14 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 11 Jun 2007 17:15:14 +0300 Subject: [ofa-general] [PATCH] osm: up/dn ranking - making code more intuitive In-Reply-To: <20070524225428.GK837@sashak.voltaire.com> References: <46503064.7010107@dev.mellanox.co.il> <20070520161034.GY19271@sashak.voltaire.com> <4651557E.2080400@dev.mellanox.co.il> <20070524225428.GK837@sashak.voltaire.com> Message-ID: <466D58F2.1020402@dev.mellanox.co.il> Hi
Hal. Following up our discussion with Sasha regarding the ranking optimization in up/dn routing: >> I do think that to make the code more "intuitive" we might >> want to remove the __updn_update_rank() and do something like this: >> >> if (remote_u->rank > u->rank + 1) >> { >> remote_u->rank = u->rank + 1; >> max_rank = remote_u->rank; >> cl_qlist_insert_tail(&list, &remote_u->list); >> } Signed-off-by: Yevgeny Kliteynik --- opensm/opensm/osm_ucast_updn.c | 33 ++++++++------------------------- 1 files changed, 8 insertions(+), 25 deletions(-) diff --git a/opensm/opensm/osm_ucast_updn.c b/opensm/opensm/osm_ucast_updn.c index 23a9db5..2448246 100644 --- a/opensm/opensm/osm_ucast_updn.c +++ b/opensm/opensm/osm_ucast_updn.c @@ -135,23 +135,6 @@ __updn_get_dir( } /********************************************************************** - **********************************************************************/ -/* This function updates rank value for a node */ -/* Return 0 if no need to further update 1 if determined a new value */ -static int -__updn_update_rank( - IN struct updn_node *u, - IN unsigned rank ) -{ - if (u->rank > rank) - { - u->rank = rank; - return 1; - } - return 0; -} - -/********************************************************************** * This function does the bfs of min hop table calculation by guid index * as a starting point. 
**********************************************************************/ @@ -375,7 +358,6 @@ updn_subn_rank( osm_switch_t *p_sw; osm_physp_t *p_physp, *p_remote_physp; cl_qlist_t list; - cl_status_t did_cause_update; struct updn_node *u, *remote_u; uint8_t num_ports, port_num; osm_log_t *p_log = &p_updn->p_osm->log; @@ -403,7 +385,7 @@ updn_subn_rank( osm_log( p_log, OSM_LOG_DEBUG, "updn_subn_rank: " "Ranking root port GUID 0x%" PRIx64 "\n", guid_list[idx] ); - __updn_update_rank(u, 0); + u->rank = 0; cl_qlist_insert_tail(&list, &u->list); } @@ -438,7 +420,13 @@ updn_subn_rank( { remote_u = p_remote_physp->p_node->sw->priv; port_guid = p_remote_physp->port_guid; - did_cause_update = __updn_update_rank(remote_u, u->rank+1); + + if (remote_u->rank > u->rank+1) + { + remote_u->rank = u->rank + 1; + max_rank = remote_u->rank; + cl_qlist_insert_tail(&list, &remote_u->list); + } osm_log( p_log, OSM_LOG_DEBUG, "updn_subn_rank: " @@ -446,11 +434,6 @@ updn_subn_rank( cl_ntoh64(port_guid), remote_u->rank ); - if (did_cause_update) - { - cl_qlist_insert_tail(&list, &remote_u->list); - max_rank = remote_u->rank; - } } } } -- 1.5.1.4 From kliteyn at dev.mellanox.co.il Mon Jun 11 07:21:37 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 11 Jun 2007 17:21:37 +0300 Subject: [ofa-general] [PATCH] osm: TRIVIAL bug fix Message-ID: <466D5A71.2040301@dev.mellanox.co.il> Hi Hal, Fixing a small bug that was "inherited" when moved code that reads guid file from osm_ucast_updn.c to osm_ucast_mgr.c - closing file descriptor when finished reading the guid file. 
-- Yevgeny Signed-off-by: Yevgeny Kliteynik --- opensm/opensm/osm_ucast_mgr.c | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c index b080f59..d855683 100644 --- a/opensm/opensm/osm_ucast_mgr.c +++ b/opensm/opensm/osm_ucast_mgr.c @@ -1052,7 +1052,7 @@ osm_ucast_mgr_read_guid_file( IN cl_list_t * p_list ) { cl_status_t status = IB_SUCCESS; - FILE * guid_file; + FILE * guid_file = NULL; char line[MAX_GUID_FILE_LINE_LENGTH]; char * endptr; uint64_t * p_guid; @@ -1112,6 +1112,8 @@ osm_ucast_mgr_read_guid_file( } Exit : + if (guid_file) + fclose(guid_file); OSM_LOG_EXIT( p_mgr->p_log ); return (status); } -- 1.5.1.4 From halr at voltaire.com Mon Jun 11 07:39:19 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 11 Jun 2007 10:39:19 -0400 Subject: [ofa-general] Re: [PATCH] osm: TRIVIAL bug fix In-Reply-To: <466D5A71.2040301@dev.mellanox.co.il> References: <466D5A71.2040301@dev.mellanox.co.il> Message-ID: <1181572757.8896.80271.camel@hal.voltaire.com> Hi Yevgeny, On Mon, 2007-06-11 at 10:21, Yevgeny Kliteynik wrote: > Hi Hal, > > Fixing a small bug that was "inherited" when moved code that > reads guid file from osm_ucast_updn.c to osm_ucast_mgr.c - > closing file descriptor when finished reading the guid file. > > -- Yevgeny > > Signed-off-by: Yevgeny Kliteynik > --- > opensm/opensm/osm_ucast_mgr.c | 4 +++- > 1 files changed, 3 insertions(+), 1 deletions(-) > > diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c > index b080f59..d855683 100644 > --- a/opensm/opensm/osm_ucast_mgr.c > +++ b/opensm/opensm/osm_ucast_mgr.c > @@ -1052,7 +1052,7 @@ osm_ucast_mgr_read_guid_file( > IN cl_list_t * p_list ) > { > cl_status_t status = IB_SUCCESS; > - FILE * guid_file; > + FILE * guid_file = NULL; Is this really needed ? Doesn't fopen return NULL on error ? 
-- Hal > char line[MAX_GUID_FILE_LINE_LENGTH]; > char * endptr; > uint64_t * p_guid; > @@ -1112,6 +1112,8 @@ osm_ucast_mgr_read_guid_file( > } > > Exit : > + if (guid_file) > + fclose(guid_file); > OSM_LOG_EXIT( p_mgr->p_log ); > return (status); > } From kliteyn at dev.mellanox.co.il Mon Jun 11 07:50:38 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 11 Jun 2007 17:50:38 +0300 Subject: [ofa-general] Re: [PATCH] osm: TRIVIAL bug fix In-Reply-To: <1181572757.8896.80271.camel@hal.voltaire.com> References: <466D5A71.2040301@dev.mellanox.co.il> <1181572757.8896.80271.camel@hal.voltaire.com> Message-ID: <466D613E.8080107@dev.mellanox.co.il> Hi Hal, Hal Rosenstock wrote: > Hi Yevgeny, > > On Mon, 2007-06-11 at 10:21, Yevgeny Kliteynik wrote: >> Hi Hal, >> >> Fixing a small bug that was "inherited" when moved code that >> reads guid file from osm_ucast_updn.c to osm_ucast_mgr.c - >> closing file descriptor when finished reading the guid file. >> >> -- Yevgeny >> >> Signed-off-by: Yevgeny Kliteynik >> --- >> opensm/opensm/osm_ucast_mgr.c | 4 +++- >> 1 files changed, 3 insertions(+), 1 deletions(-) >> >> diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c >> index b080f59..d855683 100644 >> --- a/opensm/opensm/osm_ucast_mgr.c >> +++ b/opensm/opensm/osm_ucast_mgr.c >> @@ -1052,7 +1052,7 @@ osm_ucast_mgr_read_guid_file( >> IN cl_list_t * p_list ) >> { >> cl_status_t status = IB_SUCCESS; >> - FILE * guid_file; >> + FILE * guid_file = NULL; > > Is this really needed ? Doesn't fopen return NULL on error ? You're right, it's not needed. -- Yevgeny. 
> -- Hal > >> char line[MAX_GUID_FILE_LINE_LENGTH]; >> char * endptr; >> uint64_t * p_guid; >> @@ -1112,6 +1112,8 @@ osm_ucast_mgr_read_guid_file( >> } >> >> Exit : >> + if (guid_file) >> + fclose(guid_file); >> OSM_LOG_EXIT( p_mgr->p_log ); >> return (status); >> } > > From jackm at dev.mellanox.co.il Mon Jun 11 08:09:50 2007 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Mon, 11 Jun 2007 18:09:50 +0300 Subject: [ofa-general] [PATCH] libmlx4: fix problem in post_send error flow (inline wqes) Message-ID: <200706111809.51070.jackm@dev.mellanox.co.il> Prevents the following error: caller posts a 2-wqe list, with the second wqe in the list being an INLINE which is too long. In this case, post_send goes to "out" with: nreq = 1, inl positive, and size in the range allowing blueflame. All the blueflame test conditions are met. However, the cntl pointer now points to the invalid wqe, and this will be "blueflamed". Signed-off-by: Jack Morgenstein diff --git a/src/qp.c b/src/qp.c index 92edec6..7df3311 100644 --- a/src/qp.c +++ b/src/qp.c @@ -236,6 +236,7 @@ int mlx4_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr, inl += len; if (inl > qp->max_inline_data) { + inl = 0; ret = -1; *bad_wr = wr; goto out; From halr at voltaire.com Mon Jun 11 08:10:18 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 11 Jun 2007 11:10:18 -0400 Subject: [ofa-general] Re: [PATCH] osm: TRIVIAL bug fix In-Reply-To: <466D5A71.2040301@dev.mellanox.co.il> References: <466D5A71.2040301@dev.mellanox.co.il> Message-ID: <1181574616.8896.82260.camel@hal.voltaire.com> Hi Yevgeny, On Mon, 2007-06-11 at 10:21, Yevgeny Kliteynik wrote: > Hi Hal, > > Fixing a small bug that was "inherited" when moved code that > reads guid file from osm_ucast_updn.c to osm_ucast_mgr.c - > closing file descriptor when finished reading the guid file. > > -- Yevgeny > > Signed-off-by: Yevgeny Kliteynik Thanks. Applied without the initialization of *guid_file to NULL as discussed. 
-- Hal > --- > opensm/opensm/osm_ucast_mgr.c | 4 +++- > 1 files changed, 3 insertions(+), 1 deletions(-) > > diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c > index b080f59..d855683 100644 > --- a/opensm/opensm/osm_ucast_mgr.c > +++ b/opensm/opensm/osm_ucast_mgr.c > @@ -1052,7 +1052,7 @@ osm_ucast_mgr_read_guid_file( > IN cl_list_t * p_list ) > { > cl_status_t status = IB_SUCCESS; > - FILE * guid_file; > + FILE * guid_file = NULL; > char line[MAX_GUID_FILE_LINE_LENGTH]; > char * endptr; > uint64_t * p_guid; > @@ -1112,6 +1112,8 @@ osm_ucast_mgr_read_guid_file( > } > > Exit : > + if (guid_file) > + fclose(guid_file); > OSM_LOG_EXIT( p_mgr->p_log ); > return (status); > } From halr at voltaire.com Mon Jun 11 08:12:24 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 11 Jun 2007 11:12:24 -0400 Subject: [ofa-general] Re: [PATCH] opensm: clean unused OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED In-Reply-To: <20070610223301.GC23029@sashak.voltaire.com> References: <20070610223301.GC23029@sashak.voltaire.com> Message-ID: <1181574618.8896.82262.camel@hal.voltaire.com> On Sun, 2007-06-10 at 18:33, Sasha Khapyorsky wrote: > This removes unused OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED sm signal > enum value. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied. 
-- Hal From tziporet at dev.mellanox.co.il Mon Jun 11 09:28:57 2007 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 11 Jun 2007 19:28:57 +0300 Subject: [ofa-general] Re: [ewg] OFED teleconference today - meeting summary In-Reply-To: <466D667B.8060605@mellanox.co.il> References: <28F7CA62-7C0E-4B03-A8BC-5AC40C32DC35@cisco.com> <466D667B.8060605@mellanox.co.il> Message-ID: <466D7849.1070707@mellanox.co.il> > Agenda for the meeting today: > - Review open bugs and decide on the release > 567 blocker rolandd at cisco.com RHEL5 ppc64 UD verbs failures > 577 critical ishai at mellanox.co.il SRP multipath failover too slow > (minutes, not seconds) > 629 major monis at voltaire.com ib-bonding: sometimes slow failover is > noticed > 541 major mst at mellanox.co.il slow failover with IPoIB CM > bonding/ipoibtools HA > 642 major pasha at mellanox.co.il Failed to build mvapich with PGI > compiler > > > > My suggestion: wait only for Bonding and MPI fixes and have RC5 done on > Wed. > This RC5 should become the official release > > In the meeting today we decided the following: For RC5 we will fix only 2 more issues: 629 - new bonding module is already ready 642 - got approval from OSU so we will enhance MPI to support PGI compiler 558 - Scott should find with Roland if there is a fix for tvflush for SLES10 SP1 and if it's fixed we can take this one too. RC5 will be published on Wed June 13, and it is targeted to become the GA release. GA release will be published after a week of QA - target date is June 20. Tziporet
From sweitzen at cisco.com Mon Jun 11 09:30:21 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 11 Jun 2007 09:30:21 -0700 Subject: [ofa-general] Re: [ewg] OFED teleconference today - meeting summary In-Reply-To: <466D7849.1070707@mellanox.co.il> References: <28F7CA62-7C0E-4B03-A8BC-5AC40C32DC35@cisco.com><466D667B.8060605@mellanox.co.il> <466D7849.1070707@mellanox.co.il> Message-ID: I'm not touching tvflush! :-) ________________________________ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Tziporet Koren Sent: Monday, June 11, 2007 9:29 AM To: Tziporet Koren; EWG; OpenFabrics General Subject: [ofa-general] Re: [ewg] OFED teleconference today - meeting summary Agenda for the meeting today: - Review open bugs and decide on the release 567 blocker rolandd at cisco.com RHEL5 ppc64 UD verbs failures 577 critical ishai at mellanox.co.il SRP multipath failover too slow (minutes, not seconds) 629 major monis at voltaire.com ib-bonding: sometimes slow failover is noticed 541 major mst at mellanox.co.il slow failover with IPoIB CM bonding/ipoibtools HA 642 major pasha at mellanox.co.il Failed to build mvapich with PGI compiler My suggestion: wait only for Bonding and MPI fixes and have RC5 done on Wed. This RC5 should become the official release In the meeting today we decided the following: For RC5 we will fix only 2 more issues: 629 - new bonding module is already ready 642 - got approval from OSU so we will enhance MPI to support PGI compiler 558 - Scott should find with Roland if there is a fix for tvflush for SLES10 SP1 and if it's fixed we can take this one too. RC5 will be published on Wed June 13, and it is targeted to become the GA release. GA release will be published after a week of QA - target date is June 20. Tziporet
From tziporet at mellanox.co.il Mon Jun 11 09:34:39 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Mon, 11 Jun 2007 19:34:39 +0300 Subject: [ofa-general] Re: [ewg] OFED teleconference today - meeting summary In-Reply-To: References: <28F7CA62-7C0E-4B03-A8BC-5AC40C32DC35@cisco.com><466D667B.8060605@mellanox.co.il> <466D7849.1070707@mellanox.co.il> Message-ID: <6C2C79E72C305246B504CBA17B5500C9015635D5@mtlexch01.mtl.com> Vlad, please disable tvflush on SLES10 SP1 Thanks, Tziporet ________________________________ From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com] Sent: Monday, June 11, 2007 7:30 PM To: Tziporet Koren; Tziporet Koren; EWG; OpenFabrics General Subject: RE: [ofa-general] Re: [ewg] OFED teleconference today - meeting summary I'm not touching tvflush! :-) ________________________________ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Tziporet Koren Sent: Monday, June 11, 2007 9:29 AM To: Tziporet Koren; EWG; OpenFabrics General Subject: [ofa-general] Re: [ewg] OFED teleconference today - meeting summary Agenda for the meeting today: - Review open bugs and decide on the release 567 blocker rolandd at cisco.com RHEL5 ppc64 UD verbs failures 577 critical ishai at mellanox.co.il SRP multipath failover too slow (minutes, not seconds) 629 major monis at voltaire.com ib-bonding: sometimes slow failover is noticed 541 major mst at mellanox.co.il slow failover with IPoIB CM bonding/ipoibtools HA 642 major pasha at mellanox.co.il Failed to build mvapich with PGI compiler My suggestion: wait only for Bonding and MPI fixes and have RC5 done on Wed.
This RC5 should become the official release In the meeting today we decided the following: For RC5 we will fix only 2 more issues: 629 - new bonding module is already ready 642 - got approval from OSU so we will enhance MPI to support PGI compiler 558 - Scott should find with Roland if there is a fix for tvflush for SLES10 SP1 and if it's fixed we can take this one too. RC5 will be published on Wed June 13, and it is targeted to become the GA release. GA release will be published after a week of QA - target date is June 20. Tziporet From sweitzen at cisco.com Mon Jun 11 09:35:30 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 11 Jun 2007 09:35:30 -0700 Subject: [ofa-general] Re: [ewg] OFED teleconference today - meeting summary In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9015635D5@mtlexch01.mtl.com> References: <28F7CA62-7C0E-4B03-A8BC-5AC40C32DC35@cisco.com><466D667B.8060605@mellanox.co.il> <466D7849.1070707@mellanox.co.il> <6C2C79E72C305246B504CBA17B5500C9015635D5@mtlexch01.mtl.com> Message-ID: You missed the joke, it's *tvflash* not *tvflush*. I have asked Roland about tvflash. Scott ________________________________ From: Tziporet Koren [mailto:tziporet at mellanox.co.il] Sent: Monday, June 11, 2007 9:35 AM To: Scott Weitzenkamp (sweitzen); Tziporet Koren; EWG; OpenFabrics General Subject: RE: [ofa-general] Re: [ewg] OFED teleconference today - meeting summary Vlad, please disable tvflush on SLES10 SP1 Thanks, Tziporet ________________________________ From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com] Sent: Monday, June 11, 2007 7:30 PM To: Tziporet Koren; Tziporet Koren; EWG; OpenFabrics General Subject: RE: [ofa-general] Re: [ewg] OFED teleconference today - meeting summary I'm not touching tvflush!
:-) ________________________________ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Tziporet Koren Sent: Monday, June 11, 2007 9:29 AM To: Tziporet Koren; EWG; OpenFabrics General Subject: [ofa-general] Re: [ewg] OFED teleconference today - meeting summary Agenda for the meeting today: - Review open bugs and decide on the release 567 blocker rolandd at cisco.com RHEL5 ppc64 UD verbs failures 577 critical ishai at mellanox.co.il SRP multipath failover too slow (minutes, not seconds) 629 major monis at voltaire.com ib-bonding: sometimes slow failover is noticed 541 major mst at mellanox.co.il slow failover with IPoIB CM bonding/ipoibtools HA 642 major pasha at mellanox.co.il Failed to build mvapich with PGI compiler My suggestion: wait only for Bonding and MPI fixes and have RC5 done on Wed. This RC5 should become the official release In the meeting today we decided the following: For RC5 we will fix only 2 more issues: 629 - new bonding module is already ready 642 - got approval from OSU so we will enhance MPI to support PGI compiler 558 - Scott should find with Roland if there is a fix for tvflush for SLES10 SP1 and if it's fixed we can take this one too. RC5 will be published on Wed June 13, and it is targeted to become the GA release. GA release will be published after a week of QA - target date is June 20. Tziporet From jsquyres at cisco.com Mon Jun 11 09:52:40 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Mon, 11 Jun 2007 12:52:40 -0400 Subject: [ofa-general] New OMPI / MPI_READ release notes patch Message-ID: Tziporet -- Here's a new patch for the OMPI release notes based on your current git. It includes updated information for Open MPI and text about mpi-selector. Note that there are a few areas in MPI_README that I need OSU and Mellanox to proofread.
It would also be nice if someone else could eyeball the mpi-selector text and ensure it makes sense to a naive reader. -- Jeff Squyres Cisco Systems -------------- next part -------------- A non-text attachment was scrubbed... Name: ofed-1.2-mpi-docs.patch Type: application/octet-stream Size: 17419 bytes Desc: not available URL: -------------- next part -------------- From pradeeps at linux.vnet.ibm.com Mon Jun 11 11:08:47 2007 From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana) Date: Mon, 11 Jun 2007 11:08:47 -0700 Subject: [ofa-general] Re: IPOIB CM (NOSRQ) extension In-Reply-To: <20070610044146.GA4959@mellanox.co.il> References: <46687642.8040208@linux.vnet.ibm.com> <20070610044146.GA4959@mellanox.co.il> Message-ID: <466D8FAF.5090800@linux.vnet.ibm.com> Michael S. Tsirkin wrote: >> Quoting Pradeep Satyanarayana : >> Subject: IPOIB CM (NOSRQ) extension >> >> This patch handles the corner case of running out of RC QPs. In that >> case it switches to UD mode. This patch can be used both by NOSRQ and >> SRQ code. >> >> Signed-off-by: Pradeep Satyanarayana > > You don't provide any way to retry going back to connected mode, > after a failure, which is really intermittent by nature. That's pretty bad. This node switched to datagram mode, because the passive side was under a resource crunch (no RC QPs). And, the user is indeed alerted about this condition. So, yes we do not attempt to go back to connected mode. 
> >> --- >> >> --- c/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_cm.c >> 2007-06-07 11:13:55.000000000 -0400 >> +++ b/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_cm.c >> 2007-06-07 11:11:21.000000000 -0400 >> @@ -1383,6 +1383,11 @@ static int ipoib_cm_tx_handler(struct ib >> break; >> case IB_CM_REQ_ERROR: >> case IB_CM_REJ_RECEIVED: >> + ipoib_warn(priv, "REJ received\n"); >> + neigh = tx->neigh; >> + if (neigh) >> + clear_bit(IPOIB_FLAG_OPER_UP, &neigh->cm->flags); >> + break; >> case IB_CM_TIMEWAIT_EXIT: >> ipoib_dbg(priv, "CM error %d.\n", event->event); >> spin_lock_irq(&priv->tx_lock); > > This has an effect of dropping down to datagram mode > on errors such as CM timeout, or a reject due to stale connection. > I think this is a wrong thing to do. I can make this conditional upon there being no RC QPs. Will code that up in the next patch. Pradeep From mst at dev.mellanox.co.il Mon Jun 11 11:18:04 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Mon, 11 Jun 2007 21:18:04 +0300 Subject: [ofa-general] Re: IPOIB CM (NOSRQ) extension In-Reply-To: <466D8FAF.5090800@linux.vnet.ibm.com> References: <46687642.8040208@linux.vnet.ibm.com> <20070610044146.GA4959@mellanox.co.il> <466D8FAF.5090800@linux.vnet.ibm.com> Message-ID: <20070611181804.GE6470@mellanox.co.il> > Quoting Pradeep Satyanarayana : > Subject: Re: IPOIB CM (NOSRQ) extension > > Michael S. Tsirkin wrote: > >>Quoting Pradeep Satyanarayana : > >>Subject: IPOIB CM (NOSRQ) extension > >> > >>This patch handles the corner case of running out of RC QPs. In that > >>case it switches to UD mode. This patch can be used both by NOSRQ and > >>SRQ code. > >> > >>Signed-off-by: Pradeep Satyanarayana > > > >You don't provide any way to retry going back to connected mode, > >after a failure, which is really intermittent by nature. That's pretty bad. > > This node switched to datagram mode, because the passive side was > under a resource crunch (no RC QPs). 
And, the user is indeed alerted > about this condition. So, yes we do not attempt to go back to connected > mode. Need to retry switching to datagram mode after a while. -- MST From mst at dev.mellanox.co.il Mon Jun 11 11:18:49 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Mon, 11 Jun 2007 21:18:49 +0300 Subject: [ofa-general] Re: IPOIB CM (NOSRQ) extension In-Reply-To: <20070611181804.GE6470@mellanox.co.il> References: <46687642.8040208@linux.vnet.ibm.com> <20070610044146.GA4959@mellanox.co.il> <466D8FAF.5090800@linux.vnet.ibm.com> <20070611181804.GE6470@mellanox.co.il> Message-ID: <20070611181849.GF6470@mellanox.co.il> > Quoting Michael S. Tsirkin : > Subject: Re: IPOIB CM (NOSRQ) extension > > > Quoting Pradeep Satyanarayana : > > Subject: Re: IPOIB CM (NOSRQ) extension > > > > Michael S. Tsirkin wrote: > > >>Quoting Pradeep Satyanarayana : > > >>Subject: IPOIB CM (NOSRQ) extension > > >> > > >>This patch handles the corner case of running out of RC QPs. In that > > >>case it switches to UD mode. This patch can be used both by NOSRQ and > > >>SRQ code. > > >> > > >>Signed-off-by: Pradeep Satyanarayana > > > > > >You don't provide any way to retry going back to connected mode, > > >after a failure, which is really intermittent by nature. That's pretty bad. > > > > This node switched to datagram mode, because the passive side was > > under a resource crunch (no RC QPs). And, the user is indeed alerted > > about this condition. So, yes we do not attempt to go back to connected > > mode. > > Need to retry switching to datagram mode after a while. Sorry, that should have been "switching to connected mode". 
-- MST From halr at voltaire.com Mon Jun 11 11:40:17 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 11 Jun 2007 14:40:17 -0400 Subject: [ofa-general] Re: [PATCH] opensm: remove unused state_step_mode In-Reply-To: <20070610223159.GB23029@sashak.voltaire.com> References: <20070610223159.GB23029@sashak.voltaire.com> Message-ID: <1181587216.8896.95583.camel@hal.voltaire.com> On Sun, 2007-06-10 at 18:31, Sasha Khapyorsky wrote: > This removes unused state_step_mode and associated flow from > osm_state_mgr_process(). > > Signed-off-by: Sasha Khapyorsky > --- > opensm/include/opensm/osm_base.h | 29 ------------ > opensm/include/opensm/osm_state_mgr.h | 2 - > opensm/opensm/osm_state_mgr.c | 81 +++------------------------------ Thanks. Applied. -- Hal From pradeeps at linux.vnet.ibm.com Mon Jun 11 11:44:37 2007 From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana) Date: Mon, 11 Jun 2007 11:44:37 -0700 Subject: [ofa-general] Re: IPOIB CM (NOSRQ)[PATCH V5] patch In-Reply-To: <20070610044945.GB4959@mellanox.co.il> References: <46687636.5050101@linux.vnet.ibm.com> <20070610044945.GB4959@mellanox.co.il> Message-ID: <466D9815.7030009@linux.vnet.ibm.com> Michael S. Tsirkin wrote: >> Quoting Pradeep Satyanarayana : >> Subject: IPOIB CM (NOSRQ)[PATCH V5] patch >> >> Here is a fifth version of the IPOIB_CM_NOSRQ patch. This patch will >> benefit adapters that do not support shared receive queues. >> >> This patch incorporates the following review comments and subsequent >> discussions on this mailing list from v4: >> >> 1. Reduce the number of if(srq) tests in the packet receive path > > I could still count at least 2 of these, and I don't see why there can't be just 1, > or even 0 if the QP pool is hidden under the SRQ interface. Yes, there are 2 of these now. Previously, only ipoib_poll() needed to be altered to incorporate this. Now I would have to add ipoib_drain_cq() as well. 
As previously mentioned, we need to keep maintainability in mind, and as it stands all the changes are well contained. Isn't it time we stopped quibbling about one extra if(srq)? If you are so inclined, you can submit a patch on top of this one. We can then debate the merits of that patch and make an appropriate decision. > >> +int current_rc_qp = 0; /* Active RC QPs for NOSRQ */ >> #define IPOIB_CM_IETF_ID 0x1000000000000000ULL >> >> #define IPOIB_CM_RX_UPDATE_TIME (256 * HZ) > > I don't see any locking for current_rc_qp, which looks wrong. Yes, I will correct that. Pradeep From vu at mellanox.com Mon Jun 11 12:13:01 2007 From: vu at mellanox.com (Vu Pham) Date: Mon, 11 Jun 2007 12:13:01 -0700 Subject: [ofa-general] OFED 1.x (Gen 2) based SRP target code released! In-Reply-To: <465AD2D1.2070100@voltaire.com> References: <9FA59C95FFCBB34EA5E42C1A8573784F6F91AB@mtiexch01.mti.com> <465AD2D1.2070100@voltaire.com> Message-ID: <466D9EBD.3090809@mellanox.com> Erez Zilber wrote: > Sujal Das wrote: > >> Hello all, >> >> >> >> Mellanox is pleased to release the OFED 1.x (Gen 2) - based SRP Target >> source code to the OpenFabrics community, OEMs and end users. >> >> >> >> This release is an upgrade to the previously released SRP Target source >> code that was based on the Mellanox IBGold driver and Gen 1 software >> interface. The code has been tested to work with Mellanox InfiniBand >> adapters and is available under Open Fabrics open source license terms. >> >> > I'm trying to build srpt according to the instructions, but it does not get built at all.
Here's what I did: > > tar xzf OFED-1.2-rc3.tgz > cd OFED-1.2-rc3/SRPMS > rpm2cpio ofa_kernel-1.2-rc3.src.rpm |cpio -i > tar xzf ofa_kernel-1.2.tgz > cd ofa_kernel-1.2 > patch -p1 < ~/srpt_inc/add_srpt_01.patch > patch -p1 < ~/srpt_inc/add_srpt_03.patch > You forgot to patch -p1 < ~/srpt_inc/add_srpt_04.patch -vu > cp -r ~/srpt drivers/infiniband/ulp/srpt > ./configure --with-core-mod --with-ipoib-mod --with-srp-target-mod --with-mthca-mod > From rdreier at cisco.com Mon Jun 11 14:45:20 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 11 Jun 2007 14:45:20 -0700 Subject: [ofa-general] Re: [PATCH} libmlx4: poll cq tail pointer In-Reply-To: <1181554019.12020.3.camel@mtls03> (Eli Cohen's message of "Mon, 11 Jun 2007 12:26:59 +0300") References: <1181554019.12020.3.camel@mtls03> Message-ID: thanks, applied, and also the following for the kernel: commit a4668873c7271c4b6c540d6c4e8891c24e8a1736 Author: Roland Dreier Date: Mon Jun 11 14:44:42 2007 -0700 IB/mlx4: Fix handling of wq->tail for send completions Cast the increment added to wq->tail when send completions are processed to u16 to avoid using wrong values caused by standard integer promotions. The same bug was fixed in libmlx4 by Eli Cohen .
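The commit message is terse about why the cast matters. A minimal userspace sketch, assuming (as in the `mlx4_ib_poll_one()` change) that `tail` is a free-running unsigned counter and `wqe_ctr` is the 16-bit WQE index taken from the CQE. The two helper names are hypothetical; they only isolate the old and new expressions.

```c
#include <stdint.h>

/*
 * Old expression: both 16-bit operands are promoted to int, so once the
 * hardware index has wrapped past 0xffff the difference is negative and
 * tail moves backward instead of forward.
 */
static unsigned int tail_after_completion_old(unsigned int tail, uint16_t wqe_ctr)
{
	tail += wqe_ctr - (uint16_t) tail;
	return tail;
}

/*
 * Fixed expression: truncating the difference to 16 bits yields the
 * forward distance modulo 2^16, so tail only ever advances.
 */
static unsigned int tail_after_completion_new(unsigned int tail, uint16_t wqe_ctr)
{
	tail += (uint16_t) (wqe_ctr - (uint16_t) tail);
	return tail;
}
```

For example, with `tail = 0x1ffff` and a wrapped `wqe_ctr` of 0, the old expression adds -65535 and returns 0x10000, while the fixed one adds 1 and returns 0x20000. Both results share the same low 16 bits, so for queues of at most 65536 entries the `wr_id` slot picked via `tail & (max - 1)` still agrees. The damage is to the full counter, which jumps back by 0xffff and would throw off anything that compares it with `head`.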
Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index b2a290c..048c527 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -354,7 +354,7 @@ static int mlx4_ib_poll_one(struct mlx4_ib_cq *cq, if (is_send) { wq = &(*cur_qp)->sq; wqe_ctr = be16_to_cpu(cqe->wqe_index); - wq->tail += wqe_ctr - (u16) wq->tail; + wq->tail += (u16) (wqe_ctr - (u16) wq->tail); wc->wr_id = wq->wrid[wq->tail & (wq->max - 1)]; ++wq->tail; } else if ((*cur_qp)->ibqp.srq) { From rdreier at cisco.com Mon Jun 11 14:48:41 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 11 Jun 2007 14:48:41 -0700 Subject: [ofa-general] Re: [PATCH] libmlx4: fix problem in post_send error flow (inline wqes) In-Reply-To: <200706111809.51070.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Mon, 11 Jun 2007 18:09:50 +0300") References: <200706111809.51070.jackm@dev.mellanox.co.il> Message-ID: thanks, applied. From rdreier at cisco.com Mon Jun 11 15:24:06 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 11 Jun 2007 15:24:06 -0700 Subject: [ofa-general] Re: [PATCH] net/mlx4: include linux/mutex.h from mlx4.h In-Reply-To: <20070611060942.GE1454@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 11 Jun 2007 09:09:42 +0300") References: <20070611060942.GE1454@mellanox.co.il> Message-ID: thanks, applied both mutex patches to for-2.6.23 From rdreier at cisco.com Mon Jun 11 15:30:19 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 11 Jun 2007 15:30:19 -0700 Subject: [ofa-general] [RESEND #2] [GIT PULL] please pull infiniband.git In-Reply-To: (Roland Dreier's message of "Fri, 08 Jun 2007 07:22:24 -0700") References: Message-ID: [Sorry to keep bugging you but I haven't seen this pulled and you haven't told me that something is wrong with these patches... is this getting lost in your queue or are you dropping it intentionally?] 
Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This will get a bunch of fixes to the new mlx4 driver, and one fix for port assignment by the RDMA CM: Eli Cohen (1): mlx4_core: Fix CQ context layout Jack Morgenstein (2): mlx4_core: Don't set MTT address in dMPT entries with PA set IB/mlx4: Fix zeroing of rnr_retry value in ib_modify_qp() Roland Dreier (5): mlx4_core: Initialize ctx_list and ctx_lock earlier mlx4_core: Free catastrophic error MSI-X interrupt with correct dev_id IB/mthca, mlx4_core: Fix typo in comment mlx4_core: Check firmware command interface revision IB/mlx4: Make sure RQ allocation is always valid Sean Hefty (1): RDMA/cma: Fix initialization of next_port drivers/infiniband/core/cma.c | 4 +- drivers/infiniband/hw/mlx4/qp.c | 33 ++++++++++++++++++++---------- drivers/infiniband/hw/mthca/mthca_cmd.c | 2 +- drivers/net/mlx4/cq.c | 2 +- drivers/net/mlx4/eq.c | 4 ++- drivers/net/mlx4/fw.c | 27 ++++++++++++++++++++++-- drivers/net/mlx4/intf.c | 3 -- drivers/net/mlx4/main.c | 2 + drivers/net/mlx4/mr.c | 8 ++++-- 9 files changed, 60 insertions(+), 25 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 2eb52b7..32a0e66 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -2773,8 +2773,8 @@ static int cma_init(void) int ret; get_random_bytes(&next_port, sizeof next_port); - next_port = (next_port % (sysctl_local_port_range[1] - - sysctl_local_port_range[0])) + + next_port = ((unsigned int) next_port % + (sysctl_local_port_range[1] - sysctl_local_port_range[0])) + sysctl_local_port_range[0]; cma_wq = create_singlethread_workqueue("rdma_cm"); if (!cma_wq) diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index dc137de..5c6d054 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ 
b/drivers/infiniband/hw/mlx4/qp.c @@ -189,18 +189,28 @@ static int send_wqe_overhead(enum ib_qp_type type) } static int set_rq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, - struct mlx4_ib_qp *qp) + int is_user, int has_srq, struct mlx4_ib_qp *qp) { /* Sanity check RQ size before proceeding */ if (cap->max_recv_wr > dev->dev->caps.max_wqes || cap->max_recv_sge > dev->dev->caps.max_rq_sg) return -EINVAL; - qp->rq.max = cap->max_recv_wr ? roundup_pow_of_two(cap->max_recv_wr) : 0; + if (has_srq) { + /* QPs attached to an SRQ should have no RQ */ + if (cap->max_recv_wr) + return -EINVAL; + + qp->rq.max = qp->rq.max_gs = 0; + } else { + /* HW requires >= 1 RQ entry with >= 1 gather entry */ + if (is_user && (!cap->max_recv_wr || !cap->max_recv_sge)) + return -EINVAL; - qp->rq.wqe_shift = ilog2(roundup_pow_of_two(cap->max_recv_sge * - sizeof (struct mlx4_wqe_data_seg))); - qp->rq.max_gs = (1 << qp->rq.wqe_shift) / sizeof (struct mlx4_wqe_data_seg); + qp->rq.max = roundup_pow_of_two(max(1, cap->max_recv_wr)); + qp->rq.max_gs = roundup_pow_of_two(max(1, cap->max_recv_sge)); + qp->rq.wqe_shift = ilog2(qp->rq.max_gs * sizeof (struct mlx4_wqe_data_seg)); + } cap->max_recv_wr = qp->rq.max; cap->max_recv_sge = qp->rq.max_gs; @@ -285,7 +295,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, qp->sq.head = 0; qp->sq.tail = 0; - err = set_rq_size(dev, &init_attr->cap, qp); + err = set_rq_size(dev, &init_attr->cap, !!pd->uobject, !!init_attr->srq, qp); if (err) goto err; @@ -762,11 +772,6 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, optpar |= MLX4_QP_OPTPAR_PKEY_INDEX; } - if (attr_mask & IB_QP_RNR_RETRY) { - context->params1 |= cpu_to_be32(attr->rnr_retry << 13); - optpar |= MLX4_QP_OPTPAR_RNR_RETRY; - } - if (attr_mask & IB_QP_AV) { if (mlx4_set_path(dev, &attr->ah_attr, &context->pri_path, attr_mask & IB_QP_PORT ? 
attr->port_num : qp->port)) { @@ -802,6 +807,12 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, context->pd = cpu_to_be32(to_mpd(ibqp->pd)->pdn); context->params1 = cpu_to_be32(MLX4_IB_ACK_REQ_FREQ << 28); + + if (attr_mask & IB_QP_RNR_RETRY) { + context->params1 |= cpu_to_be32(attr->rnr_retry << 13); + optpar |= MLX4_QP_OPTPAR_RNR_RETRY; + } + if (attr_mask & IB_QP_RETRY_CNT) { context->params1 |= cpu_to_be32(attr->retry_cnt << 16); optpar |= MLX4_QP_OPTPAR_RETRY_COUNT; diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c index 3810252..f40558d 100644 --- a/drivers/infiniband/hw/mthca/mthca_cmd.c +++ b/drivers/infiniband/hw/mthca/mthca_cmd.c @@ -772,7 +772,7 @@ int mthca_QUERY_FW(struct mthca_dev *dev, u8 *status) MTHCA_GET(dev->fw_ver, outbox, QUERY_FW_VER_OFFSET); /* - * FW subminor version is at more signifant bits than minor + * FW subminor version is at more significant bits than minor * version, so swap here. */ dev->fw_ver = (dev->fw_ver & 0xffff00000000ull) | diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c index 437d78a..39253d0 100644 --- a/drivers/net/mlx4/cq.c +++ b/drivers/net/mlx4/cq.c @@ -61,7 +61,7 @@ struct mlx4_cq_context { __be32 solicit_producer_index; __be32 consumer_index; __be32 producer_index; - u8 reserved6[2]; + u32 reserved6[2]; __be64 db_rec_addr; }; diff --git a/drivers/net/mlx4/eq.c b/drivers/net/mlx4/eq.c index 0f11adb..27a82ce 100644 --- a/drivers/net/mlx4/eq.c +++ b/drivers/net/mlx4/eq.c @@ -490,9 +490,11 @@ static void mlx4_free_irqs(struct mlx4_dev *dev) if (eq_table->have_irq) free_irq(dev->pdev->irq, dev); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_CATAS; ++i) if (eq_table->eq[i].have_irq) free_irq(eq_table->eq[i].irq, eq_table->eq + i); + if (eq_table->eq[MLX4_EQ_CATAS].have_irq) + free_irq(eq_table->eq[MLX4_EQ_CATAS].irq, dev); } static int __devinit mlx4_map_clr_int(struct mlx4_dev *dev) diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c 
index cfa5cc0..e7ca118 100644 --- a/drivers/net/mlx4/fw.c +++ b/drivers/net/mlx4/fw.c @@ -37,6 +37,10 @@ #include "fw.h" #include "icm.h" +enum { + MLX4_COMMAND_INTERFACE_REV = 1 +}; + extern void __buggy_use_of_MLX4_GET(void); extern void __buggy_use_of_MLX4_PUT(void); @@ -452,10 +456,12 @@ int mlx4_QUERY_FW(struct mlx4_dev *dev) u32 *outbox; int err = 0; u64 fw_ver; + u16 cmd_if_rev; u8 lg; #define QUERY_FW_OUT_SIZE 0x100 #define QUERY_FW_VER_OFFSET 0x00 +#define QUERY_FW_CMD_IF_REV_OFFSET 0x0a #define QUERY_FW_MAX_CMD_OFFSET 0x0f #define QUERY_FW_ERR_START_OFFSET 0x30 #define QUERY_FW_ERR_SIZE_OFFSET 0x38 @@ -477,21 +483,36 @@ int mlx4_QUERY_FW(struct mlx4_dev *dev) MLX4_GET(fw_ver, outbox, QUERY_FW_VER_OFFSET); /* - * FW subminor version is at more signifant bits than minor + * FW subminor version is at more significant bits than minor * version, so swap here. */ dev->caps.fw_ver = (fw_ver & 0xffff00000000ull) | ((fw_ver & 0xffff0000ull) >> 16) | ((fw_ver & 0x0000ffffull) << 16); + MLX4_GET(cmd_if_rev, outbox, QUERY_FW_CMD_IF_REV_OFFSET); + if (cmd_if_rev != MLX4_COMMAND_INTERFACE_REV) { + mlx4_err(dev, "Installed FW has unsupported " + "command interface revision %d.\n", + cmd_if_rev); + mlx4_err(dev, "(Installed FW version is %d.%d.%03d)\n", + (int) (dev->caps.fw_ver >> 32), + (int) (dev->caps.fw_ver >> 16) & 0xffff, + (int) dev->caps.fw_ver & 0xffff); + mlx4_err(dev, "This driver version supports only revision %d.\n", + MLX4_COMMAND_INTERFACE_REV); + err = -ENODEV; + goto out; + } + MLX4_GET(lg, outbox, QUERY_FW_MAX_CMD_OFFSET); cmd->max_cmds = 1 << lg; - mlx4_dbg(dev, "FW version %d.%d.%03d, max commands %d\n", + mlx4_dbg(dev, "FW version %d.%d.%03d (cmd intf rev %d), max commands %d\n", (int) (dev->caps.fw_ver >> 32), (int) (dev->caps.fw_ver >> 16) & 0xffff, (int) dev->caps.fw_ver & 0xffff, - cmd->max_cmds); + cmd_if_rev, cmd->max_cmds); MLX4_GET(fw->catas_offset, outbox, QUERY_FW_ERR_START_OFFSET); MLX4_GET(fw->catas_size, outbox, 
QUERY_FW_ERR_SIZE_OFFSET); diff --git a/drivers/net/mlx4/intf.c b/drivers/net/mlx4/intf.c index 65854f9..9ae951b 100644 --- a/drivers/net/mlx4/intf.c +++ b/drivers/net/mlx4/intf.c @@ -135,9 +135,6 @@ int mlx4_register_device(struct mlx4_dev *dev) struct mlx4_priv *priv = mlx4_priv(dev); struct mlx4_interface *intf; - INIT_LIST_HEAD(&priv->ctx_list); - spin_lock_init(&priv->ctx_lock); - mutex_lock(&intf_mutex); list_add_tail(&priv->dev_list, &dev_list); diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 20b8c0d..d417293 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -787,6 +787,8 @@ static int __devinit mlx4_init_one(struct pci_dev *pdev, dev = &priv->dev; dev->pdev = pdev; + INIT_LIST_HEAD(&priv->ctx_list); + spin_lock_init(&priv->ctx_lock); /* * Now reset the HCA before we touch the PCI capabilities or diff --git a/drivers/net/mlx4/mr.c b/drivers/net/mlx4/mr.c index b33864d..d0808fa 100644 --- a/drivers/net/mlx4/mr.c +++ b/drivers/net/mlx4/mr.c @@ -324,15 +324,17 @@ int mlx4_mr_enable(struct mlx4_dev *dev, struct mlx4_mr *mr) MLX4_MPT_FLAG_MIO | MLX4_MPT_FLAG_REGION | mr->access); - if (mr->mtt.order < 0) - mpt_entry->flags |= cpu_to_be32(MLX4_MPT_FLAG_PHYSICAL); mpt_entry->key = cpu_to_be32(key_to_hw_index(mr->key)); mpt_entry->pd = cpu_to_be32(mr->pd); mpt_entry->start = cpu_to_be64(mr->iova); mpt_entry->length = cpu_to_be64(mr->size); mpt_entry->entity_size = cpu_to_be32(mr->mtt.page_shift); - mpt_entry->mtt_seg = cpu_to_be64(mlx4_mtt_addr(dev, &mr->mtt)); + if (mr->mtt.order < 0) { + mpt_entry->flags |= cpu_to_be32(MLX4_MPT_FLAG_PHYSICAL); + mpt_entry->mtt_seg = 0; + } else + mpt_entry->mtt_seg = cpu_to_be64(mlx4_mtt_addr(dev, &mr->mtt)); err = mlx4_SW2HW_MPT(dev, mailbox, key_to_hw_index(mr->key) & (dev->caps.num_mpts - 1)); From halr at voltaire.com Mon Jun 11 15:59:53 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 11 Jun 2007 18:59:53 -0400 Subject: [ofa-general] Re: [PATCH] osm: up/dn ranking - making code 
more intuitive In-Reply-To: <466D58F2.1020402@dev.mellanox.co.il> References: <46503064.7010107@dev.mellanox.co.il> <20070520161034.GY19271@sashak.voltaire.com> <4651557E.2080400@dev.mellanox.co.il> <20070524225428.GK837@sashak.voltaire.com> <466D58F2.1020402@dev.mellanox.co.il> Message-ID: <1181602792.5681.1081.camel@hal.voltaire.com> Hi Yevgeny, On Mon, 2007-06-11 at 10:15, Yevgeny Kliteynik wrote: > Hi Hal. > > Following up our discussion with Sasha regarding the ranking > optimization in up/dn routing: > > >> I do think that to make the code more "intuitive" we might > >> want to remove the __updn_update_rank() and do something like this: > >> > >> if (remote_u->rank > u->rank + 1) > >> { > >> remote_u->rank = u->rank + 1; > >> max_rank = remote_u->rank; > >> cl_qlist_insert_tail(&list, &remote_u->list); > >> } > > Signed-off-by: Yevgeny Kliteynik Thanks. Applied. -- Hal From tziporet at dev.mellanox.co.il Mon Jun 11 23:47:04 2007 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Tue, 12 Jun 2007 09:47:04 +0300 Subject: [ofa-general] [GIT PULL] OFED 1.2 uDAPL release notes In-Reply-To: <466718AB.5050507@ichips.intel.com> References: <466718AB.5050507@ichips.intel.com> Message-ID: <466E4168.2030206@mellanox.co.il> Arlin Davis wrote: > Vlad, please pull the latest OFED 1.2 release notes from uDAPL > project (ofed_1_2 branch) > > dapl/doc/uDAPL_release_notes.txt > > Signed-off by: Arlin Davis ardavis at ichips.intel.com > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > done Tziporet From erezz at voltaire.com Tue Jun 12 00:27:20 2007 From: erezz at voltaire.com (Erez Zilber) Date: Tue, 12 Jun 2007 10:27:20 +0300 Subject: [ofa-general] OFED 1.x (Gen 2) based SRP target code released! 
In-Reply-To: <466D9EBD.3090809@mellanox.com> References: <9FA59C95FFCBB34EA5E42C1A8573784F6F91AB@mtiexch01.mti.com> <465AD2D1.2070100@voltaire.com> <466D9EBD.3090809@mellanox.com> Message-ID: <466E4AD8.6090804@voltaire.com> >>> >> I'm trying to build srpt according to the instructions, but it does >> not get built at all. Here's what I did: >> >> tar xzf OFED-1.2-rc3.tgz >> cd OFED-1.2-rc3/SRPMS >> rpm2cpio ofa_kernel-1.2-rc3.src.rpm |cpio -i >> tar xzf ofa_kernel-1.2.tgz >> cd ofa_kernel-1.2 >> patch -p1 < ~/srpt_inc/add_srpt_01.patch >> patch -p1 < ~/srpt_inc/add_srpt_03.patch >> > > You forgot to > patch -p1 < ~/srpt_inc/add_srpt_04.patch > > -vu You may want to add it to the README file (it is not mentioned there). Is it documented anywhere in the OpenFabrics wiki? Erez From tziporet at mellanox.co.il Tue Jun 12 01:12:58 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 12 Jun 2007 11:12:58 +0300 Subject: [ofa-general] RE: [ewg] New OMPI / MPI_READ release notes patch In-Reply-To: References: Message-ID: <6C2C79E72C305246B504CBA17B5500C9015635E9@mtlexch01.mtl.com> Done Tziporet -----Original Message----- From: ewg-bounces at lists.openfabrics.org [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of Jeff Squyres Sent: Monday, June 11, 2007 7:53 PM To: OpenFabrics General Cc: OpenFabrics EWG Subject: [ewg] New OMPI / MPI_READ release notes patch Tziporet -- Here's a new patch for the OMPI release notes based on your current git. It includes updated information for Open MPI and text about mpi-selector. Note that there are a few areas in MPI_README that I need OSU and Mellanox to proofread. It would also be nice if someone else could eyeball the mpi-selector text and ensure it makes sense to a naive reader. -- Jeff Squyres Cisco Systems From mst at dev.mellanox.co.il Tue Jun 12 01:41:08 2007 From: mst at dev.mellanox.co.il (Michael S.
Tsirkin) Date: Tue, 12 Jun 2007 11:41:08 +0300 Subject: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits Message-ID: <20070612084108.GK6470@mellanox.co.il> To whom it may concern, I have created an ofed git tree updated with kernel bits from 2.6.22-rc4, and put it up at git://git.openfabrics.org/~mst/ofed_kernel.git It may be useful to anyone interested in testing 2.6.22-rc4 technology (such as mlx4) on older kernels, testing SDP with 2.6.22-rc4 bits, etc. This tree also might (or might not) become the basis for kernel bits in future ofed kernel releases. This tree was test-built with the ofa cross-build script and builds on as wide a range of kernels as OFED 1.2 did. No testing has been done yet. Erez and other iser maintainers: I had a problem with the RHEL4 iscsi backports (scsi_flush_work isn't exported). I decided that since it isn't called on older kernels, it's reasonably safe to just comment it out, but I would be interested to hear your opinion. See it in this sub-directory: kernel_patches/backport/2.6.9_U2/libiscsi_no_flush_to_2_6_9.patch I went over the patches in kernel_patches/fixes/ and tried to remove only those that were already applied, and to update those that weren't. But I'd like to ask all relevant parties to double-check that nothing that should be there is missing (and, hint hint, maybe think about pushing the patches upstream). In particular, there were a ton of ipath patches that, it seems, were for the most part applied. Qlogic maintainers, please help double-check that I did not miss something of value.
-- MST From vlad at lists.openfabrics.org Tue Jun 12 02:41:41 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Tue, 12 Jun 2007 02:41:41 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070612-0200 daily build status Message-ID: <20070612094141.E899DE60882@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.15 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on powerpc with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.15 Passed on x86_64 with linux-2.6.18 Passed on powerpc with linux-2.6.17 Passed on ia64 with linux-2.6.13 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.14 Passed on ppc64 with linux-2.6.12 Passed on ia64 with linux-2.6.17 Passed on powerpc with linux-2.6.13 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.20 Passed on powerpc with linux-2.6.16 Passed on ia64 with linux-2.6.19 Passed on x86_64 with linux-2.6.13 Passed on powerpc with linux-2.6.12 Passed on powerpc with linux-2.6.14 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.15 Passed on ppc64 with linux-2.6.15 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.5-7.244-smp 
Passed on ppc64 with linux-2.6.14 Passed on ppc64 with linux-2.6.13 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.16 Passed on ia64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-34.ELsmp Failed: From jsquyres at cisco.com Tue Jun 12 04:57:03 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Tue, 12 Jun 2007 07:57:03 -0400 Subject: [ofa-general] Re: [ewg] New OMPI / MPI_READ release notes patch In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9015635E9@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C9015635E9@mtlexch01.mtl.com> Message-ID: Note that git still shows the following in the ofed_1_2 branch: Example1: Running the OSU bandwidth: !!! SOMEONE PLEASE CHECK THESE DIRECTORIES AND EXECUTABLE NAMES > cd /usr/mpi/gcc/openmpi-1.2.2-1/tests/osu_benchmarks-2.2 > mpirun -np -hostfile osu_bw Example2: Running the Intel MPI Benchmark benchmarks: !!! SOMEONE PLEASE CHECK THESE DIRECTORIES AND EXECUTABLE NAMES > cd /usr/mpi/gcc/openmpi-1.2.2-1/tests/IMB-2.3 > mpirun -np -hostfile IMB-MPI1 Example3: Running the Presta benchmarks: !!! SOMEONE PLEASE CHECK THESE DIRECTORIES AND EXECUTABLE NAMES > cd /usr/mpi/gcc/openmpi-1.2.2-1/tests/presta-1.4.0 > mpirun -np -hostfile com -o 100 On Jun 12, 2007, at 4:12 AM, Tziporet Koren wrote: > Done > Tziporet > > -----Original Message----- > From: ewg-bounces at lists.openfabrics.org > [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of Jeff Squyres > Sent: Monday, June 11, 2007 7:53 PM > To: OpenFabrics General > Cc: OpenFabrics EWG > Subject: [ewg] New OMPI / MPI_READ release notes patch > > Tziporet -- > > Here's a new patch for the OMPI release notes based on your current > git. 
It includes updated information for Open MPI and text about mpi-selector. > > Note that there are a few areas in MPI_README that I need OSU and > Mellanox to proofread. It would also be nice if someone else could > eyeball the mpi-selector text and ensure it makes sense to a naive > reader. > > -- > Jeff Squyres > Cisco Systems -- Jeff Squyres Cisco Systems From arthur.jones at qlogic.com Tue Jun 12 09:27:03 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 12 Jun 2007 09:27:03 -0700 Subject: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits In-Reply-To: <20070612084108.GK6470@mellanox.co.il> References: <20070612084108.GK6470@mellanox.co.il> Message-ID: <20070612162703.GA26197@bauxite.pathscale.com> hi michael, ... On Tue, Jun 12, 2007 at 11:41:08AM +0300, Michael S. Tsirkin wrote: > [...] > In particular, there were a ton of ipath patches that it seems were > for the most part applied. > Qlogic maintainers, please help double check that I did not miss something > of value. we've amassed a boatload of patches that are due to go to roland soon. it's prob best if we have a look at your repo once these patches are integrated... arthur From swise at opengridcomputing.com Tue Jun 12 09:48:12 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 12 Jun 2007 11:48:12 -0500 Subject: [ofa-general] IB and iWarp HCA in same node In-Reply-To: <46674722.6090302@ucla.edu> References: <46674722.6090302@ucla.edu> Message-ID: <466ECE4C.1080106@opengridcomputing.com> Scott A. Friedman wrote: > I have a working IB cluster where I have added a Chelsio iWarp card to > one node. Another node is connected to that with only an identical iWarp > card. I cannot seem to get the iWarp cards to come up. They work through > regular ethernet just fine and the IB stuff still works as well. But, > when I modprobe iw_cxgb3 and iw_cm utilities like ibstat show the > following. Which explains why nothing is working. > > Question is, why?
Am I missing or forgetting something? I just want to > test the two iWarp cards back to back. Not trying to get some kind of > auto bridging or routing working. > > # ibstat > iWARP RNIC 'cxgb3_0' > iWARP RNIC type: cxgb3 > Number of ports: 1 > Firmware version: T 4.0.0 > Hardware version: 1 > Node GUID: 0x0007430506ea0000 > System image GUID: 0x0007430506ea0000 > Port 1: > State: Active > Physical state: No state change > Rate: 20 > Base lid: 0 > LMC: 0 > SM lid: 0 > Capability mask: 0x009f0000 > Port GUID: 0x0000000000000000 This all looks normal. What application are you trying to run over rdma on the chelsio interface? rping? From swise at opengridcomputing.com Tue Jun 12 09:51:03 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 12 Jun 2007 11:51:03 -0500 Subject: [ofa-general] problem with mvapich2 over iwarp In-Reply-To: <20070607180437.GD16228@osc.edu> References: <466042AE.4000006@opengridcomputing.com> <20070607180437.GD16228@osc.edu> Message-ID: <466ECEF7.3080504@opengridcomputing.com> Pete Wyckoff wrote: > swise at opengridcomputing.com wrote on Fri, 01 Jun 2007 11:00 -0500: >> I'm helping a customer who is trying to run mvapich2 over chelsio's >> rnic. They're running a simple program that does an mpi init, 1000 >> barriers, then a finalize. They're using ofed-1.2-rc3, mpiexec-0.82, >> and mvapich2-0.9.8-p2 (not the mvapich2 from the ofed kit). Also they >> aren't using mpd to start up stuff. They're using pmi I guess (I'm not >> sure what pmi is, but the mpiexec has -comm=pmi. BTW: I can run the >> same program fine on my 8 node cluster using mpd and the ofa mvapich2 code. > > Hey Steve. The "customer" contacted me about helping with the > mpiexec aspects of things, assuming we're talking about the same > people. It's just an alternative to the MPD startup program, but > uses the same PMI mechanisms under the hood as does MPD. And it's a > much better way to launch parallel jobs, but I'm biased since I > wrote it. 
:) > > The hang in rdma_destroy_id() that you describe, does it happen for > both mpd and mpiexec startup? > > I doubt that the mpiexec issue would matter, but I frequently tell > people to try it using straight mpirun just to make sure. The PMI > protocol under the hood is just a way for processes to exchange > data---mpiexec doesn't know anything about MPI itself or iwarp, it > just moves the information around. So we generally don't see any > problems with starting up mpich2 programs on all sorts of weird > hardware. > > Offering to help if you have any more information. I've asked for > them to send me debug logs of the mpd and mpiexec startups, but I > don't have an account on their machine yet. > > -- Pete Thanks Pete. I've been out of town until today. I think they have it working. I believe the bug they saw was in an older version of mvapich2 that Sundeep fixed a while back. After rebuilding and re-installing, they don't seem to hit it anymore. The symptoms definitely seemed like the previous bug he fixed. Anyway, thanks for helping and explaining mpiexec. I'll holler if anything else comes up. Steve. From pradeeps at linux.vnet.ibm.com Tue Jun 12 10:11:52 2007 From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana) Date: Tue, 12 Jun 2007 10:11:52 -0700 Subject: [ofa-general] Re: IPOIB CM (NOSRQ) extension In-Reply-To: <20070611181849.GF6470@mellanox.co.il> References: <46687642.8040208@linux.vnet.ibm.com> <20070610044146.GA4959@mellanox.co.il> <466D8FAF.5090800@linux.vnet.ibm.com> <20070611181804.GE6470@mellanox.co.il> <20070611181849.GF6470@mellanox.co.il> Message-ID: <466ED3D8.7000607@linux.vnet.ibm.com> Michael S.
Tsirkin wrote: >>>>> Quoting Pradeep Satyanarayana : >>>>> Subject: IPOIB CM (NOSRQ) extension >>>>> >>>>> This patch handles the corner case of running out of RC QPs. In that >>>>> case it switches to UD mode. This patch can be used both by NOSRQ and >>>>> SRQ code. >>>>> >>>>> Signed-off-by: Pradeep Satyanarayana >>>> You don't provide any way to retry going back to connected mode, >>>> after a failure, which is really intermittent by nature. That's pretty bad. >>> This node switched to datagram mode, because the passive side was >>> under a resource crunch (no RC QPs). And, the user is indeed alerted >>> about this condition. So, yes we do not attempt to go back to connected >>> mode. >> Need to retry switching to datagram mode after a while. > > Sorry, that should have been "switching to connected mode". So, you are suggesting that we ping-pong between datagram mode and connected mode. To begin with, I was opposed even to switching to datagram mode when there are no RC QPs. This suggestion goes even further. We seem to have polar opposite viewpoints on this issue, and rather than simply persisting with them, we need to back them up with more concrete reasoning. I disagree with this approach for the following reasons:
1) This switch to datagram mode happens when we are in a resource crunch. The resource crunch should be flagged and corrective action taken. Switching to datagram mode simply prolongs the agony.
2) Ping-ponging between connected mode and datagram mode makes the situation even worse. In HPC environments, cluster nodes simply do not appear and disappear; they remain in the cluster. So trying to switch back to connected mode does not achieve anything.
Can you tell me why "switching to connected mode" is a must?
Pradeep From swise at opengridcomputing.com Tue Jun 12 10:21:45 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 12 Jun 2007 12:21:45 -0500 Subject: [ofa-general] problem with mvapich2 over iwarp In-Reply-To: <466ECEF7.3080504@opengridcomputing.com> References: <466042AE.4000006@opengridcomputing.com> <20070607180437.GD16228@osc.edu> <466ECEF7.3080504@opengridcomputing.com> Message-ID: <466ED629.20208@opengridcomputing.com> Steve Wise wrote: > Pete Wyckoff wrote: >> swise at opengridcomputing.com wrote on Fri, 01 Jun 2007 11:00 -0500: >>> I'm helping a customer who is trying to run mvapich2 over chelsio's >>> rnic. They're running a simple program that does an mpi init, 1000 >>> barriers, then a finalize. They're using ofed-1.2-rc3, mpiexec-0.82, >>> and mvapich2-0.9.8-p2 (not the mvapich2 from the ofed kit). Also >>> they aren't using mpd to start up stuff. They're using pmi I guess >>> (I'm not sure what pmi is, but the mpiexec has -comm=pmi.) BTW: I can >>> run the same program fine on my 8 node cluster using mpd and the ofa >>> mvapich2 code. >> >> Hey Steve. The "customer" contacted me about helping with the >> mpiexec aspects of things, assuming we're talking about the same >> people. It's just an alternative to the MPD startup program, but >> uses the same PMI mechanisms under the hood as does MPD. And it's a >> much better way to launch parallel jobs, but I'm biased since I >> wrote it. :) >> >> The hang in rdma_destroy_id() that you describe, does it happen for >> both mpd and mpiexec startup? >> >> I doubt that the mpiexec issue would matter, but I frequently tell >> people to try it using straight mpirun just to make sure. The PMI >> protocol under the hood is just a way for processes to exchange >> data---mpiexec doesn't know anything about MPI itself or iwarp, it >> just moves the information around. So we generally don't see any >> problems with starting up mpich2 programs on all sorts of weird >> hardware.
>> >> Offering to help if you have any more information. I've asked for >> them to send me debug logs of the mpd and mpiexec startups, but I >> don't have an account on their machine yet. >> >> -- Pete > > Thanks Pete. > > I've been out of town until today. I think they have it working. I > believe the bug they saw was in an older version of mvapich2 that > Sundeep fixed a while back. After rebuilding and re-installing, they > don't seem to hit it anymore. The symptoms definitely seemed like the > previous bug he fixed. > > Anyway, thanks for helping and explaining mpiexec. I'll holler if > anything else comes up. > > Steve. Ignore this last reply. I hadn't caught up on my email for that issue and I think maybe there are still problems with all this. Steve. From sean.hefty at intel.com Tue Jun 12 11:03:24 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 12 Jun 2007 11:03:24 -0700 Subject: [ofa-general] crash in ipoib Message-ID: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com> Copying the ofa-general list. We have now seen a crash similar to this a total of 4 times. These are x64, 2.6.9-42.EL. The crashes only seem to occur on a specific set of systems in our cluster. The latest crash has a stack trace similar to the one listed below.
badness in i8042_panic_blink drivers/input/serio/i8042.c : 992
i8042_panic_blink + 485
panic + 445
apic_timer_interrupt + 133
oops_end + 38
oops_end + 65
do_page_fault + 1204
ipoib_cm_send + 433
error_exit
ipoib_ib_completion + 0
ipoib_cm_handle_rx_wc + 239
(the trace goes on and on)
- Sean >>No known issues with IPoIB. Can you send the command line and all >>details on the machine you work on. >>Also - do you have the oops printout > >Woody will need to provide details on the machine.
Here's what's available >from the oops printout: (might not be related to ipoib or cm) >
>(top portion is cut off)
>badness in i8042_panic_blink drivers/input/serio/i8042.c : 992
>i8042_panic_blink + 485
>panic + 445
>apic_timer_interrupt + 133
>oops_end + 38
>oops_end + 65
>do_page_fault + 1204
>error_exit
>ipoib_ib_completion
>ipoib_cm_handle_rx_wc + 378
>ipoib_ib_completion + 144
>usb_hcd_irq
>mthca_eq_int + 221
>ret_from_intr
>mthca_tavor_interrupt + 95
>handle_IRQ_event
>do_IRQ
>ret_from_intr
>csum_partial + 725
>skb_checksum + 308
>ip_conntrack:tcp_error + 312
>ip_conntrack_in + 163
>try_to_wake_up + 876
>nf_iterate + 82
>ip_rcv_finish
>ip_rcv + 1119
>netif_receive_skb + 791
>process_backlog + 136
>net_rx_action
>do_softirq
>do_IRQ
>ret_from_intr
>spin_unlock_irqrestore
>ib_send_cm_rep
>ib_ipoib_cm_rx_handler
>cm_alloc_msg
>ib_send_cm_rtu
>ipoib_cm_rx_event_handler
>ib_find_cached_pkey
>cm_process_work
>cm_req_handler
>cm_work_handler
>cm_work_handler
>worker_thread
>blah blah blah
From pradeeps at linux.vnet.ibm.com Tue Jun 12 11:04:17 2007 From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana) Date: Tue, 12 Jun 2007 11:04:17 -0700 Subject: [ofa-general] IPOB CM (NOSRQ) [PATCH V6] patch Message-ID: <466EE021.30302@linux.vnet.ibm.com> Here is a sixth version of the IPOIB_CM_NOSRQ patch. This patch will benefit adapters that do not support shared receive queues. Changes from V4:
1. Eliminated some redundant #defines and corrected a printk
2. Introduced a missing spinlock.
This patch has been tested with linux-2.6.22-rc4 derived from Roland's for-2.6.23 git tree on 06/11 on ppc64 machines Signed-off-by: Pradeep Satyanarayana --- --- a/linux-2.6.22-rc4/drivers/infiniband/ulp/ipoib/ipoib.h 2007-05-30 14:56:25.000000000 -0400 +++ b/linux-2.6.22-rc4/drivers/infiniband/ulp/ipoib/ipoib.h 2007-06-11 19:24:24.000000000 -0400 @@ -95,11 +95,16 @@ enum { IPOIB_MCAST_FLAG_ATTACHED = 3, }; +#define CM_PACKET_SIZE (1ul << 16) #define IPOIB_OP_RECV (1ul << 31) #ifdef CONFIG_INFINIBAND_IPOIB_CM -#define IPOIB_CM_OP_SRQ (1ul << 30) +#define IPOIB_CM_OP_RECV (1ul << 30) + +#define NOSRQ_INDEX_TABLE_SIZE 128 +#define NOSRQ_INDEX_MASK (NOSRQ_INDEX_TABLE_SIZE -1) + #else -#define IPOIB_CM_OP_SRQ (0) +#define IPOIB_CM_OP_RECV (0) #endif /* structs */ @@ -166,11 +171,14 @@ enum ipoib_cm_state { }; struct ipoib_cm_rx { - struct ib_cm_id *id; - struct ib_qp *qp; - struct list_head list; - struct net_device *dev; - unsigned long jiffies; + struct ib_cm_id *id; + struct ib_qp *qp; + struct ipoib_cm_rx_buf *rx_ring; /* Used by NOSRQ only */ + struct list_head list; + struct net_device *dev; + unsigned long jiffies; + u32 index; /* wr_ids are distinguished by index + * to identify the QP -NOSRQ only */ enum ipoib_cm_state state; }; @@ -215,6 +223,8 @@ struct ipoib_cm_dev_priv { struct ib_wc ibwc[IPOIB_NUM_WC]; struct ib_sge rx_sge[IPOIB_CM_RX_SG]; struct ib_recv_wr rx_wr; + struct ipoib_cm_rx **rx_index_table; /* See ipoib_cm_dev_init() + *for usage of this element */ }; /* @@ -564,10 +574,9 @@ static inline void ipoib_cm_skb_too_long dev_kfree_skb_any(skb); } -static inline void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) +void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) { } - #endif #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG --- a/linux-2.6.22-rc4/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-05-30 14:56:25.000000000 -0400 +++ b/linux-2.6.22-rc4/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-06-11 19:36:32.000000000 -0400 @@ 
-49,6 +49,20 @@ MODULE_PARM_DESC(cm_data_debug_level, #include "ipoib.h" +int max_rc_qp = NOSRQ_INDEX_TABLE_SIZE; +int max_recv_buf = 1024; /* Default is 1024 MB */ + +module_param_named(nosrq_max_rc_qp, max_rc_qp, int, 0644); +MODULE_PARM_DESC(nosrq_max_rc_qp, "Max number of NOSRQ RC QPs supported"); + +module_param_named(max_recieve_buffer, max_recv_buf, int, 0644); +MODULE_PARM_DESC(max_recieve_buffer, "Max Recieve Buffer Size in MB"); + +struct ipoib_cm_nosrq_count { + spinlock_t lock; + int current_rc_qp; /* Active number of RC QPs for NOSRQ */ +} nosrq_count; + #define IPOIB_CM_IETF_ID 0x1000000000000000ULL #define IPOIB_CM_RX_UPDATE_TIME (256 * HZ) @@ -88,20 +102,20 @@ static void ipoib_cm_dma_unmap_rx(struct ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE); } -static int ipoib_cm_post_receive(struct net_device *dev, int id) +static int post_receive_srq(struct net_device *dev, u64 id) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ib_recv_wr *bad_wr; int i, ret; - priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_SRQ; + priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_RECV; for (i = 0; i < IPOIB_CM_RX_SG; ++i) priv->cm.rx_sge[i].addr = priv->cm.srq_ring[id].mapping[i]; ret = ib_post_srq_recv(priv->cm.srq, &priv->cm.rx_wr, &bad_wr); if (unlikely(ret)) { - ipoib_warn(priv, "post srq failed for buf %d (%d)\n", id, ret); + ipoib_warn(priv, "post srq failed for buf %ld (%d)\n", id, ret); ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1, priv->cm.srq_ring[id].mapping); dev_kfree_skb_any(priv->cm.srq_ring[id].skb); @@ -111,12 +125,47 @@ static int ipoib_cm_post_receive(struct return ret; } -static struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev, int id, int frags, +static int post_receive_nosrq(struct net_device *dev, u64 id) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ib_recv_wr *bad_wr; + int i, ret; + u32 index; + u32 wr_id; + struct ipoib_cm_rx *rx_ptr; + + index = id & NOSRQ_INDEX_MASK ; + wr_id = id >> 32; + + 
rx_ptr = priv->cm.rx_index_table[index]; + + priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_RECV; + + for (i = 0; i < IPOIB_CM_RX_SG; ++i) + priv->cm.rx_sge[i].addr = rx_ptr->rx_ring[wr_id].mapping[i]; + + ret = ib_post_recv(rx_ptr->qp, &priv->cm.rx_wr, &bad_wr); + if (unlikely(ret)) { + ipoib_warn(priv, "post recv failed for buf %d (%d)\n", + wr_id, ret); + ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1, + rx_ptr->rx_ring[wr_id].mapping); + dev_kfree_skb_any(rx_ptr->rx_ring[wr_id].skb); + rx_ptr->rx_ring[wr_id].skb = NULL; + } + + return ret; +} + +static struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev, u64 id, + int frags, u64 mapping[IPOIB_CM_RX_SG]) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct sk_buff *skb; int i; + struct ipoib_cm_rx *rx_ptr; + u32 index, wr_id; skb = dev_alloc_skb(IPOIB_CM_HEAD_SIZE + 12); if (unlikely(!skb)) @@ -148,7 +197,14 @@ static struct sk_buff *ipoib_cm_alloc_rx goto partial_error; } - priv->cm.srq_ring[id].skb = skb; + if (priv->cm.srq) + priv->cm.srq_ring[id].skb = skb; + else { + index = id & NOSRQ_INDEX_MASK ; + wr_id = id >> 32; + rx_ptr = priv->cm.rx_index_table[index]; + rx_ptr->rx_ring[wr_id].skb = skb; + } return skb; partial_error: @@ -205,16 +261,21 @@ static struct ib_qp *ipoib_cm_create_rx_ { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ib_qp_init_attr attr = { - .event_handler = ipoib_cm_rx_event_handler, .send_cq = priv->cq, /* For drain WR */ .recv_cq = priv->cq, .srq = priv->cm.srq, .cap.max_send_wr = 1, /* For drain WR */ + .cap.max_recv_wr = ipoib_recvq_size + 1, .cap.max_send_sge = 1, /* FIXME: 0 Seems not to work */ .sq_sig_type = IB_SIGNAL_ALL_WR, .qp_type = IB_QPT_RC, .qp_context = p, }; + if (!priv->cm.srq) { + attr.cap.max_recv_sge = IPOIB_CM_RX_SG; + attr.event_handler = NULL; + } else + attr.event_handler = ipoib_cm_rx_event_handler; return ib_create_qp(priv->pd, &attr); } @@ -289,12 +350,120 @@ static int ipoib_cm_send_rep(struct net_ rep.flow_control = 0; rep.rnr_retry_count = 
req->rnr_retry_count; rep.target_ack_delay = 20; /* FIXME */ - rep.srq = 1; rep.qp_num = qp->qp_num; rep.starting_psn = psn; + rep.srq = !!priv->cm.srq; return ib_send_cm_rep(cm_id, &rep); } +static void init_context_and_add_list(struct ib_cm_id *cm_id, + struct ipoib_cm_rx *p, + struct ipoib_dev_priv *priv) +{ + cm_id->context = p; + p->jiffies = jiffies; + spin_lock_irq(&priv->lock); + if (list_empty(&priv->cm.passive_ids)) + queue_delayed_work(ipoib_workqueue, + &priv->cm.stale_task, IPOIB_CM_RX_DELAY); + list_add(&p->list, &priv->cm.passive_ids); + spin_unlock_irq(&priv->lock); +} + +static int allocate_and_post_rbuf_nosrq(struct ib_cm_id *cm_id, + struct ipoib_cm_rx *p, unsigned psn) +{ + struct net_device *dev = cm_id->context; + struct ipoib_dev_priv *priv = netdev_priv(dev); + int ret; + u32 qp_num, index; + u64 i, recv_mem_used; + + qp_num = p->qp->qp_num; + + /* In the SRQ case there is a common rx buffer called the srq_ring. + * However, for the NOSRQ we create an rx_ring for every + * struct ipoib_cm_rx. 
+ */ + p->rx_ring = kzalloc(ipoib_recvq_size * sizeof *p->rx_ring, GFP_KERNEL); + if (!p->rx_ring) { + printk(KERN_WARNING "Failed to allocate rx_ring for 0x%x\n", + qp_num); + return -ENOMEM; + } + + init_context_and_add_list(cm_id, p, priv); + spin_lock_irq(&priv->lock); + + for (index = 0; index < max_rc_qp; index++) + if (priv->cm.rx_index_table[index] == NULL) + break; + + spin_lock(&nosrq_count.lock); + recv_mem_used = (u64)ipoib_recvq_size * (u64)nosrq_count.current_rc_qp + * CM_PACKET_SIZE; /* packets are 64K */ + spin_unlock(&nosrq_count.lock); + if ((index == max_rc_qp) || + ( recv_mem_used >= max_recv_buf * (1ul << 20))) { + spin_unlock_irq(&priv->lock); + ipoib_warn(priv, "NOSRQ has reached the configurable limit " + "of either %d RC QPs or, max recv buf size of " + "0x%lx MB\n", max_rc_qp, max_recv_buf); + + /* We send a REJ to the remote side indicating that we + * have no more free RC QPs and leave it to the remote side + * to take appropriate action. This should leave the + * current set of QPs unaffected and any subsequent REQs + * will be able to use RC QPs if they are available. 
+ */ + ib_send_cm_rej(cm_id, IB_CM_REJ_NO_QP, NULL, 0, NULL, 0); + ret = -EINVAL; + goto err_send_rej; + } + + priv->cm.rx_index_table[index] = p; + spin_unlock_irq(&priv->lock); + + /* We will subsequently use this stored pointer while freeing + * resources in stale task */ + p->index = index; + + ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn); + if (ret) { + ipoib_warn(priv, "ipoib_cm_modify_rx_qp() failed %d\n", ret); + ipoib_cm_dev_cleanup(dev); + goto err_modify_nosrq; + } + + for (i = 0; i < ipoib_recvq_size; ++i) { + if (!ipoib_cm_alloc_rx_skb(dev, i << 32 | index, + IPOIB_CM_RX_SG - 1, + p->rx_ring[i].mapping)) { + ipoib_warn(priv, "failed to allocate receive " + "buffer %ld\n", i); + ipoib_cm_dev_cleanup(dev); + ret = -ENOMEM; + goto err_alloc_and_post; + } + + if (post_receive_nosrq(dev, i << 32 | index)) { + ipoib_warn(priv, "post_receive_nosrq " + "failed for buf %ld\n", i); + ipoib_cm_dev_cleanup(dev); + ret = -EIO; + goto err_alloc_and_post; + } + } + + return 0; + +err_send_rej: +err_modify_nosrq: +err_alloc_and_post: + kfree(p->rx_ring); + return ret; +} + static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event) { struct net_device *dev = cm_id->context; @@ -305,8 +474,11 @@ static int ipoib_cm_req_handler(struct i ipoib_dbg(priv, "REQ arrived\n"); p = kzalloc(sizeof *p, GFP_KERNEL); - if (!p) + if (!p) { + printk(KERN_WARNING "Failed to allocate RX control block when " + "REQ arrived\n"); return -ENOMEM; + } p->dev = dev; p->id = cm_id; p->qp = ipoib_cm_create_rx_qp(dev, p); @@ -316,9 +488,18 @@ static int ipoib_cm_req_handler(struct i } psn = random32() & 0xffffff; - ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn); - if (ret) - goto err_modify; + if (!priv->cm.srq) { + spin_lock(&nosrq_count.lock); + nosrq_count.current_rc_qp++; + spin_unlock(&nosrq_count.lock); + if (ret = allocate_and_post_rbuf_nosrq(cm_id, p, psn)) + goto err_post_nosrq; + } else { + p->rx_ring = NULL; + ret = ipoib_cm_modify_rx_qp(dev, cm_id, 
p->qp, psn); + if (ret) + goto err_modify; + } ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn); if (ret) { @@ -326,18 +507,18 @@ static int ipoib_cm_req_handler(struct i goto err_rep; } - cm_id->context = p; - p->jiffies = jiffies; - p->state = IPOIB_CM_RX_LIVE; - spin_lock_irq(&priv->lock); - if (list_empty(&priv->cm.passive_ids)) - queue_delayed_work(ipoib_workqueue, - &priv->cm.stale_task, IPOIB_CM_RX_DELAY); - list_add(&p->list, &priv->cm.passive_ids); - spin_unlock_irq(&priv->lock); + if (priv->cm.srq) { + init_context_and_add_list(cm_id, p, priv); + p->state = IPOIB_CM_RX_LIVE; + } return 0; err_rep: +err_post_nosrq: + list_del_init(&p->list); + spin_lock(&nosrq_count.lock); + nosrq_count.current_rc_qp--; + spin_unlock(&nosrq_count.lock); err_modify: ib_destroy_qp(p->qp); err_qp: @@ -401,21 +582,51 @@ static void skb_put_frags(struct sk_buff } } -void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) +static void timer_check_srq(struct ipoib_dev_priv *priv, struct ipoib_cm_rx *p) +{ + unsigned long flags; + + if (p && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) { + spin_lock_irqsave(&priv->lock, flags); + p->jiffies = jiffies; + /* Move this entry to list head, but do + * not re-add it if it has been removed. */ + if (p->state == IPOIB_CM_RX_LIVE) + list_move(&p->list, &priv->cm.passive_ids); + spin_unlock_irqrestore(&priv->lock, flags); + } +} + +static void timer_check_nosrq(struct ipoib_dev_priv *priv, struct ipoib_cm_rx *p) +{ + unsigned long flags; + + if (p && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) { + spin_lock_irqsave(&priv->lock, flags); + p->jiffies = jiffies; + /* Move this entry to list head, but do + * not re-add it if it has been removed. 
*/ + if (!list_empty(&p->list)) + list_move(&p->list, &priv->cm.passive_ids); + spin_unlock_irqrestore(&priv->lock, flags); + } +} + +void handle_rx_wc_srq(struct net_device *dev, struct ib_wc *wc) { struct ipoib_dev_priv *priv = netdev_priv(dev); - unsigned int wr_id = wc->wr_id & ~IPOIB_CM_OP_SRQ; + u64 wr_id = wc->wr_id & ~IPOIB_CM_OP_RECV; struct sk_buff *skb, *newskb; struct ipoib_cm_rx *p; unsigned long flags; u64 mapping[IPOIB_CM_RX_SG]; - int frags; + int frags, ret; ipoib_dbg_data(priv, "cm recv completion: id %d, status: %d\n", wr_id, wc->status); if (unlikely(wr_id >= ipoib_recvq_size)) { - if (wr_id == (IPOIB_CM_RX_DRAIN_WRID & ~IPOIB_CM_OP_SRQ)) { + if (wr_id == (IPOIB_CM_RX_DRAIN_WRID & ~IPOIB_CM_OP_RECV)) { spin_lock_irqsave(&priv->lock, flags); list_splice_init(&priv->cm.rx_drain_list, &priv->cm.rx_reap_list); ipoib_cm_start_rx_drain(priv); @@ -434,20 +645,12 @@ void ipoib_cm_handle_rx_wc(struct net_de "(status=%d, wrid=%d vend_err %x)\n", wc->status, wr_id, wc->vendor_err); ++priv->stats.rx_dropped; - goto repost; + goto repost_srq; } if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) { p = wc->qp->qp_context; - if (p && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) { - spin_lock_irqsave(&priv->lock, flags); - p->jiffies = jiffies; - /* Move this entry to list head, but do not re-add it - * if it has been moved out of list. */ - if (p->state == IPOIB_CM_RX_LIVE) - list_move(&p->list, &priv->cm.passive_ids); - spin_unlock_irqrestore(&priv->lock, flags); - } + timer_check_srq(priv, p); } frags = PAGE_ALIGN(wc->byte_len - min(wc->byte_len, @@ -459,13 +662,113 @@ void ipoib_cm_handle_rx_wc(struct net_de * If we can't allocate a new RX buffer, dump * this packet and reuse the old buffer. 
*/ - ipoib_dbg(priv, "failed to allocate receive buffer %d\n", wr_id); + ipoib_dbg(priv, "failed to allocate receive buffer %ld\n", wr_id); + ++priv->stats.rx_dropped; + goto repost_srq; + } + + ipoib_cm_dma_unmap_rx(priv, frags, + priv->cm.srq_ring[wr_id].mapping); + memcpy(priv->cm.srq_ring[wr_id].mapping, mapping, + (frags + 1) * sizeof *mapping); + ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n", + wc->byte_len, wc->slid); + + skb_put_frags(skb, IPOIB_CM_HEAD_SIZE, wc->byte_len, newskb); + + skb->protocol = ((struct ipoib_header *) skb->data)->proto; + skb_reset_mac_header(skb); + skb_pull(skb, IPOIB_ENCAP_LEN); + + dev->last_rx = jiffies; + ++priv->stats.rx_packets; + priv->stats.rx_bytes += skb->len; + + skb->dev = dev; + /* XXX get correct PACKET_ type here */ + skb->pkt_type = PACKET_HOST; + netif_rx_ni(skb); + +repost_srq: + ret = post_receive_srq(dev, wr_id); + + if (unlikely(ret)) + ipoib_warn(priv, "post_receive_srq failed for buf %ld\n", + wr_id); + +} + +static void handle_rx_wc_nosrq(struct net_device *dev, struct ib_wc *wc) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct sk_buff *skb, *newskb; + u64 mapping[IPOIB_CM_RX_SG], wr_id = wc->wr_id >> 32; + u32 index; + struct ipoib_cm_rx *p, *rx_ptr; + int frags, ret; + + + ipoib_dbg_data(priv, "cm recv completion: id %d, status: %d\n", + wr_id, wc->status); + + if (unlikely(wr_id >= ipoib_recvq_size)) { + ipoib_warn(priv, "cm recv completion event with wrid %d (> %d)\n", + wr_id, ipoib_recvq_size); + return; + } + + index = (wc->wr_id & ~IPOIB_CM_OP_RECV) & NOSRQ_INDEX_MASK ; + + /* This is the only place where rx_ptr could be a NULL - could + * have just received a packet from a connection that has become + * stale and so is going away. We will simply drop the packet and + * let the hardware (it s IB_QPT_RC) handle the dropped packet. + * In the timer_check() function below, p->jiffies is updated and + * hence the connection will not be stale after that. 
+ */ + rx_ptr = priv->cm.rx_index_table[index]; + if (unlikely(!rx_ptr)) { + ipoib_warn(priv, "Received packet from a connection " + "that is going away. Hardware will handle it.\n"); + return; + } + + skb = rx_ptr->rx_ring[wr_id].skb; + + if (unlikely(wc->status != IB_WC_SUCCESS)) { + ipoib_dbg(priv, "cm recv error " + "(status=%d, wrid=%ld vend_err %x)\n", + wc->status, wr_id, wc->vendor_err); + ++priv->stats.rx_dropped; + goto repost_nosrq; + } + + if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) { + /* There are no guarantees that wc->qp is not NULL for HCAs + * that do not support SRQ. */ + p = rx_ptr; + timer_check_nosrq(priv, p); + } + + frags = PAGE_ALIGN(wc->byte_len - min(wc->byte_len, + (unsigned)IPOIB_CM_HEAD_SIZE)) / PAGE_SIZE; + + newskb = ipoib_cm_alloc_rx_skb(dev, wr_id << 32 | index, frags, + mapping); + if (unlikely(!newskb)) { + /* + * If we can't allocate a new RX buffer, dump + * this packet and reuse the old buffer. + */ + ipoib_dbg(priv, "failed to allocate receive buffer %ld\n", wr_id); ++priv->stats.rx_dropped; - goto repost; + goto repost_nosrq; } - ipoib_cm_dma_unmap_rx(priv, frags, priv->cm.srq_ring[wr_id].mapping); - memcpy(priv->cm.srq_ring[wr_id].mapping, mapping, (frags + 1) * sizeof *mapping); + ipoib_cm_dma_unmap_rx(priv, frags, + rx_ptr->rx_ring[wr_id].mapping); + memcpy(rx_ptr->rx_ring[wr_id].mapping, mapping, + (frags + 1) * sizeof *mapping); ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n", wc->byte_len, wc->slid); @@ -485,10 +788,22 @@ void ipoib_cm_handle_rx_wc(struct net_de skb->pkt_type = PACKET_HOST; netif_receive_skb(skb); -repost: - if (unlikely(ipoib_cm_post_receive(dev, wr_id))) - ipoib_warn(priv, "ipoib_cm_post_receive failed " - "for buf %d\n", wr_id); +repost_nosrq: + ret = post_receive_nosrq(dev, wr_id << 32 | index); + + if (unlikely(ret)) + ipoib_warn(priv, "post_receive_nosrq failed for buf %ld\n", + wr_id); +} + +void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) +{ + struct ipoib_dev_priv 
*priv = netdev_priv(dev); + + if (priv->cm.srq) + handle_rx_wc_srq(dev, wc); + else + handle_rx_wc_nosrq(dev, wc); } static inline int post_send(struct ipoib_dev_priv *priv, @@ -680,6 +995,44 @@ err_cm: return ret; } +static void free_resources_nosrq(struct ipoib_dev_priv *priv, struct ipoib_cm_rx *p) +{ + int i; + + for(i = 0; i < ipoib_recvq_size; ++i) + if(p->rx_ring[i].skb) { + ipoib_cm_dma_unmap_rx(priv, + IPOIB_CM_RX_SG - 1, + p->rx_ring[i].mapping); + dev_kfree_skb_any(p->rx_ring[i].skb); + p->rx_ring[i].skb = NULL; + } + kfree(p->rx_ring); +} + +void dev_stop_nosrq(struct ipoib_dev_priv *priv) +{ + struct ipoib_cm_rx *p; + + spin_lock_irq(&priv->lock); + while (!list_empty(&priv->cm.passive_ids)) { + p = list_entry(priv->cm.passive_ids.next, typeof(*p), list); + free_resources_nosrq(priv, p); + list_del_init(&p->list); + spin_unlock_irq(&priv->lock); + ib_destroy_cm_id(p->id); + ib_destroy_qp(p->qp); + spin_lock(&nosrq_count.lock); + nosrq_count.current_rc_qp--; + spin_unlock(&nosrq_count.lock); + kfree(p); + spin_lock_irq(&priv->lock); + } + spin_unlock_irq(&priv->lock); + + cancel_delayed_work(&priv->cm.stale_task); +} + void ipoib_cm_dev_stop(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); @@ -694,6 +1047,11 @@ void ipoib_cm_dev_stop(struct net_device ib_destroy_cm_id(priv->cm.id); priv->cm.id = NULL; + if (!priv->cm.srq) { + dev_stop_nosrq(priv); + return; + } + spin_lock_irq(&priv->lock); while (!list_empty(&priv->cm.passive_ids)) { p = list_entry(priv->cm.passive_ids.next, typeof(*p), list); @@ -739,6 +1097,7 @@ void ipoib_cm_dev_stop(struct net_device kfree(p); } + cancel_delayed_work(&priv->cm.stale_task); } @@ -817,7 +1176,9 @@ static struct ib_qp *ipoib_cm_create_tx_ attr.recv_cq = priv->cq; attr.srq = priv->cm.srq; attr.cap.max_send_wr = ipoib_sendq_size; + attr.cap.max_recv_wr = 1; attr.cap.max_send_sge = 1; + attr.cap.max_recv_sge = 1; attr.sq_sig_type = IB_SIGNAL_ALL_WR; attr.qp_type = IB_QPT_RC; attr.send_cq = cq; @@ 
-857,7 +1218,7 @@ static int ipoib_cm_send_req(struct net_ req.retry_count = 0; /* RFC draft warns against retries */ req.rnr_retry_count = 0; /* RFC draft warns against retries */ req.max_cm_retries = 15; - req.srq = 1; + req.srq = !!priv->cm.srq; return ib_send_cm_req(id, &req); } @@ -1202,6 +1563,11 @@ static void ipoib_cm_rx_reap(struct work list_for_each_entry_safe(p, n, &list, list) { ib_destroy_cm_id(p->id); ib_destroy_qp(p->qp); + if (!priv->cm.srq) { + spin_lock(&nosrq_count.lock); + nosrq_count.current_rc_qp--; + spin_unlock(&nosrq_count.lock); + } kfree(p); } } @@ -1220,12 +1586,19 @@ static void ipoib_cm_stale_task(struct w p = list_entry(priv->cm.passive_ids.prev, typeof(*p), list); if (time_before_eq(jiffies, p->jiffies + IPOIB_CM_RX_TIMEOUT)) break; - list_move(&p->list, &priv->cm.rx_error_list); - p->state = IPOIB_CM_RX_ERROR; - spin_unlock_irq(&priv->lock); - ret = ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE); - if (ret) - ipoib_warn(priv, "unable to move qp to error state: %d\n", ret); + if (!priv->cm.srq) { + free_resources_nosrq(priv, p); + list_del_init(&p->list); + priv->cm.rx_index_table[p->index] = NULL; + spin_unlock_irq(&priv->lock); + } else { + list_move(&p->list, &priv->cm.rx_error_list); + p->state = IPOIB_CM_RX_ERROR; + spin_unlock_irq(&priv->lock); + ret = ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE); + if (ret) + ipoib_warn(priv, "unable to move qp to error state: %d\n", ret); + } spin_lock_irq(&priv->lock); } @@ -1279,16 +1652,40 @@ int ipoib_cm_add_mode_attr(struct net_de return device_create_file(&dev->dev, &dev_attr_mode); } +static int create_srq(struct net_device *dev, struct ipoib_dev_priv *priv) +{ + struct ib_srq_init_attr srq_init_attr; + int ret; + + srq_init_attr.attr.max_wr = ipoib_recvq_size; + srq_init_attr.attr.max_sge = IPOIB_CM_RX_SG; + + priv->cm.srq = ib_create_srq(priv->pd, &srq_init_attr); + if (IS_ERR(priv->cm.srq)) { + ret = PTR_ERR(priv->cm.srq); + priv->cm.srq = NULL; + return ret; + } + + 
priv->cm.srq_ring = kzalloc(ipoib_recvq_size * + sizeof *priv->cm.srq_ring, + GFP_KERNEL); + if (!priv->cm.srq_ring) { + printk(KERN_WARNING "%s: failed to allocate CM ring " + "(%d entries)\n", + priv->ca->name, ipoib_recvq_size); + ipoib_cm_dev_cleanup(dev); + return -ENOMEM; + } + + return 0; +} + int ipoib_cm_dev_init(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); - struct ib_srq_init_attr srq_init_attr = { - .attr = { - .max_wr = ipoib_recvq_size, - .max_sge = IPOIB_CM_RX_SG - } - }; int ret, i; + struct ib_device_attr attr; INIT_LIST_HEAD(&priv->cm.passive_ids); INIT_LIST_HEAD(&priv->cm.reap_list); @@ -1305,20 +1702,33 @@ int ipoib_cm_dev_init(struct net_device skb_queue_head_init(&priv->cm.skb_queue); - priv->cm.srq = ib_create_srq(priv->pd, &srq_init_attr); - if (IS_ERR(priv->cm.srq)) { - ret = PTR_ERR(priv->cm.srq); - priv->cm.srq = NULL; + if (ret = ib_query_device(priv->ca, &attr)) return ret; - } - priv->cm.srq_ring = kzalloc(ipoib_recvq_size * sizeof *priv->cm.srq_ring, - GFP_KERNEL); - if (!priv->cm.srq_ring) { - printk(KERN_WARNING "%s: failed to allocate CM ring (%d entries)\n", - priv->ca->name, ipoib_recvq_size); - ipoib_cm_dev_cleanup(dev); - return -ENOMEM; + if (attr.max_srq) { + /* This device supports SRQ */ + if (ret = create_srq(dev, priv)) + return ret; + priv->cm.rx_index_table = NULL; + } else { + priv->cm.srq = NULL; + priv->cm.srq_ring = NULL; + + /* Every new REQ that arrives creates a struct ipoib_cm_rx. + * These structures form a link list starting with the + * passive_ids. 
For quick and easy access we maintain a table + * of pointers to struct ipoib_cm_rx called the rx_index_table + */ + priv->cm.rx_index_table = kzalloc(NOSRQ_INDEX_TABLE_SIZE * + sizeof *priv->cm.rx_index_table, + GFP_KERNEL); + if (!priv->cm.rx_index_table) { + printk(KERN_WARNING "Failed to allocate NOSRQ_INDEX_TABLE\n"); + return -ENOMEM; + } + + spin_lock_init(&nosrq_count.lock); + nosrq_count.current_rc_qp = 0; } for (i = 0; i < IPOIB_CM_RX_SG; ++i) @@ -1331,17 +1741,23 @@ int ipoib_cm_dev_init(struct net_device priv->cm.rx_wr.sg_list = priv->cm.rx_sge; priv->cm.rx_wr.num_sge = IPOIB_CM_RX_SG; - for (i = 0; i < ipoib_recvq_size; ++i) { - if (!ipoib_cm_alloc_rx_skb(dev, i, IPOIB_CM_RX_SG - 1, + /* One can post receive buffers even before the RX QP is created + * only in the SRQ case. Therefore for NOSRQ we skip the rest of init + * and do that in ipoib_cm_req_handler() */ + + if (priv->cm.srq) { + for (i = 0; i < ipoib_recvq_size; ++i) { + if (!ipoib_cm_alloc_rx_skb(dev, i, IPOIB_CM_RX_SG - 1, priv->cm.srq_ring[i].mapping)) { - ipoib_warn(priv, "failed to allocate receive buffer %d\n", i); - ipoib_cm_dev_cleanup(dev); - return -ENOMEM; - } - if (ipoib_cm_post_receive(dev, i)) { - ipoib_warn(priv, "ipoib_ib_post_receive failed for buf %d\n", i); - ipoib_cm_dev_cleanup(dev); - return -EIO; + ipoib_warn(priv, "failed to allocate receive buffer %d\n", i); + ipoib_cm_dev_cleanup(dev); + return -ENOMEM; + } + if (post_receive_srq(dev, i)) { + ipoib_warn(priv, "post_receive_srq failed for buf %d\n", i); + ipoib_cm_dev_cleanup(dev); + return -EIO; + } } } --- a/linux-2.6.22-rc4/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-05-30 14:56:25.000000000 -0400 +++ b/linux-2.6.22-rc4/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-06-11 16:36:59.000000000 -0400 @@ -299,7 +299,7 @@ int ipoib_poll(struct net_device *dev, i for (i = 0; i < n; ++i) { struct ib_wc *wc = priv->ibwc + i; - if (wc->wr_id & IPOIB_CM_OP_SRQ) { + if (wc->wr_id & IPOIB_CM_OP_RECV) { ++done; --max; 
ipoib_cm_handle_rx_wc(dev, wc); @@ -557,7 +557,7 @@ void ipoib_drain_cq(struct net_device *d do { n = ib_poll_cq(priv->cq, IPOIB_NUM_WC, priv->ibwc); for (i = 0; i < n; ++i) { - if (priv->ibwc[i].wr_id & IPOIB_CM_OP_SRQ) + if (priv->ibwc[i].wr_id & IPOIB_CM_OP_RECV) ipoib_cm_handle_rx_wc(dev, priv->ibwc + i); else if (priv->ibwc[i].wr_id & IPOIB_OP_RECV) ipoib_ib_handle_rx_wc(dev, priv->ibwc + i); --- a/linux-2.6.22-rc4/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2007-05-30 14:56:25.000000000 -0400 +++ b/linux-2.6.22-rc4/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2007-06-11 16:36:59.000000000 -0400 @@ -175,6 +175,15 @@ int ipoib_transport_dev_init(struct net_ if (!ret) size += ipoib_recvq_size + 1 /* 1 extra for rx_drain_qp */; + /* We increase the size of the CQ in the NOSRQ case to prevent CQ + * overflow. Every new REQ creates a new RX QP and each QP has an + * RX ring associated with it. Therefore we could have + * NOSRQ_INDEX_TABLE_SIZE*ipoib_recvq_size + ipoib_sendq_size CQEs + * in a CQ. + */ + if(!priv->cm.srq) + size += (NOSRQ_INDEX_TABLE_SIZE -1)* ipoib_recvq_size; + priv->cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev, size, 0); if (IS_ERR(priv->cq)) { printk(KERN_WARNING "%s: failed to create CQ\n", ca->name); From pradeeps at linux.vnet.ibm.com Tue Jun 12 11:10:59 2007 From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana) Date: Tue, 12 Jun 2007 11:10:59 -0700 Subject: [ofa-general] IPOIB CM (NOSRQ) extension [PATCH V2] patch Message-ID: <466EE1B3.5040806@linux.vnet.ibm.com> This patch handles the corner case of running out of RC QPs. In that case it switches to UD mode. This patch can be used both by NOSRQ and SRQ code. Changes from V1: 1. The switch to datagram mode conditionally happens only when there are no resources (QPs) available on the passive side.
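To make the V2 fallback concrete, here is a minimal userspace model in C of the two pieces of logic the patch changes: the IB_CM_REJ_RECEIVED case in ipoib_cm_tx_handler() and the transmit test in ipoib_start_xmit(). Everything in it (struct neigh_model, on_rej_received(), choose_tx_path()) is hypothetical and only mirrors the conditions visible in the diff; locking, the workqueue and the real IB_CM_REJ_NO_QP constant are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's neighbour state. */
struct neigh_model {
	bool cm_tx;   /* ipoib_cm_get(neigh) != NULL */
	bool cm_up;   /* ipoib_cm_up(neigh) */
	bool oper_up; /* IPOIB_FLAG_OPER_UP set in neigh->cm->flags */
	bool has_ah;  /* neigh->ah != NULL: a datagram (UD) path exists */
};

enum rej_model { REJ_MODEL_NO_QP, REJ_MODEL_OTHER };
enum tx_path { TX_PATH_CM, TX_PATH_UD, TX_PATH_OTHER };

/* IB_CM_REJ_RECEIVED handling after the V2 change: a "no QP" reject only
 * clears OPER_UP and is reported as handled (true); any other reason
 * returns false, i.e. falls through to the usual CM error teardown. */
static bool on_rej_received(struct neigh_model *n, enum rej_model reason)
{
	if (n && reason == REJ_MODEL_NO_QP) {
		n->oper_up = false;
		return true;
	}
	return false;
}

/* Transmit-path choice mirroring the patched ipoib_start_xmit() test. */
static enum tx_path choose_tx_path(const struct neigh_model *n)
{
	assert(n != NULL);
	if (n->cm_tx && n->cm_up && n->oper_up)
		return TX_PATH_CM;
	if (n->has_ah)
		return TX_PATH_UD;
	return TX_PATH_OTHER; /* path lookup / queueing: not modelled */
}
```

One consequence of folding the three checks into a single condition, visible in both the diff and the sketch: the datagram branch is now also taken when a CM connection exists but is not yet up, provided neigh->ah is set.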
This patch has been tested with linux-2.6.22-rc4 derived from Roland's for-2.6.23 git tree on 06/11 on ppc64 machines Signed-off-by: Pradeep Satyanarayana --- --- c/linux-2.6.22-rc4/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-06-12 12:35:07.000000000 -0400 +++ b/linux-2.6.22-rc4/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-06-12 12:39:47.000000000 -0400 @@ -1378,8 +1378,18 @@ static int ipoib_cm_tx_handler(struct ib ib_send_cm_rej(cm_id, IB_CM_REJ_CONSUMER_DEFINED, NULL, 0, NULL, 0); break; - case IB_CM_REQ_ERROR: case IB_CM_REJ_RECEIVED: + ipoib_warn(priv, "REJ received\n"); + spin_lock(&priv->lock); + neigh = tx->neigh; + spin_unlock(&priv->lock); + + if ((neigh) && (event->param.rej_rcvd.reason == + IB_CM_REJ_NO_QP)) { + clear_bit(IPOIB_FLAG_OPER_UP, &neigh->cm->flags); + break; + } + case IB_CM_REQ_ERROR: case IB_CM_TIMEWAIT_EXIT: ipoib_dbg(priv, "CM error %d.\n", event->event); spin_lock_irq(&priv->tx_lock); --- c/linux-2.6.22-rc4/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-05-30 14:56:25.000000000 -0400 +++ b/linux-2.6.22-rc4/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-06-11 21:08:07.000000000 -0400 @@ -679,11 +679,10 @@ static int ipoib_start_xmit(struct sk_bu neigh = *to_ipoib_neigh(skb->dst->neighbour); - if (ipoib_cm_get(neigh)) { - if (ipoib_cm_up(neigh)) { + if (ipoib_cm_get(neigh) && ipoib_cm_up(neigh) && + test_bit(IPOIB_FLAG_OPER_UP, &neigh->cm->flags)) { ipoib_cm_send(dev, skb, ipoib_cm_get(neigh)); goto out; - } } else if (neigh->ah) { if (unlikely(memcmp(&neigh->dgid.raw, skb->dst->neighbour->ha + 4, From mst at dev.mellanox.co.il Tue Jun 12 11:35:21 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Tue, 12 Jun 2007 21:35:21 +0300 Subject: [ofa-general] Re: crash in ipoib In-Reply-To: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com> References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com> Message-ID: <20070612183521.GC10688@mellanox.co.il> > Quoting Sean Hefty : > Subject: crash in ipoib > > Copying ofa general list. 
> > We've seen a crash similar to this now a total of 4 times. > > These are x64, 2.6.9-42.EL. The crashes only seem to occur on a specific set of > systems in our cluster. > > The latest crash has a similar stack trace as the one listed below. > > badness in i8042_panic_blink drivers/input/serio/i8042.c : 992 > i8042_panic_blink + 485 > panic + 445 > apic_timer_interrupt + 133 > oops_end + 38 > oops_end + 65 > do_page_fault + 1204 > ipoib_cm_send + 433 > error_exit > ipoib_ib_completion + 0 > ipoib_cm_handle_rx_wc + 239 > > (the trace goes on and on) where in source are ipoib_cm_send + 433 and ipoib_cm_handle_rx_wc + 239 on your systems? -- MST From friedman at ucla.edu Tue Jun 12 12:10:48 2007 From: friedman at ucla.edu (Scott A. Friedman) Date: Tue, 12 Jun 2007 12:10:48 -0700 Subject: [ofa-general] Re: IB and iWarp HCA in same node In-Reply-To: <20070612171709.82A46E60849@openfabrics.org> References: <20070612171709.82A46E60849@openfabrics.org> Message-ID: <466EEFB8.8010208@ucla.edu> > Scott A. Friedman wrote: >> > I have a working IB cluster where I have added a Chelsio iWarp card to >> > one node. Another node is connected to that with only an identical iWarp >> > card. I cannot seem to get the iWarp cards to come up. They work through >> > regular ethernet just fine and the IB stuff still works as well. But, >> > when I modprobe iw_cxgb3 and iw_cm utilities like ibstat show the >> > following. Which explains why nothing is working. >> > >> > Question is, why? Am I missing or forgetting something? I just want to >> > test the two iWarp cards back to back. Not trying to get some kind of >> > auto bridging or routing working.
>> > >> > # ibstat >> > iWARP RNIC 'cxgb3_0' >> > iWARP RNIC type: cxgb3 >> > Number of ports: 1 >> > Firmware version: T 4.0.0 >> > Hardware version: 1 >> > Node GUID: 0x0007430506ea0000 >> > System image GUID: 0x0007430506ea0000 >> > Port 1: >> > State: Active >> > Physical state: No state change >> > Rate: 20 >> > Base lid: 0 >> > LMC: 0 >> > SM lid: 0 >> > Capability mask: 0x009f0000 >> > Port GUID: 0x0000000000000000 > > This all looks normal. What application are you trying to run over rdma > on the chelsio interface? rping? > Yes, rping, anything. It turns out that since I posted this the Chelsio people explained ibstat's funny output and suggested using their latest release of the cxgb3 driver - and that works (without TOE for now, separate issue). The main problem was that the driver that ships with OFED would give me 'connection rejected' errors when trying to do anything (rdma_cm based), my code, sample code, utilities. Replacing the driver made the problem go away. Currently, I am using their 1.0.094 driver w/o TOE and the OFED-1.2-rc3 iWarp stuff (their suggestion) and it appears to work fine so far. Going to just wait for rc5 or final to test that with their driver as well as that is what we will want to use for the rest of our test cluster using IB. 
From sean.hefty at intel.com Tue Jun 12 12:13:37 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 12 Jun 2007 12:13:37 -0700 Subject: [ofa-general] IPOB CM (NOSRQ) [PATCH V6] patch In-Reply-To: <466EE021.30302@linux.vnet.ibm.com> Message-ID: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com> >+module_param_named(max_recieve_buffer, max_recv_buf, int, 0644); >+MODULE_PARM_DESC(max_recieve_buffer, "Max Recieve Buffer Size in MB"); nit: receive misspelled >+static int allocate_and_post_rbuf_nosrq(struct ib_cm_id *cm_id, >+ struct ipoib_cm_rx *p, unsigned psn) >+{ >+ struct net_device *dev = cm_id->context; >+ struct ipoib_dev_priv *priv = netdev_priv(dev); >+ int ret; >+ u32 qp_num, index; >+ u64 i, recv_mem_used; >+ >+ qp_num = p->qp->qp_num; >+ >+ /* In the SRQ case there is a common rx buffer called the srq_ring. >+ * However, for the NOSRQ we create an rx_ring for every >+ * struct ipoib_cm_rx. >+ */ >+ p->rx_ring = kzalloc(ipoib_recvq_size * sizeof *p->rx_ring, GFP_KERNEL); >+ if (!p->rx_ring) { >+ printk(KERN_WARNING "Failed to allocate rx_ring for 0x%x\n", >+ qp_num); >+ return -ENOMEM; >+ } >+ >+ init_context_and_add_list(cm_id, p, priv); >+ spin_lock_irq(&priv->lock); >+ >+ for (index = 0; index < max_rc_qp; index++) >+ if (priv->cm.rx_index_table[index] == NULL) >+ break; >+ >+ spin_lock(&nosrq_count.lock); >+ recv_mem_used = (u64)ipoib_recvq_size * (u64)nosrq_count.current_rc_qp >+ * CM_PACKET_SIZE; /* packets are 64K */ >+ spin_unlock(&nosrq_count.lock); Is a spin lock needed here? Could you make current_rc_qp an atomic? >+err_send_rej: >+err_modify_nosrq: >+err_alloc_and_post: Maybe just use a single label? 
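The atomic alternative suggested above could look roughly like the following sketch. It uses C11 <stdatomic.h> as a userspace stand-in for the kernel's atomic_t (atomic_inc(), atomic_dec(), atomic_read()); the model_* names are hypothetical, and the 64 KiB buffer size is taken from the "packets are 64K" comment rather than from the real CM_PACKET_SIZE definition.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed 64 KiB per receive buffer, from the "packets are 64K" comment;
 * the real CM_PACKET_SIZE definition is not shown in the thread. */
#define MODEL_CM_PACKET_SIZE (64u * 1024u)

/* Userspace stand-in for a kernel atomic_t current_rc_qp. */
static atomic_int model_rc_qp = 0;

static void model_rc_qp_inc(void)
{
	atomic_fetch_add(&model_rc_qp, 1);            /* kernel: atomic_inc() */
}

static void model_rc_qp_dec(void)
{
	int prev = atomic_fetch_sub(&model_rc_qp, 1); /* kernel: atomic_dec() */
	assert(prev > 0);                             /* never go negative */
	(void)prev;
}

/* One atomic load replaces the spin_lock/read/spin_unlock sequence. */
static uint64_t model_recv_mem_used(unsigned int recvq_size)
{
	uint64_t qps = (uint64_t)atomic_load(&model_rc_qp); /* kernel: atomic_read() */
	return (uint64_t)recvq_size * qps * MODEL_CM_PACKET_SIZE;
}
```

With a 128-entry receive ring (a value chosen only for illustration), each passive RC connection accounts for 128 * 64 KiB = 8 MiB of receive buffers, which is presumably why the patch caps the total with its max_recieve_buffer module parameter.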
>@@ -316,9 +488,18 @@ static int ipoib_cm_req_handler(struct i > } > > psn = random32() & 0xffffff; >- ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn); >- if (ret) >- goto err_modify; >+ if (!priv->cm.srq) { >+ spin_lock(&nosrq_count.lock); >+ nosrq_count.current_rc_qp++; >+ spin_unlock(&nosrq_count.lock); >+ if (ret = allocate_and_post_rbuf_nosrq(cm_id, p, psn)) Use double parens around assignment: if ((ret = ..)) >+ goto err_post_nosrq; >+ } else { >+ p->rx_ring = NULL; >+ ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn); >+ if (ret) >+ goto err_modify; >+ } > > ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn); > if (ret) { >@@ -326,18 +507,18 @@ static int ipoib_cm_req_handler(struct i > goto err_rep; > } > >- cm_id->context = p; >- p->jiffies = jiffies; >- p->state = IPOIB_CM_RX_LIVE; >- spin_lock_irq(&priv->lock); >- if (list_empty(&priv->cm.passive_ids)) >- queue_delayed_work(ipoib_workqueue, >- &priv->cm.stale_task, IPOIB_CM_RX_DELAY); >- list_add(&p->list, &priv->cm.passive_ids); >- spin_unlock_irq(&priv->lock); >+ if (priv->cm.srq) { >+ init_context_and_add_list(cm_id, p, priv); >+ p->state = IPOIB_CM_RX_LIVE; The order between setting p->state and adding the item to the list changes here. I don't know if this matters, but it's now possible for the work queue to execute before p->state is set. >+ } > return 0; > > err_rep: >+err_post_nosrq: >+ list_del_init(&p->list); Is this correct? Is p->list on any list at this point? 
>+ spin_lock(&nosrq_count.lock); >+ nosrq_count.current_rc_qp--; >+ spin_unlock(&nosrq_count.lock); > err_modify: > ib_destroy_qp(p->qp); > err_qp: >@@ -401,21 +582,51 @@ static void skb_put_frags(struct sk_buff > } > } > >-void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) >+static void timer_check_srq(struct ipoib_dev_priv *priv, struct >ipoib_cm_rx *p) >+{ >+ unsigned long flags; >+ >+ if (p && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) { >+ spin_lock_irqsave(&priv->lock, flags); >+ p->jiffies = jiffies; >+ /* Move this entry to list head, but do >+ * not re-add it if it has been removed. */ nit: There are several places in the patch where the commenting style needs updating. >+ if (p->state == IPOIB_CM_RX_LIVE) >+ list_move(&p->list, &priv->cm.passive_ids); >+ spin_unlock_irqrestore(&priv->lock, flags); >+ } >+} >+ >+static void timer_check_nosrq(struct ipoib_dev_priv *priv, struct >ipoib_cm_rx *p) >+{ >+ unsigned long flags; >+ >+ if (p && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) { >+ spin_lock_irqsave(&priv->lock, flags); >+ p->jiffies = jiffies; >+ /* Move this entry to list head, but do >+ * not re-add it if it has been removed. */ >+ if (!list_empty(&p->list)) This line is the only difference between this function and the previous one. Is it possible to always use the state check? 
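The difference between the two checks comes down to how a node leaves the list. The following userspace model of the kernel's circular list_head (a simplification of <linux/list.h>, with NULL standing in for the kernel's poison values; all names are hypothetical) shows that list_del_init() leaves a detached node pointing at itself, so a later list_empty(&p->list) on that node is safe and returns true, whereas after list_del() the node must not be inspected at all. It also shows that list_del_init() on a node that was initialized but never linked is harmless; whether p->list is in that state on the err_post_nosrq path depends on code not quoted here. Locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace model of the kernel's circular struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void init_list(struct list_node *h) { h->next = h; h->prev = h; }

/* Like list_empty(): true if the head (or a detached node) points at itself. */
static bool list_is_empty(const struct list_node *h) { return h->next == h; }

static void link_between(struct list_node *n, struct list_node *prev,
			 struct list_node *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Like list_add(): insert right after the head. */
static void add_head(struct list_node *n, struct list_node *h)
{
	assert(list_is_empty(n)); /* model-only check: node is not linked yet */
	link_between(n, h, h->next);
}

static void unlink_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Like list_del_init(): afterwards list_is_empty(n) is true, so the node
 * can still be tested safely (what timer_check_nosrq() relies on). */
static void del_init(struct list_node *n)
{
	unlink_node(n);
	init_list(n);
}

/* Like list_del(): the kernel poisons the pointers (NULL here), so the
 * node must not be inspected again; fine if it is freed right away. */
static void del_poison(struct list_node *n)
{
	unlink_node(n);
	n->next = NULL;
	n->prev = NULL;
}
```

This is also why "just list_del should work here" holds for dev_stop_nosrq(): p is freed shortly after, so nothing tests its list pointers again.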
>+ list_move(&p->list, &priv->cm.passive_ids); >+ spin_unlock_irqrestore(&priv->lock, flags); >+ } >+} >+static void handle_rx_wc_nosrq(struct net_device *dev, struct ib_wc *wc) >+{ >+ struct ipoib_dev_priv *priv = netdev_priv(dev); >+ struct sk_buff *skb, *newskb; >+ u64 mapping[IPOIB_CM_RX_SG], wr_id = wc->wr_id >> 32; >+ u32 index; >+ struct ipoib_cm_rx *p, *rx_ptr; >+ int frags, ret; >+ >+ >+ ipoib_dbg_data(priv, "cm recv completion: id %d, status: %d\n", >+ wr_id, wc->status); >+ >+ if (unlikely(wr_id >= ipoib_recvq_size)) { >+ ipoib_warn(priv, "cm recv completion event with wrid %d (> %d)\n", >+ wr_id, ipoib_recvq_size); >+ return; >+ } >+ >+ index = (wc->wr_id & ~IPOIB_CM_OP_RECV) & NOSRQ_INDEX_MASK ; >+ >+ /* This is the only place where rx_ptr could be a NULL - could >+ * have just received a packet from a connection that has become >+ * stale and so is going away. We will simply drop the packet and >+ * let the hardware (it s IB_QPT_RC) handle the dropped packet. >+ * In the timer_check() function below, p->jiffies is updated and >+ * hence the connection will not be stale after that. >+ */ >+ rx_ptr = priv->cm.rx_index_table[index]; Is synchronization needed here? >+ if (unlikely(!rx_ptr)) { >+ ipoib_warn(priv, "Received packet from a connection " >+ "that is going away. Hardware will handle it.\n"); >+ return; >+ } >+ >+ skb = rx_ptr->rx_ring[wr_id].skb; >+ >+ if (unlikely(wc->status != IB_WC_SUCCESS)) { >+ ipoib_dbg(priv, "cm recv error " >+ "(status=%d, wrid=%ld vend_err %x)\n", >+ wc->status, wr_id, wc->vendor_err); >+ ++priv->stats.rx_dropped; >+ goto repost_nosrq; >+ } >+ >+ if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) { >+ /* There are no guarantees that wc->qp is not NULL for HCAs >+ * that do not support SRQ. */ >+ p = rx_ptr; >+ timer_check_nosrq(priv, p); This appears to be the only place 'p' is used in this call. I think we can just remove it. 
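The NOSRQ receive path packs two numbers into each 64-bit work request ID: the receive-ring slot (wr_id << 32) and the connection's index in rx_index_table (the low bits, extracted with NOSRQ_INDEX_MASK), alongside the IPOIB_CM_OP_RECV flag that ipoib_poll() and ipoib_drain_cq() now test. A sketch of that encoding follows. The flag and mask values are hypothetical placeholders (the real definitions live in ipoib.h, which is not part of this thread), and ORing the flag in at post time is an assumption, since post_receive_nosrq() is not quoted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values for illustration only; the real IPOIB_CM_OP_RECV
 * and NOSRQ_INDEX_MASK are defined in ipoib.h, not shown here. */
#define MODEL_OP_RECV    (1ULL << 31)
#define MODEL_INDEX_MASK 0xffffULL

/* Pack the ring slot (upper 32 bits), the connection index (low bits)
 * and the "CM receive" flag into one work request ID. */
static uint64_t encode_wr_id(uint32_t ring_slot, uint32_t conn_index)
{
	assert((conn_index & ~MODEL_INDEX_MASK) == 0); /* index must fit */
	return ((uint64_t)ring_slot << 32) | conn_index | MODEL_OP_RECV;
}

/* Same as "wr_id = wc->wr_id >> 32" in handle_rx_wc_nosrq(). */
static uint32_t wr_id_ring_slot(uint64_t wr_id)
{
	return (uint32_t)(wr_id >> 32);
}

/* Same as "(wc->wr_id & ~IPOIB_CM_OP_RECV) & NOSRQ_INDEX_MASK". */
static uint32_t wr_id_conn_index(uint64_t wr_id)
{
	return (uint32_t)((wr_id & ~MODEL_OP_RECV) & MODEL_INDEX_MASK);
}
```

Whatever the real values are, the layout only works if the flag bit overlaps neither the index field nor the ring-slot field; in the sketch, bit 31 sits between them.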
>+ } >+ >+ frags = PAGE_ALIGN(wc->byte_len - min(wc->byte_len, >+ (unsigned)IPOIB_CM_HEAD_SIZE)) / PAGE_SIZE; >+ >+ newskb = ipoib_cm_alloc_rx_skb(dev, wr_id << 32 | index, frags, >+ mapping); >+ if (unlikely(!newskb)) { >+ /* >+ * If we can't allocate a new RX buffer, dump >+ * this packet and reuse the old buffer. >+ */ >+ ipoib_dbg(priv, "failed to allocate receive buffer %ld\n", wr_id); > ++priv->stats.rx_dropped; >- goto repost; >+ goto repost_nosrq; > } > >- ipoib_cm_dma_unmap_rx(priv, frags, priv->cm.srq_ring[wr_id].mapping); >- memcpy(priv->cm.srq_ring[wr_id].mapping, mapping, (frags + 1) * sizeof >*mapping); >+ ipoib_cm_dma_unmap_rx(priv, frags, >+ rx_ptr->rx_ring[wr_id].mapping); >+ memcpy(rx_ptr->rx_ring[wr_id].mapping, mapping, >+ (frags + 1) * sizeof *mapping); > > ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n", > wc->byte_len, wc->slid); >@@ -485,10 +788,22 @@ void ipoib_cm_handle_rx_wc(struct net_de > skb->pkt_type = PACKET_HOST; > netif_receive_skb(skb); > >-repost: >- if (unlikely(ipoib_cm_post_receive(dev, wr_id))) >- ipoib_warn(priv, "ipoib_cm_post_receive failed " >- "for buf %d\n", wr_id); >+repost_nosrq: >+ ret = post_receive_nosrq(dev, wr_id << 32 | index); >+ >+ if (unlikely(ret)) >+ ipoib_warn(priv, "post_receive_nosrq failed for buf %ld\n", >+ wr_id); >+} >+ >+void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) >+{ >+ struct ipoib_dev_priv *priv = netdev_priv(dev); >+ >+ if (priv->cm.srq) >+ handle_rx_wc_srq(dev, wc); >+ else >+ handle_rx_wc_nosrq(dev, wc); > } > > static inline int post_send(struct ipoib_dev_priv *priv, >@@ -680,6 +995,44 @@ err_cm: > return ret; > } > >+static void free_resources_nosrq(struct ipoib_dev_priv *priv, struct >ipoib_cm_rx *p) >+{ >+ int i; >+ >+ for(i = 0; i < ipoib_recvq_size; ++i) >+ if(p->rx_ring[i].skb) { >+ ipoib_cm_dma_unmap_rx(priv, >+ IPOIB_CM_RX_SG - 1, >+ p->rx_ring[i].mapping); >+ dev_kfree_skb_any(p->rx_ring[i].skb); >+ p->rx_ring[i].skb = NULL; >+ } >+ kfree(p->rx_ring); 
>+} >+ >+void dev_stop_nosrq(struct ipoib_dev_priv *priv) >+{ >+ struct ipoib_cm_rx *p; >+ >+ spin_lock_irq(&priv->lock); >+ while (!list_empty(&priv->cm.passive_ids)) { >+ p = list_entry(priv->cm.passive_ids.next, typeof(*p), list); >+ free_resources_nosrq(priv, p); >+ list_del_init(&p->list); just list_del should work here >+ spin_unlock_irq(&priv->lock); >+ ib_destroy_cm_id(p->id); >+ ib_destroy_qp(p->qp); >+ spin_lock(&nosrq_count.lock); >+ nosrq_count.current_rc_qp--; >+ spin_unlock(&nosrq_count.lock); >+ kfree(p); >+ spin_lock_irq(&priv->lock); >+ } >+ spin_unlock_irq(&priv->lock); >+ >+ cancel_delayed_work(&priv->cm.stale_task); >+} >+ > void ipoib_cm_dev_stop(struct net_device *dev) > { > struct ipoib_dev_priv *priv = netdev_priv(dev); >@@ -694,6 +1047,11 @@ void ipoib_cm_dev_stop(struct net_device > ib_destroy_cm_id(priv->cm.id); > priv->cm.id = NULL; > >+ if (!priv->cm.srq) { >+ dev_stop_nosrq(priv); >+ return; >+ } >+ > spin_lock_irq(&priv->lock); > while (!list_empty(&priv->cm.passive_ids)) { > p = list_entry(priv->cm.passive_ids.next, typeof(*p), list); >@@ -739,6 +1097,7 @@ void ipoib_cm_dev_stop(struct net_device > kfree(p); > } > >+ > cancel_delayed_work(&priv->cm.stale_task); > } > >@@ -817,7 +1176,9 @@ static struct ib_qp *ipoib_cm_create_tx_ > attr.recv_cq = priv->cq; > attr.srq = priv->cm.srq; > attr.cap.max_send_wr = ipoib_sendq_size; >+ attr.cap.max_recv_wr = 1; > attr.cap.max_send_sge = 1; >+ attr.cap.max_recv_sge = 1; > attr.sq_sig_type = IB_SIGNAL_ALL_WR; > attr.qp_type = IB_QPT_RC; > attr.send_cq = cq; >@@ -857,7 +1218,7 @@ static int ipoib_cm_send_req(struct net_ > req.retry_count = 0; /* RFC draft warns against retries */ > req.rnr_retry_count = 0; /* RFC draft warns against retries */ > req.max_cm_retries = 15; >- req.srq = 1; >+ req.srq = !!priv->cm.srq; > return ib_send_cm_req(id, &req); > } > >@@ -1202,6 +1563,11 @@ static void ipoib_cm_rx_reap(struct work > list_for_each_entry_safe(p, n, &list, list) { > ib_destroy_cm_id(p->id); > 
ib_destroy_qp(p->qp); >+ if (!priv->cm.srq) { >+ spin_lock(&nosrq_count.lock); >+ nosrq_count.current_rc_qp--; >+ spin_unlock(&nosrq_count.lock); >+ } > kfree(p); > } > } >@@ -1220,12 +1586,19 @@ static void ipoib_cm_stale_task(struct w > p = list_entry(priv->cm.passive_ids.prev, typeof(*p), list); > if (time_before_eq(jiffies, p->jiffies + IPOIB_CM_RX_TIMEOUT)) > break; >- list_move(&p->list, &priv->cm.rx_error_list); >- p->state = IPOIB_CM_RX_ERROR; >- spin_unlock_irq(&priv->lock); >- ret = ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE); >- if (ret) >- ipoib_warn(priv, "unable to move qp to error state: %d\n", >ret); >+ if (!priv->cm.srq) { >+ free_resources_nosrq(priv, p); >+ list_del_init(&p->list); >+ priv->cm.rx_index_table[p->index] = NULL; >+ spin_unlock_irq(&priv->lock); >+ } else { >+ list_move(&p->list, &priv->cm.rx_error_list); >+ p->state = IPOIB_CM_RX_ERROR; >+ spin_unlock_irq(&priv->lock); >+ ret = ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE); >+ if (ret) >+ ipoib_warn(priv, "unable to move qp to error state: >%d\n", ret); >+ } > spin_lock_irq(&priv->lock); > } > >@@ -1279,16 +1652,40 @@ int ipoib_cm_add_mode_attr(struct net_de > return device_create_file(&dev->dev, &dev_attr_mode); > } > >+static int create_srq(struct net_device *dev, struct ipoib_dev_priv *priv) >+{ >+ struct ib_srq_init_attr srq_init_attr; >+ int ret; >+ >+ srq_init_attr.attr.max_wr = ipoib_recvq_size; >+ srq_init_attr.attr.max_sge = IPOIB_CM_RX_SG; >+ >+ priv->cm.srq = ib_create_srq(priv->pd, &srq_init_attr); >+ if (IS_ERR(priv->cm.srq)) { >+ ret = PTR_ERR(priv->cm.srq); >+ priv->cm.srq = NULL; >+ return ret; nit: you can just return PTR_ERR here, and remove the ret stack variable >+ } >+ >+ priv->cm.srq_ring = kzalloc(ipoib_recvq_size * >+ sizeof *priv->cm.srq_ring, >+ GFP_KERNEL); >+ if (!priv->cm.srq_ring) { >+ printk(KERN_WARNING "%s: failed to allocate CM ring " >+ "(%d entries)\n", >+ priv->ca->name, ipoib_recvq_size); >+ ipoib_cm_dev_cleanup(dev); >+ return 
-ENOMEM; >+ } >+ >+ return 0; >+} >+ > int ipoib_cm_dev_init(struct net_device *dev) > { > struct ipoib_dev_priv *priv = netdev_priv(dev); >- struct ib_srq_init_attr srq_init_attr = { >- .attr = { >- .max_wr = ipoib_recvq_size, >- .max_sge = IPOIB_CM_RX_SG >- } >- }; > int ret, i; >+ struct ib_device_attr attr; > > INIT_LIST_HEAD(&priv->cm.passive_ids); > INIT_LIST_HEAD(&priv->cm.reap_list); >@@ -1305,20 +1702,33 @@ int ipoib_cm_dev_init(struct net_device > > skb_queue_head_init(&priv->cm.skb_queue); > >- priv->cm.srq = ib_create_srq(priv->pd, &srq_init_attr); >- if (IS_ERR(priv->cm.srq)) { >- ret = PTR_ERR(priv->cm.srq); >- priv->cm.srq = NULL; >+ if (ret = ib_query_device(priv->ca, &attr)) > return ret; double parens around assignment - also below >- } > >- priv->cm.srq_ring = kzalloc(ipoib_recvq_size * sizeof *priv->cm.srq_ring, >- GFP_KERNEL); >- if (!priv->cm.srq_ring) { >- printk(KERN_WARNING "%s: failed to allocate CM ring (%d >entries)\n", >- priv->ca->name, ipoib_recvq_size); >- ipoib_cm_dev_cleanup(dev); >- return -ENOMEM; >+ if (attr.max_srq) { >+ /* This device supports SRQ */ >+ if (ret = create_srq(dev, priv)) >+ return ret; >+ priv->cm.rx_index_table = NULL; >+ } else { >+ priv->cm.srq = NULL; >+ priv->cm.srq_ring = NULL; >+ >+ /* Every new REQ that arrives creates a struct ipoib_cm_rx. >+ * These structures form a link list starting with the >+ * passive_ids. For quick and easy access we maintain a table >+ * of pointers to struct ipoib_cm_rx called the rx_index_table >+ */ Why store the structures in a linked list if they're stored in a table? 
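One plausible answer is that the two structures serve different access patterns. In the sketch below (hypothetical names, no locking), the table gives constant-time lookup from the index carried in a completion's wr_id, using the same first-free scan as allocate_and_post_rbuf_nosrq(). In the patch, the passive_ids list instead keeps connections ordered by recent activity: the timer check moves active entries to the head, and ipoib_cm_stale_task() walks from the tail (passive_ids.prev).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table size; the patch sizes the real table with
 * NOSRQ_INDEX_TABLE_SIZE and scans up to max_rc_qp entries. */
#define MODEL_TABLE_SIZE 4

struct conn_model {
	int id; /* stands in for struct ipoib_cm_rx */
};

/* First-free scan, as in allocate_and_post_rbuf_nosrq().
 * Returns the claimed slot, or -1 when every slot is in use. */
static int table_claim(struct conn_model **table, size_t n,
		       struct conn_model *c)
{
	for (size_t i = 0; i < n; i++) {
		if (table[i] == NULL) {
			table[i] = c;
			return (int)i;
		}
	}
	return -1; /* caller would have to reject the incoming REQ */
}

/* O(1) lookup from the index carried in a completion's wr_id.
 * NULL means the connection has already been torn down. */
static struct conn_model *table_lookup(struct conn_model **table, size_t n,
				       size_t idx)
{
	return idx < n ? table[idx] : NULL;
}

static void table_release(struct conn_model **table, size_t n, size_t idx)
{
	assert(idx < n);
	table[idx] = NULL;
}
```

A NULL lookup result corresponds to the case handle_rx_wc_nosrq() already tolerates: a completion arriving for a connection that the stale task has just removed.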
>+ priv->cm.rx_index_table = kzalloc(NOSRQ_INDEX_TABLE_SIZE * >+ sizeof *priv->cm.rx_index_table, >+ GFP_KERNEL); >+ if (!priv->cm.rx_index_table) { >+ printk(KERN_WARNING "Failed to allocate >NOSRQ_INDEX_TABLE\n"); >+ return -ENOMEM; >+ } >+ >+ spin_lock_init(&nosrq_count.lock); >+ nosrq_count.current_rc_qp = 0; > } > > for (i = 0; i < IPOIB_CM_RX_SG; ++i) >@@ -1331,17 +1741,23 @@ int ipoib_cm_dev_init(struct net_device > priv->cm.rx_wr.sg_list = priv->cm.rx_sge; > priv->cm.rx_wr.num_sge = IPOIB_CM_RX_SG; > >- for (i = 0; i < ipoib_recvq_size; ++i) { >- if (!ipoib_cm_alloc_rx_skb(dev, i, IPOIB_CM_RX_SG - 1, >+ /* One can post receive buffers even before the RX QP is created >+ * only in the SRQ case. Therefore for NOSRQ we skip the rest of init >+ * and do that in ipoib_cm_req_handler() */ This is separate from this patch, but why not wait to post receives to a SRQ only after we've received a REQ? Would this simplify the code any? - Sean From rowland at cse.ohio-state.edu Tue Jun 12 14:52:43 2007 From: rowland at cse.ohio-state.edu (Shaun Rowland) Date: Tue, 12 Jun 2007 17:52:43 -0400 Subject: [ofa-general] Re: [ewg] New OMPI / MPI_READ release notes patch In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C9015635E9@mtlexch01.mtl.com> Message-ID: <466F15AB.5060406@cse.ohio-state.edu> Jeff Squyres wrote: > Note that git still shows the following in the ofed_1_2 branch: > > Example1: Running the OSU bandwidth: > > !!! SOMEONE PLEASE CHECK THESE DIRECTORIES AND EXECUTABLE NAMES > > cd /usr/mpi/gcc/openmpi-1.2.2-1/tests/osu_benchmarks-2.2 > > mpirun -np -hostfile osu_bw > > Example2: Running the Intel MPI Benchmark benchmarks: > > !!! SOMEONE PLEASE CHECK THESE DIRECTORIES AND EXECUTABLE NAMES > > cd /usr/mpi/gcc/openmpi-1.2.2-1/tests/IMB-2.3 > > mpirun -np -hostfile IMB-MPI1 > > Example3: Running the Presta benchmarks: > > !!! 
SOMEONE PLEASE CHECK THESE DIRECTORIES AND EXECUTABLE NAMES > > cd /usr/mpi/gcc/openmpi-1.2.2-1/tests/presta-1.4.0 > > mpirun -np -hostfile com -o 100 The above information is correct for a standard gcc build. I didn't see that this was answered, but I could have missed that. -- Shaun Rowland rowland at cse.ohio-state.edu http://www.cse.ohio-state.edu/~rowland/ From rowland at cse.ohio-state.edu Tue Jun 12 15:03:56 2007 From: rowland at cse.ohio-state.edu (Shaun Rowland) Date: Tue, 12 Jun 2007 18:03:56 -0400 Subject: [ofa-general] New OMPI / MPI_READ release notes patch In-Reply-To: References: Message-ID: <466F184C.70606@cse.ohio-state.edu> Jeff Squyres wrote: > Tziporet -- > > Here's a new patch for the OMPI release notes based on your current > git. It includes updated information for Open MPI and text about > mpi-selector. > > Note that there are a few areas in MPI_README that I need OSU and > Mellanox to proofread. It would also be nice if someone else could > eyeball the mpi-selector text and ensure it makes sense to a naive reader. I took a look at the documentation in your patch quickly. I think it should be clear how this works. Also, I saw that the links to download were in the current MPI_README.txt, so that should be good. -- Shaun Rowland rowland at cse.ohio-state.edu http://www.cse.ohio-state.edu/~rowland/ From mshefty at ichips.intel.com Tue Jun 12 15:41:30 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 12 Jun 2007 15:41:30 -0700 Subject: [ofa-general] Re: crash in ipoib In-Reply-To: <20070612183521.GC10688@mellanox.co.il> References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com> <20070612183521.GC10688@mellanox.co.il> Message-ID: <466F211A.1000005@ichips.intel.com> They're around > ipoib_cm_send + 433 if (unlikely(post_send(priv, tx, tx->tx_head... 
> ipoib_cm_handle_rx_wc + 239 skb = priv->cm.srq_ring[wr_id].skb or if (unlikely(wc->status != IB_WC_SUCCESS)) { (This one isn't matching up quite right, but appears to be in this area.) In my earlier trace, I saw ipoib_cm_handle_rx_wc + 378, which is around: if (p->state == IPOIB_CM_RX_LIVE) -> list_move(&p->list, &priv->cm.passive_ids); - Sean From pradeeps at linux.vnet.ibm.com Tue Jun 12 17:14:00 2007 From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana) Date: Tue, 12 Jun 2007 17:14:00 -0700 Subject: [ofa-general] IPOB CM (NOSRQ) [PATCH V6] patch In-Reply-To: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com> References: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com> Message-ID: <466F36C8.5010507@linux.vnet.ibm.com> Sean, Thanks for looking through this. My responses below. Pradeep Sean Hefty wrote: >> +module_param_named(max_recieve_buffer, max_recv_buf, int, 0644); >> +MODULE_PARM_DESC(max_recieve_buffer, "Max Recieve Buffer Size in MB"); > > nit: receive misspelled you are correct. > >> +static int allocate_and_post_rbuf_nosrq(struct ib_cm_id *cm_id, >> + struct ipoib_cm_rx *p, unsigned psn) >> +{ >> + struct net_device *dev = cm_id->context; >> + struct ipoib_dev_priv *priv = netdev_priv(dev); >> + int ret; >> + u32 qp_num, index; >> + u64 i, recv_mem_used; >> + >> + qp_num = p->qp->qp_num; >> + >> + /* In the SRQ case there is a common rx buffer called the srq_ring. >> + * However, for the NOSRQ we create an rx_ring for every >> + * struct ipoib_cm_rx. 
>> + */ >> + p->rx_ring = kzalloc(ipoib_recvq_size * sizeof *p->rx_ring, GFP_KERNEL); >> + if (!p->rx_ring) { >> + printk(KERN_WARNING "Failed to allocate rx_ring for 0x%x\n", >> + qp_num); >> + return -ENOMEM; >> + } >> + >> + init_context_and_add_list(cm_id, p, priv); >> + spin_lock_irq(&priv->lock); >> + >> + for (index = 0; index < max_rc_qp; index++) >> + if (priv->cm.rx_index_table[index] == NULL) >> + break; >> + >> + spin_lock(&nosrq_count.lock); >> + recv_mem_used = (u64)ipoib_recvq_size * (u64)nosrq_count.current_rc_qp >> + * CM_PACKET_SIZE; /* packets are 64K */ >> + spin_unlock(&nosrq_count.lock); > > Is a spin lock needed here? Could you make current_rc_qp an atomic? This function is called only when a REQ is received. Otherwise current_rc_qp is only used in the error case, or when the connection is being torn down. Hence I don't think it makes a significant difference which one is used. > >> +err_send_rej: >> +err_modify_nosrq: >> +err_alloc_and_post: > > Maybe just use a single label? 
Yes, that is doable > >> @@ -316,9 +488,18 @@ static int ipoib_cm_req_handler(struct i >> } >> >> psn = random32() & 0xffffff; >> - ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn); >> - if (ret) >> - goto err_modify; >> + if (!priv->cm.srq) { >> + spin_lock(&nosrq_count.lock); >> + nosrq_count.current_rc_qp++; >> + spin_unlock(&nosrq_count.lock); >> + if (ret = allocate_and_post_rbuf_nosrq(cm_id, p, psn)) > > Use double parens around assignment: if ((ret = ..)) okay > >> + goto err_post_nosrq; >> + } else { >> + p->rx_ring = NULL; >> + ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn); >> + if (ret) >> + goto err_modify; >> + } >> >> ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn); >> if (ret) { >> @@ -326,18 +507,18 @@ static int ipoib_cm_req_handler(struct i >> goto err_rep; >> } >> >> - cm_id->context = p; >> - p->jiffies = jiffies; >> - p->state = IPOIB_CM_RX_LIVE; >> - spin_lock_irq(&priv->lock); >> - if (list_empty(&priv->cm.passive_ids)) >> - queue_delayed_work(ipoib_workqueue, >> - &priv->cm.stale_task, IPOIB_CM_RX_DELAY); >> - list_add(&p->list, &priv->cm.passive_ids); >> - spin_unlock_irq(&priv->lock); >> + if (priv->cm.srq) { >> + init_context_and_add_list(cm_id, p, priv); >> + p->state = IPOIB_CM_RX_LIVE; > > The order between setting p->state and adding the item to the list changes here. > I don't know if this matters, but it's now possible for the work queue to > execute before p->state is set. You are correct. I need to set p->state first and then call init_context_and_add_list(). > >> + } >> return 0; >> >> err_rep: >> +err_post_nosrq: >> + list_del_init(&p->list); > > Is this correct? Is p->list on any list at this point?
> >> + spin_lock(&nosrq_count.lock); >> + nosrq_count.current_rc_qp--; >> + spin_unlock(&nosrq_count.lock); >> err_modify: >> ib_destroy_qp(p->qp); >> err_qp: >> @@ -401,21 +582,51 @@ static void skb_put_frags(struct sk_buff >> } >> } >> >> -void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) >> +static void timer_check_srq(struct ipoib_dev_priv *priv, struct >> ipoib_cm_rx *p) >> +{ >> + unsigned long flags; >> + >> + if (p && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) { >> + spin_lock_irqsave(&priv->lock, flags); >> + p->jiffies = jiffies; >> + /* Move this entry to list head, but do >> + * not re-add it if it has been removed. */ > > nit: There are several places in the patch where the commenting style needs > updating. Move the closing "*/" to the next line? > >> + if (p->state == IPOIB_CM_RX_LIVE) >> + list_move(&p->list, &priv->cm.passive_ids); >> + spin_unlock_irqrestore(&priv->lock, flags); >> + } >> +} >> + >> +static void timer_check_nosrq(struct ipoib_dev_priv *priv, struct >> ipoib_cm_rx *p) >> +{ >> + unsigned long flags; >> + >> + if (p && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) { >> + spin_lock_irqsave(&priv->lock, flags); >> + p->jiffies = jiffies; >> + /* Move this entry to list head, but do >> + * not re-add it if it has been removed. */ >> + if (!list_empty(&p->list)) > > This line is the only difference between this function and the previous one. Is > it possible to always use the state check? The state check is only used in the SRQ case. 
> >> + list_move(&p->list, &priv->cm.passive_ids); >> + spin_unlock_irqrestore(&priv->lock, flags); >> + } >> +} > > >> +static void handle_rx_wc_nosrq(struct net_device *dev, struct ib_wc *wc) >> +{ >> + struct ipoib_dev_priv *priv = netdev_priv(dev); >> + struct sk_buff *skb, *newskb; >> + u64 mapping[IPOIB_CM_RX_SG], wr_id = wc->wr_id >> 32; >> + u32 index; >> + struct ipoib_cm_rx *p, *rx_ptr; >> + int frags, ret; >> + >> + >> + ipoib_dbg_data(priv, "cm recv completion: id %d, status: %d\n", >> + wr_id, wc->status); >> + >> + if (unlikely(wr_id >= ipoib_recvq_size)) { >> + ipoib_warn(priv, "cm recv completion event with wrid %d (> > %d)\n", >> + wr_id, ipoib_recvq_size); >> + return; >> + } >> + >> + index = (wc->wr_id & ~IPOIB_CM_OP_RECV) & NOSRQ_INDEX_MASK ; >> + >> + /* This is the only place where rx_ptr could be a NULL - could >> + * have just received a packet from a connection that has become >> + * stale and so is going away. We will simply drop the packet and >> + * let the hardware (it s IB_QPT_RC) handle the dropped packet. >> + * In the timer_check() function below, p->jiffies is updated and >> + * hence the connection will not be stale after that. >> + */ >> + rx_ptr = priv->cm.rx_index_table[index]; > > Is synchronization needed here? No locking required > >> + if (unlikely(!rx_ptr)) { >> + ipoib_warn(priv, "Received packet from a connection " >> + "that is going away. Hardware will handle it.\n"); >> + return; >> + } >> + >> + skb = rx_ptr->rx_ring[wr_id].skb; >> + >> + if (unlikely(wc->status != IB_WC_SUCCESS)) { >> + ipoib_dbg(priv, "cm recv error " >> + "(status=%d, wrid=%ld vend_err %x)\n", >> + wc->status, wr_id, wc->vendor_err); >> + ++priv->stats.rx_dropped; >> + goto repost_nosrq; >> + } >> + >> + if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) { >> + /* There are no guarantees that wc->qp is not NULL for HCAs >> + * that do not support SRQ. 
*/ >> + p = rx_ptr; >> + timer_check_nosrq(priv, p); > > This appears to be the only place 'p' is used in this call. I think we can just > remove it. correct. > >> + } >> + >> + frags = PAGE_ALIGN(wc->byte_len - min(wc->byte_len, >> + (unsigned)IPOIB_CM_HEAD_SIZE)) / > PAGE_SIZE; >> + >> + newskb = ipoib_cm_alloc_rx_skb(dev, wr_id << 32 | index, frags, >> + mapping); >> + if (unlikely(!newskb)) { >> + /* >> + * If we can't allocate a new RX buffer, dump >> + * this packet and reuse the old buffer. >> + */ >> + ipoib_dbg(priv, "failed to allocate receive buffer %ld\n", > wr_id); >> ++priv->stats.rx_dropped; >> - goto repost; >> + goto repost_nosrq; >> } >> >> - ipoib_cm_dma_unmap_rx(priv, frags, priv->cm.srq_ring[wr_id].mapping); >> - memcpy(priv->cm.srq_ring[wr_id].mapping, mapping, (frags + 1) * sizeof >> *mapping); >> + ipoib_cm_dma_unmap_rx(priv, frags, >> + rx_ptr->rx_ring[wr_id].mapping); >> + memcpy(rx_ptr->rx_ring[wr_id].mapping, mapping, >> + (frags + 1) * sizeof *mapping); >> >> ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n", >> wc->byte_len, wc->slid); >> @@ -485,10 +788,22 @@ void ipoib_cm_handle_rx_wc(struct net_de >> skb->pkt_type = PACKET_HOST; >> netif_receive_skb(skb); >> >> -repost: >> - if (unlikely(ipoib_cm_post_receive(dev, wr_id))) >> - ipoib_warn(priv, "ipoib_cm_post_receive failed " >> - "for buf %d\n", wr_id); >> +repost_nosrq: >> + ret = post_receive_nosrq(dev, wr_id << 32 | index); >> + >> + if (unlikely(ret)) >> + ipoib_warn(priv, "post_receive_nosrq failed for buf %ld\n", >> + wr_id); >> +} >> + >> +void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) >> +{ >> + struct ipoib_dev_priv *priv = netdev_priv(dev); >> + >> + if (priv->cm.srq) >> + handle_rx_wc_srq(dev, wc); >> + else >> + handle_rx_wc_nosrq(dev, wc); >> } >> >> static inline int post_send(struct ipoib_dev_priv *priv, >> @@ -680,6 +995,44 @@ err_cm: >> return ret; >> } >> >> +static void free_resources_nosrq(struct ipoib_dev_priv *priv, struct >> 
ipoib_cm_rx *p) >> +{ >> + int i; >> + >> + for(i = 0; i < ipoib_recvq_size; ++i) >> + if(p->rx_ring[i].skb) { >> + ipoib_cm_dma_unmap_rx(priv, >> + IPOIB_CM_RX_SG - 1, >> + p->rx_ring[i].mapping); >> + dev_kfree_skb_any(p->rx_ring[i].skb); >> + p->rx_ring[i].skb = NULL; >> + } >> + kfree(p->rx_ring); >> +} >> + >> +void dev_stop_nosrq(struct ipoib_dev_priv *priv) >> +{ >> + struct ipoib_cm_rx *p; >> + >> + spin_lock_irq(&priv->lock); >> + while (!list_empty(&priv->cm.passive_ids)) { >> + p = list_entry(priv->cm.passive_ids.next, typeof(*p), list); >> + free_resources_nosrq(priv, p); >> + list_del_init(&p->list); > > just list_del should work here > >> + spin_unlock_irq(&priv->lock); >> + ib_destroy_cm_id(p->id); >> + ib_destroy_qp(p->qp); >> + spin_lock(&nosrq_count.lock); >> + nosrq_count.current_rc_qp--; >> + spin_unlock(&nosrq_count.lock); >> + kfree(p); >> + spin_lock_irq(&priv->lock); >> + } >> + spin_unlock_irq(&priv->lock); >> + >> + cancel_delayed_work(&priv->cm.stale_task); >> +} >> + >> void ipoib_cm_dev_stop(struct net_device *dev) >> { >> struct ipoib_dev_priv *priv = netdev_priv(dev); >> @@ -694,6 +1047,11 @@ void ipoib_cm_dev_stop(struct net_device >> ib_destroy_cm_id(priv->cm.id); >> priv->cm.id = NULL; >> >> + if (!priv->cm.srq) { >> + dev_stop_nosrq(priv); >> + return; >> + } >> + >> spin_lock_irq(&priv->lock); >> while (!list_empty(&priv->cm.passive_ids)) { >> p = list_entry(priv->cm.passive_ids.next, typeof(*p), list); >> @@ -739,6 +1097,7 @@ void ipoib_cm_dev_stop(struct net_device >> kfree(p); >> } >> >> + >> cancel_delayed_work(&priv->cm.stale_task); >> } >> >> @@ -817,7 +1176,9 @@ static struct ib_qp *ipoib_cm_create_tx_ >> attr.recv_cq = priv->cq; >> attr.srq = priv->cm.srq; >> attr.cap.max_send_wr = ipoib_sendq_size; >> + attr.cap.max_recv_wr = 1; >> attr.cap.max_send_sge = 1; >> + attr.cap.max_recv_sge = 1; >> attr.sq_sig_type = IB_SIGNAL_ALL_WR; >> attr.qp_type = IB_QPT_RC; >> attr.send_cq = cq; >> @@ -857,7 +1218,7 @@ static int 
ipoib_cm_send_req(struct net_ >> req.retry_count = 0; /* RFC draft warns against retries */ >> req.rnr_retry_count = 0; /* RFC draft warns against retries */ >> req.max_cm_retries = 15; >> - req.srq = 1; >> + req.srq = !!priv->cm.srq; >> return ib_send_cm_req(id, &req); >> } >> >> @@ -1202,6 +1563,11 @@ static void ipoib_cm_rx_reap(struct work >> list_for_each_entry_safe(p, n, &list, list) { >> ib_destroy_cm_id(p->id); >> ib_destroy_qp(p->qp); >> + if (!priv->cm.srq) { >> + spin_lock(&nosrq_count.lock); >> + nosrq_count.current_rc_qp--; >> + spin_unlock(&nosrq_count.lock); >> + } >> kfree(p); >> } >> } >> @@ -1220,12 +1586,19 @@ static void ipoib_cm_stale_task(struct w >> p = list_entry(priv->cm.passive_ids.prev, typeof(*p), list); >> if (time_before_eq(jiffies, p->jiffies + IPOIB_CM_RX_TIMEOUT)) >> break; >> - list_move(&p->list, &priv->cm.rx_error_list); >> - p->state = IPOIB_CM_RX_ERROR; >> - spin_unlock_irq(&priv->lock); >> - ret = ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE); >> - if (ret) >> - ipoib_warn(priv, "unable to move qp to error state: > %d\n", >> ret); >> + if (!priv->cm.srq) { >> + free_resources_nosrq(priv, p); >> + list_del_init(&p->list); >> + priv->cm.rx_index_table[p->index] = NULL; >> + spin_unlock_irq(&priv->lock); >> + } else { >> + list_move(&p->list, &priv->cm.rx_error_list); >> + p->state = IPOIB_CM_RX_ERROR; >> + spin_unlock_irq(&priv->lock); >> + ret = ib_modify_qp(p->qp, &ipoib_cm_err_attr, > IB_QP_STATE); >> + if (ret) >> + ipoib_warn(priv, "unable to move qp to error > state: >> %d\n", ret); >> + } >> spin_lock_irq(&priv->lock); >> } >> >> @@ -1279,16 +1652,40 @@ int ipoib_cm_add_mode_attr(struct net_de >> return device_create_file(&dev->dev, &dev_attr_mode); >> } >> >> +static int create_srq(struct net_device *dev, struct ipoib_dev_priv *priv) >> +{ >> + struct ib_srq_init_attr srq_init_attr; >> + int ret; >> + >> + srq_init_attr.attr.max_wr = ipoib_recvq_size; >> + srq_init_attr.attr.max_sge = IPOIB_CM_RX_SG; >> + >> + 
priv->cm.srq = ib_create_srq(priv->pd, &srq_init_attr); >> + if (IS_ERR(priv->cm.srq)) { >> + ret = PTR_ERR(priv->cm.srq); >> + priv->cm.srq = NULL; >> + return ret; > > nit: you can just return PTR_ERR here, and remove the ret stack variable okay > >> + } >> + >> + priv->cm.srq_ring = kzalloc(ipoib_recvq_size * >> + sizeof *priv->cm.srq_ring, >> + GFP_KERNEL); >> + if (!priv->cm.srq_ring) { >> + printk(KERN_WARNING "%s: failed to allocate CM ring " >> + "(%d entries)\n", >> + priv->ca->name, ipoib_recvq_size); >> + ipoib_cm_dev_cleanup(dev); >> + return -ENOMEM; >> + } >> + >> + return 0; >> +} >> + >> int ipoib_cm_dev_init(struct net_device *dev) >> { >> struct ipoib_dev_priv *priv = netdev_priv(dev); >> - struct ib_srq_init_attr srq_init_attr = { >> - .attr = { >> - .max_wr = ipoib_recvq_size, >> - .max_sge = IPOIB_CM_RX_SG >> - } >> - }; >> int ret, i; >> + struct ib_device_attr attr; >> >> INIT_LIST_HEAD(&priv->cm.passive_ids); >> INIT_LIST_HEAD(&priv->cm.reap_list); >> @@ -1305,20 +1702,33 @@ int ipoib_cm_dev_init(struct net_device >> >> skb_queue_head_init(&priv->cm.skb_queue); >> >> - priv->cm.srq = ib_create_srq(priv->pd, &srq_init_attr); >> - if (IS_ERR(priv->cm.srq)) { >> - ret = PTR_ERR(priv->cm.srq); >> - priv->cm.srq = NULL; >> + if (ret = ib_query_device(priv->ca, &attr)) >> return ret; > > double parens around assignment - also below okay > >> - } >> >> - priv->cm.srq_ring = kzalloc(ipoib_recvq_size * sizeof > *priv->cm.srq_ring, >> - GFP_KERNEL); >> - if (!priv->cm.srq_ring) { >> - printk(KERN_WARNING "%s: failed to allocate CM ring (%d >> entries)\n", >> - priv->ca->name, ipoib_recvq_size); >> - ipoib_cm_dev_cleanup(dev); >> - return -ENOMEM; >> + if (attr.max_srq) { >> + /* This device supports SRQ */ >> + if (ret = create_srq(dev, priv)) >> + return ret; >> + priv->cm.rx_index_table = NULL; >> + } else { >> + priv->cm.srq = NULL; >> + priv->cm.srq_ring = NULL; >> + >> + /* Every new REQ that arrives creates a struct ipoib_cm_rx. 
>> + * These structures form a link list starting with the >> + * passive_ids. For quick and easy access we maintain a table >> + * of pointers to struct ipoib_cm_rx called the rx_index_table >> + */ > > Why store the structures in a linked list if they're stored in a table? This linked list is common to both SRQ and NOSRQ. Only the NOSRQ code uses the table. > >> + priv->cm.rx_index_table = kzalloc(NOSRQ_INDEX_TABLE_SIZE * >> + sizeof *priv->cm.rx_index_table, >> + GFP_KERNEL); >> + if (!priv->cm.rx_index_table) { >> + printk(KERN_WARNING "Failed to allocate >> NOSRQ_INDEX_TABLE\n"); >> + return -ENOMEM; >> + } >> + >> + spin_lock_init(&nosrq_count.lock); >> + nosrq_count.current_rc_qp = 0; >> } >> >> for (i = 0; i < IPOIB_CM_RX_SG; ++i) >> @@ -1331,17 +1741,23 @@ int ipoib_cm_dev_init(struct net_device >> priv->cm.rx_wr.sg_list = priv->cm.rx_sge; >> priv->cm.rx_wr.num_sge = IPOIB_CM_RX_SG; >> >> - for (i = 0; i < ipoib_recvq_size; ++i) { >> - if (!ipoib_cm_alloc_rx_skb(dev, i, IPOIB_CM_RX_SG - 1, >> + /* One can post receive buffers even before the RX QP is created >> + * only in the SRQ case. Therefore for NOSRQ we skip the rest of init >> + * and do that in ipoib_cm_req_handler() */ > > This is separate from this patch, but why not wait to post receives to a SRQ > only after we've received a REQ? Would this simplify the code any? Good point. We could think of that in the future. > > - Sean > From sean.hefty at intel.com Tue Jun 12 18:24:49 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 12 Jun 2007 18:24:49 -0700 Subject: [ofa-general] IPOB CM (NOSRQ) [PATCH V6] patch In-Reply-To: <466F36C8.5010507@linux.vnet.ibm.com> Message-ID: <000001c7ad59$a2a93040$8ec8180a@amr.corp.intel.com> >This function is called only when a REQ is received. Otherwise >current_rc_qp is only used in the error case, or when the connection >is being torn down. Hence I don't think it makes a significant >difference which one is used. 
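The counter under discussion, `nosrq_count.current_rc_qp`, is read, incremented and decremented, and the patch wraps each access in `nosrq_count.lock`. Sean suggests in this thread that it would be cleaner as an atomic. In the kernel that means an `atomic_t` updated with `atomic_read()`, `atomic_inc()` and `atomic_dec()`.

The user-space sketch below uses C11 `<stdatomic.h>` as a stand-in. It hypothetically assumes the counter is compared against a limit before a new connection is accepted. In that case the check and the increment must happen as one atomic step, which the compare-and-swap loop provides. `struct rc_qp_count` and the `rc_qp_*` functions are illustrative names only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for nosrq_count: a lock-free count of live
 * no-SRQ RC QPs. */
struct rc_qp_count {
	atomic_int current_rc_qp;
};

static void rc_qp_count_init(struct rc_qp_count *c)
{
	atomic_init(&c->current_rc_qp, 0);
}

/* Take a slot only if fewer than max_qps are in use. A separate read
 * followed by an increment could let two callers both pass the check;
 * the compare-and-swap makes check-and-increment a single atomic step. */
static bool rc_qp_try_get(struct rc_qp_count *c, int max_qps)
{
	int cur = atomic_load(&c->current_rc_qp);

	while (cur < max_qps) {
		if (atomic_compare_exchange_weak(&c->current_rc_qp, &cur, cur + 1))
			return true;
		/* On failure cur holds the freshly observed value; retry. */
	}
	return false;
}

/* Release a slot (the decrement on teardown or error paths). */
static void rc_qp_put(struct rc_qp_count *c)
{
	int old = atomic_fetch_sub(&c->current_rc_qp, 1);

	assert(old > 0);	/* an unbalanced put would underflow */
	(void)old;		/* avoid an unused warning when NDEBUG is set */
}

static int rc_qp_in_use(struct rc_qp_count *c)
{
	return atomic_load(&c->current_rc_qp);
}
```

If the counter is only bumped and read for statistics, with no limit check, the loop is unnecessary and plain atomic increments and decrements are enough.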
I'm not hung up on this, but it appears that current_rc_qp is being used as an atomic (read, inc, dec). Converting it to an atomic seems cleaner.

>Move the closing "*/" to the next line?

The preferred format for multi-line comments is:

/*
 * first line
 * second line
 * etc.
 */

I don't know how well the existing code follows this format...

>>> + if (p && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) {
>>> + spin_lock_irqsave(&priv->lock, flags);
>>> + p->jiffies = jiffies;
>>> + /* Move this entry to list head, but do
>>> + * not re-add it if it has been removed. */
>>> + if (!list_empty(&p->list))
>>
>> This line is the only difference between this function and the previous one.
>> Is it possible to always use the state check?
>
>The state check is only used in the SRQ case.

I guess I was just asking whether the non-SRQ case could be made to make use of state as well. (I'll leave that to you, since I'm not as familiar with the code. I was just looking for ways to make the SRQ/no-SRQ code common, but only if it simplifies the code in the end.)

- Sean
From swise at opengridcomputing.com Tue Jun 12 18:48:52 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 12 Jun 2007 20:48:52 -0500 Subject: [ofa-general] Re: copyright warning/problem within ofed-1.2 In-Reply-To: <8A71B368A89016469F72CD08050AD33401505886@maui.asicdesigners.com> References: <466ED3CF.456F.00C7.0@novell.com> <8A71B368A89016469F72CD08050AD33401505886@maui.asicdesigners.com> Message-ID: <466F4D04.70102@opengridcomputing.com> Done. Vlad/Tziporet: Please pull from git://git.openfabrics.org/~swise/libcxgb3 The changes are only copyright headers/comments. Thanks, Steve. Felix Marti wrote: > Steve, > > Can you change the offending file to use an appropriate copyright > statement?
> > Thanks, > felix > >> -----Original Message----- >> From: Patrick Mullaney [mailto:pmullaney at novell.com] >> Sent: Tuesday, June 12, 2007 2:12 PM >> To: tziporet at mellanox.co.il >> Cc: Felix Marti; Matthias Nagorni; Moiz Kohari >> Subject: copyright warning/problem within ofed-1.2 >> >> Hi Tziporet, >> >> We just ran across a copyright in libcgxgb3 > library(firmware_exports.h). >> We may not be able to ship this in its current state - can we get this >> changed? I looked around and it seems like there was a patch to remove > it >> that was submitted but it doesn't seem to have made it to the release. >> >> Thanks. >> Patrick >> > From jackm at dev.mellanox.co.il Tue Jun 12 22:35:13 2007 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Wed, 13 Jun 2007 08:35:13 +0300 Subject: [ofa-general] [PATCH 1 of 2] libmlx4: deal with ownership bit wraparound when cleaning cq Message-ID: <200706130835.13642.jackm@dev.mellanox.co.il> 1. ntohl should apply only to cqe->my_qpn. 2. when compacting the cqe's, need to preserve the proper ownership value of the cqe in case of wraparound. Found by Ronni Zimmerman of Mellanox. Signed-off-by: Jack Morgenstein diff --git a/src/cq.c b/src/cq.c index a1831ff..ead1004 100644 --- a/src/cq.c +++ b/src/cq.c @@ -404,14 +404,24 @@ void mlx4_cq_clean(struct mlx4_cq *cq, uint32_t qpn, struct mlx4_srq *srq) * that match our QP by copying older entries on top of them. */ while ((int) --prod_index - (int) cq->cons_index >= 0) { + struct mlx4_cqe *q; + uint8_t own; cqe = get_cqe(cq, prod_index & cq->ibv_cq.cqe); - if (ntohl((cqe->my_qpn) & 0xffffff) == qpn) { + if ((ntohl(cqe->my_qpn) & 0xffffff) == qpn) { if (srq && !(cqe->owner_sr_opcode & MLX4_CQE_IS_SEND_MASK)) mlx4_free_srq_wqe(srq, ntohs(cqe->wqe_index)); ++nfreed; - } else if (nfreed) - memcpy(get_cqe(cq, (prod_index + nfreed) & cq->ibv_cq.cqe), - cqe, sizeof *cqe); + } else if (nfreed) { + /* + * preserve proper ownership bit value in case of + * wraparound. 
+ */ + q = get_cqe(cq, (prod_index + nfreed) & cq->ibv_cq.cqe); + own = q->owner_sr_opcode & MLX4_CQE_OWNER_MASK; + memcpy(q, cqe, sizeof *cqe); + q->owner_sr_opcode = + (q->owner_sr_opcode & ~MLX4_CQE_OWNER_MASK) | own; + } } if (nfreed) { From jackm at dev.mellanox.co.il Tue Jun 12 22:36:24 2007 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Wed, 13 Jun 2007 08:36:24 +0300 Subject: [ofa-general] [PATCH 2 of 2] mlx4: deal with ownership bit wraparound when cleaning cq Message-ID: <200706130836.25074.jackm@dev.mellanox.co.il> When compacting the cqe's, need to preserve the proper ownership value of the cqe in case of wraparound. Found by Ronni Zimmerman of Mellanox. Signed-off-by: Jack Morgenstein diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 048c527..ced854d 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -496,14 +496,24 @@ void __mlx4_ib_cq_clean(struct mlx4_ib_cq *cq, u32 qpn, struct mlx4_ib_srq *srq) * that match our QP by copying older entries on top of them. */ while ((int) --prod_index - (int) cq->mcq.cons_index >= 0) { + struct mlx4_cqe *q; + u8 own; cqe = get_cqe(cq, prod_index & cq->ibcq.cqe); if ((be32_to_cpu(cqe->my_qpn) & 0xffffff) == qpn) { if (srq && !(cqe->owner_sr_opcode & MLX4_CQE_IS_SEND_MASK)) mlx4_ib_free_srq_wqe(srq, be16_to_cpu(cqe->wqe_index)); ++nfreed; - } else if (nfreed) - memcpy(get_cqe(cq, (prod_index + nfreed) & cq->ibcq.cqe), - cqe, sizeof *cqe); + } else if (nfreed) { + /* + * preserve proper ownership bit value in case of + * wraparound. 
+ */ + q = get_cqe(cq, (prod_index + nfreed) & cq->ibcq.cqe); + own = q->owner_sr_opcode & MLX4_CQE_OWNER_MASK; + memcpy(q, cqe, sizeof *cqe); + q->owner_sr_opcode = + (q->owner_sr_opcode & ~MLX4_CQE_OWNER_MASK) | own; + } } if (nfreed) { From vuhuong at mellanox.com Wed Jun 13 01:07:33 2007 From: vuhuong at mellanox.com (Vu Pham) Date: Wed, 13 Jun 2007 01:07:33 -0700 Subject: [ofa-general] OFED 1.x (Gen 2) based SRP target code released! In-Reply-To: <466E4AD8.6090804@voltaire.com> References: <9FA59C95FFCBB34EA5E42C1A8573784F6F91AB@mtiexch01.mti.com> <465AD2D1.2070100@voltaire.com> <466D9EBD.3090809@mellanox.com> <466E4AD8.6090804@voltaire.com> Message-ID: <466FA5C5.5020006@mellanox.com> Erez Zilber wrote: >>>> >>> I'm trying to build srpt according to the instructions, but it does >>> not get built at all. Here's what I did: >>> >>> tar xzf OFED-1.2-rc3.tgz >>> cd OFED-1.2-rc3/SRPMS >>> rpm2cpio ofa_kernel-1.2-rc3.src.rpm |cpio -i >>> tar xzf ofa_kernel-1.2.tgz >>> cd ofa_kernel-1.2 >>> patch -p1 < ~/srpt_inc/add_srpt_01.patch >>> patch -p1 < ~/srpt_inc/add_srpt_03.patch >>> >> You forget to >> patch -p1 < ~/srpt_inc/add_srpt_04.patch >> >> -vu > You may want to add it to the README file (it is not mentioned there). It was not in the original README; however, it is in current README in srpt_inc.git > Is it documented anywhere in openfabrics wiki? > No. It's not in openfabrics wiki -vu From vlad at dev.mellanox.co.il Wed Jun 13 01:33:07 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Wed, 13 Jun 2007 11:33:07 +0300 Subject: [ofa-general] Re: [ewg] Re: copyright warning/problem within ofed-1.2 In-Reply-To: <466F4D04.70102@opengridcomputing.com> References: <466ED3CF.456F.00C7.0@novell.com> <8A71B368A89016469F72CD08050AD33401505886@maui.asicdesigners.com> <466F4D04.70102@opengridcomputing.com> Message-ID: <466FABC3.6050101@dev.mellanox.co.il> Done, Regards, Vladimir Steve Wise wrote: > Done. 
> > Vlad/Tziporet: Please pull from > > git://git.openfabrics.org/~swise/libcxgb3 > > The changes are only copyright headers/comments. > > Thanks, > > Steve. > > > > Felix Marti wrote: >> Steve, >> >> Can you change the offending file to use an appropriate copyright >> statement? >> >> Thanks, >> felix >> >>> -----Original Message----- >>> From: Patrick Mullaney [mailto:pmullaney at novell.com] >>> Sent: Tuesday, June 12, 2007 2:12 PM >>> To: tziporet at mellanox.co.il >>> Cc: Felix Marti; Matthias Nagorni; Moiz Kohari >>> Subject: copyright warning/problem within ofed-1.2 >>> >>> Hi Tziporet, >>> >>> We just ran across a copyright in libcgxgb3 >> library(firmware_exports.h). >>> We may not be able to ship this in its current state - can we get this >>> changed? I looked around and it seems like there was a patch to remove >> it >>> that was submitted but it doesn't seem to have made it to the release. >>> >>> Thanks. >>> Patrick >>> >> > > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > From mst at dev.mellanox.co.il Wed Jun 13 01:45:31 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 13 Jun 2007 11:45:31 +0300 Subject: [ofa-general] Re: crash in ipoib In-Reply-To: <466F211A.1000005@ichips.intel.com> References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com> <20070612183521.GC10688@mellanox.co.il> <466F211A.1000005@ichips.intel.com> Message-ID: <20070613084531.GG1975@mellanox.co.il> > Quoting Sean Hefty : > Subject: Re: [ofa-general] Re: crash in ipoib > > They're around > > >ipoib_cm_send + 433 > > if (unlikely(post_send(priv, tx, tx->tx_head... > > >ipoib_cm_handle_rx_wc + 239 > > skb = priv->cm.srq_ring[wr_id].skb > or > if (unlikely(wc->status != IB_WC_SUCCESS)) { > > (This one isn't matching up quite right, but appears to be in this area.) 
> > In my earlier trace, I saw ipoib_cm_handle_rx_wc + 378, which is around: > > if (p->state == IPOIB_CM_RX_LIVE) > -> list_move(&p->list, &priv->cm.passive_ids); > > - Sean This looks strange. Can you supply some more data please? Which HCA are you running on? What test are you running? What should I do to reproduce this? Further, could you supply the full oops? -- MST From vlad at lists.openfabrics.org Wed Jun 13 02:45:35 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Wed, 13 Jun 2007 02:45:35 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070613-0200 daily build status Message-ID: <20070613094535.E691CE6089D@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.12 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.19 Passed on x86_64 with linux-2.6.16 Passed on ia64 with linux-2.6.13 Passed on ppc64 with linux-2.6.12 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.12 Passed on ppc64 with linux-2.6.18 Passed on powerpc with linux-2.6.14 Passed on powerpc with linux-2.6.17 Passed on powerpc with linux-2.6.13 Passed on x86_64 with linux-2.6.18 Passed on ppc64 with linux-2.6.15 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.17 Passed on ppc64 with linux-2.6.16 Passed on 
powerpc with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.14 Passed on powerpc with linux-2.6.15 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.17 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.14 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.15 Passed on ppc64 with linux-2.6.13 Passed on ia64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.9-34.ELsmp Failed: From tziporet at dev.mellanox.co.il Wed Jun 13 05:55:42 2007 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Wed, 13 Jun 2007 15:55:42 +0300 Subject: [ofa-general] Re: [ewg] Re: copyright warning/problem within ofed-1.2 In-Reply-To: <466FABC3.6050101@dev.mellanox.co.il> References: <466ED3CF.456F.00C7.0@novell.com> <8A71B368A89016469F72CD08050AD33401505886@maui.asicdesigners.com> <466F4D04.70102@opengridcomputing.com> <466FABC3.6050101@dev.mellanox.co.il> Message-ID: <466FE94E.5010301@mellanox.co.il> Vladimir Sokolovsky wrote: > Done, > > Regards, > Vladimir > > Steve Wise wrote: >> Done. >> >> Vlad/Tziporet: Please pull from >> >> git://git.openfabrics.org/~swise/libcxgb3 >> >> The changes are only copyright headers/comments. >> >> Thanks, >> >> Steve. 
This will not be in RC5 - only the final release Tziporet From tziporet at mellanox.co.il Wed Jun 13 07:25:53 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Wed, 13 Jun 2007 17:25:53 +0300 Subject: [ofa-general] OFED 1.2 rc5 release In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9015634B7@mtlexch01.mtl.com> References: <43AA3CB3C1BF5A499F5AAD31CA5023AC06624A26@mtlexch01.mtl.com> <6C2C79E72C305246B504CBA17B5500C9015634B7@mtlexch01.mtl.com> Message-ID: <6C2C79E72C305246B504CBA17B5500C90156362A@mtlexch01.mtl.com> Hi, OFED 1.2-RC5 is available on http://www.openfabrics.org/builds/ofed-1.2/ File: OFED-1.2-rc5.tgz To get BUILD_ID run ofed_info Please report any issues in bugzilla https://bugs.openfabrics.org/ The GA release is expected next Wed (June 20) based on RC5 tests Tziporet & Vlad ======================================================================== Release information: OS support: Novell: - SLES 9.0 SP3 - SLES10 - SLES10 SP1 RC5 Redhat: - Redhat EL4 up3, up4 and up5 - Redhat EL5 kernel.org: - 2.6.20 - 2.6.19 Note: Fedora C6 and SuSE Pro 10 are not part of the official list. We keep the backport patches for these OSes and make sure OFED compile and loaded properly but will not do full QA cycle. Systems: * x86_64 * x86 * ia64 * ppc64 Main changes from OFED-1.1-rc4: =============================== 1. Fixed 8 bugs (see attached for fixed issues) 2. Added support for SLES10 SP1 RC5 (tvflash is disabled for now) 3. Added support for iSER on RHEL 4 4. Updated documents - all owners please review to make sure docs of your component is updated. See bugzilla for all open issues. Tasks that should be completed for the GA release: 1. Complete all documentation (release notes, README, etc.) 2. Run all QA tests on all platforms -------------- next part -------------- A non-text attachment was scrubbed... 
Name: rc5_fixed_bugs.csv Type: application/octet-stream Size: 719 bytes Desc: rc5_fixed_bugs.csv URL: From jsquyres at cisco.com Wed Jun 13 07:46:27 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Wed, 13 Jun 2007 10:46:27 -0400 Subject: [ofa-general] OFED 1.2 rc5 release In-Reply-To: <6C2C79E72C305246B504CBA17B5500C90156362A@mtlexch01.mtl.com> References: <43AA3CB3C1BF5A499F5AAD31CA5023AC06624A26@mtlexch01.mtl.com> <6C2C79E72C305246B504CBA17B5500C9015634B7@mtlexch01.mtl.com> <6C2C79E72C305246B504CBA17B5500C90156362A@mtlexch01.mtl.com> Message-ID: Here is a minor spacing/nits patch for MPI_README.txt. Additionally, I think that all three "Setup for * MPI..." sections should be modified so that they are consistent with each other. Specifically, I notice that the MVAPICH and MVAPICH2 sections make reference to sourcing shell setup files. This is obsolete; there is a whole section on the mpi-selector that should address setting up for Open MPI and MVAPICH*. See Section 3.1 for what I thought we were going to talk about in the "Setup for * MPI ..." sections. Regardless of what we decide to discuss, the 3 sections should be consistent. Also, it seems a little odd that the ordering is MVAPICH, OMPI, MVAPICH2. Shouldn't MVAPICH and MVAPICH2 go together? If we want to go alphabetically, we should go in order: MVAPICH, MVAPICH2, OMPI. It just seems odd that the 2 MVAPICH sections are not next to each other. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: MPI_README.patch Type: application/octet-stream Size: 2861 bytes Desc: not available URL: -------------- next part -------------- On Jun 13, 2007, at 10:25 AM, Tziporet Koren wrote: > Hi, > > OFED 1.2-RC5 is available on > http://www.openfabrics.org/builds/ofed-1.2/ > File: OFED-1.2-rc5.tgz > To get BUILD_ID run ofed_info > > Please report any issues in bugzilla https://bugs.openfabrics.org/ > > The GA release is expected next Wed (June 20) based on RC5 tests > > Tziporet & Vlad > > ====================================================================== > == > > Release information: > > OS support: > Novell: > - SLES 9.0 SP3 > - SLES10 > - SLES10 SP1 RC5 > Redhat: > - Redhat EL4 up3, up4 and up5 > - Redhat EL5 > kernel.org: > - 2.6.20 > - 2.6.19 > > Note: Fedora C6 and SuSE Pro 10 are not part of the official list. > We keep the backport patches for these OSes and make sure OFED compile > and loaded properly but will not do full QA cycle. > > Systems: > * x86_64 > * x86 > * ia64 > * ppc64 > > Main changes from OFED-1.1-rc4: > =============================== > 1. Fixed 8 bugs (see attached for fixed issues) > 2. Added support for SLES10 SP1 RC5 (tvflash is disabled for now) > 3. Added support for iSER on RHEL 4 > 4. Updated documents - all owners please review to make sure docs of > your component is updated. > > See bugzilla for all open issues. > > Tasks that should be completed for the GA release: > 1. Complete all documentation (release notes, README, etc.) > 2. 
Run all QA tests on all platforms > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/ > openib-general -- Jeff Squyres Cisco Systems From erezz at voltaire.com Wed Jun 13 07:49:14 2007 From: erezz at voltaire.com (Erez Zilber) Date: Wed, 13 Jun 2007 17:49:14 +0300 Subject: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits In-Reply-To: <20070612084108.GK6470@mellanox.co.il> References: <20070612084108.GK6470@mellanox.co.il> Message-ID: <467003EA.7070901@voltaire.com> > Erez, and other iser maintainers, I had a problem with RHEL4 iscsi backports > (scsi_flush_work isn't exported) I decided that since it isn't > called on older kernels it's reasonably safe to just comment it out, > but would be interested to hear you opinion. > See it in this sub-directory: > kernel_patches/backport/2.6.9_U2/libiscsi_no_flush_to_2_6_9.patch > This leads me to something that I thought about in the past. Old kernels (i.e. the RH4 kernels) don't have the SCSI work queue. Therefore, I used schedule_work instead of scsi_queue_work. Now, I cannot replace scsi_flush_work with flush_workqueue because I'm using a workqueue which does not belong to me (and, therefore, I cannot flush it). I'm thinking about adding a backport that will create a workqueue for each session in open-iscsi. With this, I can queue & flush. Mike - what do you think about that? I think that creating a workqueue in open-iscsi per session will be the closer thing to the SCSI workqueue that we have in new kernels. 
Erez From andrey.slepuhin at t-platforms.ru Wed Jun 13 07:56:57 2007 From: andrey.slepuhin at t-platforms.ru (Andrey Slepuhin) Date: Wed, 13 Jun 2007 18:56:57 +0400 Subject: [ofa-general] Problems with mlx4 Message-ID: <467005B9.8070708@t-platforms.ru> Dear folks, I just setup a test cluster using ConnectX cards, but I can not get link up. I downloaded the kernel from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git After inserting the modules I see that the card was initialized: Jun 13 22:17:23 testnode1 kernel: mlx4_core: Mellanox ConnectX core driver v0.01 (May 1, 2007) Jun 13 22:17:23 testnode1 kernel: mlx4_core: Initializing 0000:07:00.0 Jun 13 22:17:23 testnode1 kernel: ACPI: PCI Interrupt 0000:07:00.0[A] -> GSI 16 (level, low) -> IRQ 16 Jun 13 22:17:23 testnode1 kernel: PCI: Setting latency timer of device 0000:07:00.0 to 64 But the link remains in "DOWN" state: testnode1:~ # /opt/ofed/bin/ibstatus Infiniband device 'mlx4_0' port 1 status: default gid: fe80:0000:0000:0000:0002:c903:0000:07a1 base lid: 0x0 sm lid: 0x0 state: 1: DOWN phys state: 2: Polling rate: 20 Gb/sec (4X DDR) Infiniband device 'mlx4_0' port 2 status: default gid: fe80:0000:0000:0000:0002:c903:0000:07a2 base lid: 0x0 sm lid: 0x0 state: 1: DOWN phys state: 2: Polling rate: 20 Gb/sec (4X DDR) I tried different ports and cables but without success. Do you have any idea what's going wrong? The nodes configuration is: Intel S5000PSL motherboard, 2xXeon 5345, 8GB RAM All the nodes are connected to Flextronics (Mellanox) 24-port DDR switch. I'm running SLES10 with the kernel from Roland's tree: testnode1:~ # uname -a Linux testnode1 2.6.22-rc3 #1 SMP Wed Jun 6 23:56:36 MSD 2007 x86_64 x86_64 x86_64 GNU/Linux Any help will be much appreciated. 
Thanks in advance, Andrey From minich at ornl.gov Wed Jun 13 08:05:00 2007 From: minich at ornl.gov (Makia Minich) Date: Wed, 13 Jun 2007 11:05:00 -0400 Subject: [ofa-general] Problems with mlx4 In-Reply-To: <467005B9.8070708@t-platforms.ru> References: <467005B9.8070708@t-platforms.ru> Message-ID: <200706131105.00563.minich@ornl.gov> Are you running an SM anywhere? If I remember correctly, the Flextronics switch does not have an embeded SM. On Wednesday 13 June 2007 10:56:57 am Andrey Slepuhin wrote: > Dear folks, > > I just setup a test cluster using ConnectX cards, but I can not get link > up. I downloaded the kernel from > > git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git > > After inserting the modules I see that the card was initialized: > > Jun 13 22:17:23 testnode1 kernel: mlx4_core: Mellanox ConnectX core > driver v0.01 (May 1, 2007) > Jun 13 22:17:23 testnode1 kernel: mlx4_core: Initializing 0000:07:00.0 > Jun 13 22:17:23 testnode1 kernel: ACPI: PCI Interrupt 0000:07:00.0[A] -> > GSI 16 (level, low) -> IRQ 16 > Jun 13 22:17:23 testnode1 kernel: PCI: Setting latency timer of device > 0000:07:00.0 to 64 > > But the link remains in "DOWN" state: > > testnode1:~ # /opt/ofed/bin/ibstatus > Infiniband device 'mlx4_0' port 1 status: > default gid: fe80:0000:0000:0000:0002:c903:0000:07a1 > base lid: 0x0 > sm lid: 0x0 > state: 1: DOWN > phys state: 2: Polling > rate: 20 Gb/sec (4X DDR) > > Infiniband device 'mlx4_0' port 2 status: > default gid: fe80:0000:0000:0000:0002:c903:0000:07a2 > base lid: 0x0 > sm lid: 0x0 > state: 1: DOWN > phys state: 2: Polling > rate: 20 Gb/sec (4X DDR) > > I tried different ports and cables but without success. Do you have any > idea what's going wrong? > The nodes configuration is: > Intel S5000PSL motherboard, 2xXeon 5345, 8GB RAM > All the nodes are connected to Flextronics (Mellanox) 24-port DDR switch. 
> I'm running SLES10 with the kernel from Roland's tree: > testnode1:~ # uname -a > Linux testnode1 2.6.22-rc3 #1 SMP Wed Jun 6 23:56:36 MSD 2007 x86_64 > x86_64 x86_64 GNU/Linux > > Any help will be much appreciated. > > Thanks in advance, > Andrey > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general -- Makia Minich National Center for Computation Science Oak Ridge National Laboratory --*-- Imagine no possessions I wonder if you can - John Lennon From rdreier at cisco.com Wed Jun 13 08:05:55 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 13 Jun 2007 08:05:55 -0700 Subject: [ofa-general] Problems with mlx4 In-Reply-To: <467005B9.8070708@t-platforms.ru> (Andrey Slepuhin's message of "Wed, 13 Jun 2007 18:56:57 +0400") References: <467005B9.8070708@t-platforms.ru> Message-ID: > I just setup a test cluster using ConnectX cards, but I can not get > link up. Most likely you need to update your switch FW. You need Anafa2 FW version 1.0 to negotiate a DDR link with ConnectX. BTW what firmware version do you have on your HCAs? You probably want to update to 2.0.156 (the mlx4 driver won't work with 2.0.158 for a day or two still) so that you don't have to monkey around with hard-coding your switch ports to DDR only. - R. From rdreier at cisco.com Wed Jun 13 08:06:31 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 13 Jun 2007 08:06:31 -0700 Subject: [ofa-general] Problems with mlx4 In-Reply-To: <200706131105.00563.minich@ornl.gov> (Makia Minich's message of "Wed, 13 Jun 2007 11:05:00 -0400") References: <467005B9.8070708@t-platforms.ru> <200706131105.00563.minich@ornl.gov> Message-ID: > Are you running an SM anywhere? If I remember correctly, the Flextronics > switch does not have an embeded SM. 
Even without an SM the ports will go to INIT (and if the ports are DOWN then an SM can't do anything to help). - R. From andrey.slepuhin at t-platforms.ru Wed Jun 13 08:09:10 2007 From: andrey.slepuhin at t-platforms.ru (Andrey Slepuhin) Date: Wed, 13 Jun 2007 19:09:10 +0400 Subject: [ofa-general] Problems with mlx4 In-Reply-To: <200706131105.00563.minich@ornl.gov> References: <467005B9.8070708@t-platforms.ru> <200706131105.00563.minich@ornl.gov> Message-ID: <46700896.7090807@t-platforms.ru> No, I cannot start OpenSM, because after the driver is loaded the port is in the "DOWN" state, not "INIT". Best regards, Andrey Makia Minich wrote: > Are you running an SM anywhere? If I remember correctly, the Flextronics > switch does not have an embeded SM. > > On Wednesday 13 June 2007 10:56:57 am Andrey Slepuhin wrote: > >> Dear folks, >> >> I just setup a test cluster using ConnectX cards, but I can not get link >> up. I downloaded the kernel from >> >> git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git >> >> After inserting the modules I see that the card was initialized: >> >> Jun 13 22:17:23 testnode1 kernel: mlx4_core: Mellanox ConnectX core >> driver v0.01 (May 1, 2007) >> Jun 13 22:17:23 testnode1 kernel: mlx4_core: Initializing 0000:07:00.0 >> Jun 13 22:17:23 testnode1 kernel: ACPI: PCI Interrupt 0000:07:00.0[A] -> >> GSI 16 (level, low) -> IRQ 16 >> Jun 13 22:17:23 testnode1 kernel: PCI: Setting latency timer of device >> 0000:07:00.0 to 64 >> >> But the link remains in "DOWN" state: >> >> testnode1:~ # /opt/ofed/bin/ibstatus >> Infiniband device 'mlx4_0' port 1 status: >> default gid: fe80:0000:0000:0000:0002:c903:0000:07a1 >> base lid: 0x0 >> sm lid: 0x0 >> state: 1: DOWN >> phys state: 2: Polling >> rate: 20 Gb/sec (4X DDR) >> >> Infiniband device 'mlx4_0' port 2 status: >> default gid: fe80:0000:0000:0000:0002:c903:0000:07a2 >> base lid: 0x0 >> sm lid: 0x0 >> state: 1: DOWN >> phys state: 2: Polling >> rate: 20 Gb/sec (4X DDR) >> >> I tried
different ports and cables but without success. Do you have any >> idea what's going wrong? >> The nodes configuration is: >> Intel S5000PSL motherboard, 2xXeon 5345, 8GB RAM >> All the nodes are connected to Flextronics (Mellanox) 24-port DDR switch. >> I'm running SLES10 with the kernel from Roland's tree: >> testnode1:~ # uname -a >> Linux testnode1 2.6.22-rc3 #1 SMP Wed Jun 6 23:56:36 MSD 2007 x86_64 >> x86_64 x86_64 GNU/Linux >> >> Any help will be much appreciated. >> >> Thanks in advance, >> Andrey >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit >> http://openib.org/mailman/listinfo/openib-general >> > > From andrey.slepuhin at t-platforms.ru Wed Jun 13 08:14:37 2007 From: andrey.slepuhin at t-platforms.ru (Andrey Slepuhin) Date: Wed, 13 Jun 2007 19:14:37 +0400 Subject: [ofa-general] Problems with mlx4 In-Reply-To: References: <467005B9.8070708@t-platforms.ru> Message-ID: <467009DD.804@t-platforms.ru> That's what I afraid of... Ok, I will try to update the switch firmware, but do you have a link to ConnectX firmware? It is not present at public Mellanox site... Thanks, Andrey Roland Dreier wrote: > > I just setup a test cluster using ConnectX cards, but I can not get > > link up. > > Most likely you need to update your switch FW. You need Anafa2 FW > version 1.0 to negotiate a DDR link with ConnectX. > > BTW what firmware version do you have on your HCAs? You probably want > to update to 2.0.156 (the mlx4 driver won't work with 2.0.158 for a > day or two still) so that you don't have to monkey around with > hard-coding your switch ports to DDR only. > > - R. 
> From landman at scalableinformatics.com Wed Jun 13 08:15:08 2007 From: landman at scalableinformatics.com (Joe Landman) Date: Wed, 13 Jun 2007 11:15:08 -0400 Subject: [ofa-general] quick IPoIB config question Message-ID: <467009FC.3070402@scalableinformatics.com> Hi folks: I built OFED-1.2-rc4 on OpenSuSE 10.2; it works fine as long as I turn off the 32-bit build and update to a 2.6.20 kernel. I installed the RPMs after the build, and the system appears to be fine and well behaved. Is there an OFED-specific technique to have the ib0 interface configured at boot time, after the drivers load? This might be distribution specific. I created a file named /etc/sysconfig/network/ifcfg-ib0 which contained BOOTPROTO='static' MTU='' REMOTE_IPADDR='' STARTMODE='onboot' USERCONTROL='no' NETMASK='255.255.0.0' IPADDR='10.1.32.2' DEVICE='ib0' Bringing the interface up with 'ifconfig ib0 up' doesn't seem to assign the IP address and netmask to it. Hence my question: is there an OFED-specific method of configuring this (e.g. a config file I need to edit or create), or is it distribution dependent? If I force the issue with ifconfig, it looks like it works fine. This is OK as a workaround, and I can create an /etc/init.d/ib script or similar to force the issue. I would prefer to do this "the right way", so if someone has guidance or pointers as to what that is, I would like to follow them. Thanks.
Joe -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From rdreier at cisco.com Wed Jun 13 08:23:35 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 13 Jun 2007 08:23:35 -0700 Subject: [ofa-general] Problems with mlx4 In-Reply-To: <467009DD.804@t-platforms.ru> (Andrey Slepuhin's message of "Wed, 13 Jun 2007 19:14:37 +0400") References: <467005B9.8070708@t-platforms.ru> <467009DD.804@t-platforms.ru> Message-ID: > That's what I afraid of... Ok, I will try to update the switch > firmware, but do you have a link to ConnectX firmware? It is not > present at public Mellanox site... I don't have a link. I would suggest contacting whoever supplied your HCAs to you. - R. From minich at ornl.gov Wed Jun 13 08:23:15 2007 From: minich at ornl.gov (Makia Minich) Date: Wed, 13 Jun 2007 11:23:15 -0400 Subject: [ofa-general] Problems with mlx4 In-Reply-To: References: <467005B9.8070708@t-platforms.ru> <200706131105.00563.minich@ornl.gov> Message-ID: <200706131123.15262.minich@ornl.gov> You're right ... I was only half paying attention. I had the same problem with these cards and needed to upgrade the firmware, after which they came up and worked. On Wednesday 13 June 2007 11:06:31 am Roland Dreier wrote: > > Are you running an SM anywhere? If I remember correctly, the > > Flextronics switch does not have an embeded SM. > > Even without an SM the ports will go to INIT (and if the ports are > DOWN then an SM can't do anything to help). > > - R.
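Roland's point is that the logical port state and the physical link state reported by ibstatus answer different questions: a port stuck in "DOWN" with phys state "Polling" never trained its link, so starting an SM cannot help, while a port in INIT is simply waiting for one. A minimal triage sketch in C, assuming the standard PortInfo encodings (logical state 1 to 4, physical state 1 to 7); the enum and function names are hypothetical and not part of any OFED tool:

```c
#include <assert.h>

/* Logical port state values (PortInfo:PortState), as printed by ibstatus. */
enum ib_port_state { PORT_DOWN = 1, PORT_INIT = 2, PORT_ARMED = 3, PORT_ACTIVE = 4 };

/* Physical port state values (PortInfo:PortPhysicalState). */
enum ib_phys_state {
    PHYS_SLEEP = 1, PHYS_POLLING = 2, PHYS_DISABLED = 3,
    PHYS_TRAINING = 4, PHYS_LINKUP = 5, PHYS_ERROR_RECOVERY = 6,
    PHYS_PHYTEST = 7
};

/* Hypothetical triage result. */
enum port_hint {
    HINT_CHECK_PHYSICAL, /* link never trained: cable, switch or firmware issue */
    HINT_NEEDS_SM,       /* link is up, waiting for a subnet manager */
    HINT_ARMED,          /* configured by the SM, not yet active */
    HINT_ACTIVE,         /* fully up */
    HINT_UNUSUAL         /* link up but logically DOWN; usually transient, recheck */
};

enum port_hint triage_port(int state, int phys)
{
    assert(state >= PORT_DOWN && state <= PORT_ACTIVE);
    assert(phys >= PHYS_SLEEP && phys <= PHYS_PHYTEST);

    /* Without a trained physical link, no SM can reach the port. */
    if (phys != PHYS_LINKUP)
        return HINT_CHECK_PHYSICAL;

    switch (state) {
    case PORT_INIT:   return HINT_NEEDS_SM;
    case PORT_ARMED:  return HINT_ARMED;
    case PORT_ACTIVE: return HINT_ACTIVE;
    default:          return HINT_UNUSUAL;
    }
}
```

Applied to the ibstatus output in the original report ("state: 1: DOWN", "phys state: 2: Polling"), the sketch lands on HINT_CHECK_PHYSICAL, which matches the firmware diagnosis in the thread.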
-- Makia Minich National Center for Computation Science Oak Ridge National Laboratory --*-- Imagine no possessions I wonder if you can - John Lennon From michaelc at cs.wisc.edu Wed Jun 13 08:37:11 2007 From: michaelc at cs.wisc.edu (Mike Christie) Date: Wed, 13 Jun 2007 10:37:11 -0500 Subject: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits In-Reply-To: <467003EA.7070901@voltaire.com> References: <20070612084108.GK6470@mellanox.co.il> <467003EA.7070901@voltaire.com> Message-ID: <46700F27.1000000@cs.wisc.edu> Erez Zilber wrote: >> Erez, and other iser maintainers, I had a problem with RHEL4 iscsi backports >> (scsi_flush_work isn't exported) I decided that since it isn't >> called on older kernels it's reasonably safe to just comment it out, >> but would be interested to hear you opinion. >> See it in this sub-directory: >> kernel_patches/backport/2.6.9_U2/libiscsi_no_flush_to_2_6_9.patch >> > > This leads me to something that I thought about in the past. Old kernels > (i.e. the RH4 kernels) don't have the SCSI work queue. Therefore, I used > schedule_work instead of scsi_queue_work. Now, I cannot replace > scsi_flush_work with flush_workqueue because I'm using a workqueue which > does not belong to me (and, therefore, I cannot flush it). > > I'm thinking about adding a backport that will create a workqueue for > each session in open-iscsi. With this, I can queue & flush. Mike - what > do you think about that? I think that creating a workqueue in open-iscsi > per session will be the closer thing to the SCSI workqueue that we have > in new kernels. > Yeah, that sounds fine. Just to be clear, you would want to create the single threaded work queue (create_singlethread_workqueue) instead of the normal thread per cpu work queue for each session. 
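Mike's suggestion comes down to having the session own its queue, so that it can be flushed. In the kernel that would presumably mean create_singlethread_workqueue() per session, plus queue_work(), flush_workqueue() and destroy_workqueue(). The code below is only a userspace pthreads analogue, not kernel code, and every name in it is hypothetical. It shows the semantics such a queue provides: items run in FIFO order on a single thread, and a flush waits for everything queued before it to finish.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* A queued unit of work (hypothetical stand-in for struct work_struct). */
struct work {
    void (*fn)(void *arg);
    void *arg;
    struct work *next;
};

/* One queue per session, served by exactly one worker thread. */
struct session_wq {
    pthread_mutex_t lock;
    pthread_cond_t more;      /* work was queued, or stop was requested */
    pthread_cond_t progress;  /* a work item finished running */
    struct work *head, *tail; /* FIFO of pending items */
    unsigned long queued;     /* items ever queued */
    unsigned long done;       /* items ever completed */
    int stop;
    pthread_t thread;
};

static void *session_wq_worker(void *p)
{
    struct session_wq *wq = p;

    pthread_mutex_lock(&wq->lock);
    for (;;) {
        while (!wq->head && !wq->stop)
            pthread_cond_wait(&wq->more, &wq->lock);
        if (!wq->head)          /* stop requested and queue drained */
            break;
        struct work *w = wq->head;
        wq->head = w->next;
        if (!wq->head)
            wq->tail = NULL;
        pthread_mutex_unlock(&wq->lock);
        w->fn(w->arg);          /* run the item without holding the lock */
        free(w);
        pthread_mutex_lock(&wq->lock);
        wq->done++;
        pthread_cond_broadcast(&wq->progress);
    }
    pthread_mutex_unlock(&wq->lock);
    return NULL;
}

int session_wq_create(struct session_wq *wq)
{
    pthread_mutex_init(&wq->lock, NULL);
    pthread_cond_init(&wq->more, NULL);
    pthread_cond_init(&wq->progress, NULL);
    wq->head = wq->tail = NULL;
    wq->queued = wq->done = 0;
    wq->stop = 0;
    return pthread_create(&wq->thread, NULL, session_wq_worker, wq);
}

int session_wq_queue(struct session_wq *wq, void (*fn)(void *), void *arg)
{
    struct work *w = malloc(sizeof *w);
    if (!w)
        return -1;
    w->fn = fn;
    w->arg = arg;
    w->next = NULL;

    pthread_mutex_lock(&wq->lock);
    if (wq->tail)
        wq->tail->next = w;
    else
        wq->head = w;
    wq->tail = w;
    wq->queued++;
    pthread_cond_signal(&wq->more);
    pthread_mutex_unlock(&wq->lock);
    return 0;
}

/* Block until every item queued before this call has finished running. */
void session_wq_flush(struct session_wq *wq)
{
    pthread_mutex_lock(&wq->lock);
    unsigned long target = wq->queued;
    while (wq->done < target)   /* FIFO + one worker: done counts in order */
        pthread_cond_wait(&wq->progress, &wq->lock);
    pthread_mutex_unlock(&wq->lock);
}

/* Drain remaining work, stop the worker and release resources. */
void session_wq_destroy(struct session_wq *wq)
{
    pthread_mutex_lock(&wq->lock);
    wq->stop = 1;
    pthread_cond_signal(&wq->more);
    pthread_mutex_unlock(&wq->lock);
    pthread_join(wq->thread, NULL);
    assert(wq->head == NULL && wq->done == wq->queued);
    pthread_cond_destroy(&wq->progress);
    pthread_cond_destroy(&wq->more);
    pthread_mutex_destroy(&wq->lock);
}
```

The flush only works because there is a single worker consuming the queue in order: once the completion counter reaches the number of items queued before the call, all of those items have finished. With a shared queue (the schedule_work() case Erez describes), the session cannot make that guarantee for its own work without waiting on everyone else's.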
From rdreier at cisco.com Wed Jun 13 09:24:15 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 13 Jun 2007 09:24:15 -0700 Subject: [ofa-general] [GIT PULL] please pull tvflash.git Message-ID: Vlad, please pull from git://staging.openfabrics.org/~rdreier/tvflash.git to get tvflash updates that will fix problems building on SLES 10 SP1 and Fedora 7 due to linking with libgz or libz (https://bugs.openfabrics.org/show_bug.cgi?id=558). Thanks, Roland From mst at dev.mellanox.co.il Wed Jun 13 09:38:21 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 13 Jun 2007 19:38:21 +0300 Subject: [ofa-general] [PATCH draft, untested] ehca srq emulation (for IPoIB CM) In-Reply-To: <466F36C8.5010507@linux.vnet.ibm.com> References: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com> <466F36C8.5010507@linux.vnet.ibm.com> Message-ID: <20070613163821.GB12277@mellanox.co.il> Here's how I would go about emulating SRQ in ehca in software. I knocked this out in several hours, so this is completely untested (not even compiled, which is why there are no Makefile bits), but it seemed the easiest way to get across what I consider the right way to do it. Note that this both has no overhead for HCAs with hardware SRQ support and is smaller than the nosrq patches. The idea here is that you can emulate enough of the SRQ interface in ehca to make IPoIB CM work without changes: keep QPs on a list, and distribute posted WRs between them evenly. This naturally does not solve the scalability problems that IPoIB CM without SRQ would have, but at least it contains them within ehca. Another advantage of this approach: noSRQ issues are separated out, so we'll be able to continue working on IPoIB CM without maintaining two code paths. There are obvious optimizations that can be done (e.g. each wr is copied twice on the data path, we only need a unidirectional list of cqes ...)
hopefully someone at IBM will look into this: I wanted to avoid touching low-level code I don't understand and can't test, as much as possible. Known bugs: Last wqe reached event is missing in this implementation: I've run out of time, and it's pretty trivial to add anyway, by adding a per-QP counter of outstanding WRs. We'll need a tasklet or a thread for the callback though: is there a tasklet/thread that can be reused for this? Caveats: As an optimization, I used a bit in qp_token to signal SRQ presence. No idea whether this works in practice in your hardware. If not, another way to detect SRQ WC will have to be found. Again, hopefully someone at IBM will look into this. Signed-off-by: Michael S. Tsirkin --- ehca_classes.h | 6 + ehca_irq.c | 2 ehca_iverbs.h | 6 + ehca_main.c | 3 ehca_qp.c | 14 ++- ehca_reqs.c | 3 ehca_srq.c | 237 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ehca_uverbs.c | 2 8 files changed, 269 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_classes.h b/drivers/infiniband/hw/ehca/ehca_classes.h index 1d286d3..e54bb82 100644 --- a/drivers/infiniband/hw/ehca/ehca_classes.h +++ b/drivers/infiniband/hw/ehca/ehca_classes.h @@ -281,6 +281,9 @@ extern spinlock_t hcall_lock; extern struct idr ehca_qp_idr; extern struct idr ehca_cq_idr; +#define EHCA_QP_TOKEN_SRQ (1 << 31) +#define EHCA_QP_TOKEN(token) (token & ~EHCA_QP_TOKEN_SRQ) + extern int ehca_static_rate; extern int ehca_port_act_time; extern int ehca_use_hp_mr; @@ -344,4 +347,7 @@ int ehca_cq_assign_qp(struct ehca_cq *cq, struct ehca_qp *qp); int ehca_cq_unassign_qp(struct ehca_cq *cq, unsigned int qp_num); struct ehca_qp* ehca_cq_get_qp(struct ehca_cq *cq, int qp_num); +int ehca_srq_handle_wc(struct ib_wc *wc, unsigned token); +int ehca_srq_attach(struct ib_srq *srq, struct ib_qp *qp); + #endif diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c index 100329b..f3b078c 100644 --- a/drivers/infiniband/hw/ehca/ehca_irq.c 
+++ b/drivers/infiniband/hw/ehca/ehca_irq.c @@ -182,7 +182,7 @@ static void qp_event_callback(struct ehca_shca *shca, u32 token = EHCA_BMASK_GET(EQE_QP_TOKEN, eqe); spin_lock_irqsave(&ehca_qp_idr_lock, flags); - qp = idr_find(&ehca_qp_idr, token); + qp = idr_find(&ehca_qp_idr, EHCA_QP_TOKEN(token)); spin_unlock_irqrestore(&ehca_qp_idr_lock, flags); diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h b/drivers/infiniband/hw/ehca/ehca_iverbs.h index 37e7fe0..0f530cc 100644 --- a/drivers/infiniband/hw/ehca/ehca_iverbs.h +++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h @@ -178,4 +178,10 @@ void ehca_free_fw_ctrlblock(void *ptr); #define ehca_free_fw_ctrlblock(ptr) free_page((unsigned long)(ptr)) #endif +struct ib_srq *ehca_create_srq(struct ib_pd *pd, + struct ib_srq_init_attr *srq_init_attr); +int ehca_destroy_srq(struct ib_srq *srq); +int ehca_post_srq_recv(struct ib_srq *ib_srq, struct ib_recv_wr *recv_wr, + struct ib_recv_wr **bad_recv_wr); + #endif diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index c3f99f3..bfab202 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -330,6 +330,9 @@ int ehca_init_device(struct ehca_shca *shca) /* shca->ib_device.modify_ah = ehca_modify_ah; */ shca->ib_device.query_ah = ehca_query_ah; shca->ib_device.destroy_ah = ehca_destroy_ah; + shca->ib_device.create_srq = ehca_create_srq; + shca->ib_device.destroy_srq = ehca_destroy_srq; + shca->ib_device.post_srq_recv = ehca_post_srq_recv; shca->ib_device.create_qp = ehca_create_qp; shca->ib_device.modify_qp = ehca_modify_qp; shca->ib_device.query_qp = ehca_query_qp; diff --git a/drivers/infiniband/hw/ehca/ehca_qp.c b/drivers/infiniband/hw/ehca/ehca_qp.c index b5bc787..9a14e90 100644 --- a/drivers/infiniband/hw/ehca/ehca_qp.c +++ b/drivers/infiniband/hw/ehca/ehca_qp.c @@ -486,6 +486,9 @@ struct ib_qp *ehca_create_qp(struct ib_pd *pd, goto create_qp_exit0; } + if (init_attr->srq) + my_qp->token |= 
EHCA_QP_TOKEN_SRQ; + parms.servicetype = ibqptype2servicetype(init_attr->qp_type); if (parms.servicetype < 0) { ret = -EINVAL; @@ -663,6 +666,13 @@ struct ib_qp *ehca_create_qp(struct ib_pd *pd, } } + if (my_qp->ib_qp.srq) { + ret = ehca_srq_attach(my_qp->ib_qp.srq, my_qp->ib_qp); + if (ret) + goto create_qp_exit3; + } + + return &my_qp->ib_qp; create_qp_exit3: @@ -674,7 +684,7 @@ create_qp_exit2: create_qp_exit1: spin_lock_irqsave(&ehca_qp_idr_lock, flags); - idr_remove(&ehca_qp_idr, my_qp->token); + idr_remove(&ehca_qp_idr, EHCA_QP_TOKEN(my_qp->token)); spin_unlock_irqrestore(&ehca_qp_idr_lock, flags); create_qp_exit0: @@ -1408,7 +1418,7 @@ int ehca_destroy_qp(struct ib_qp *ibqp) } spin_lock_irqsave(&ehca_qp_idr_lock, flags); - idr_remove(&ehca_qp_idr, my_qp->token); + idr_remove(&ehca_qp_idr, EHCA_QP_TOKEN(my_qp->token)); spin_unlock_irqrestore(&ehca_qp_idr_lock, flags); h_ret = hipz_h_destroy_qp(shca->ipz_hca_handle, my_qp); diff --git a/drivers/infiniband/hw/ehca/ehca_reqs.c b/drivers/infiniband/hw/ehca/ehca_reqs.c index caec9de..b151c67 100644 --- a/drivers/infiniband/hw/ehca/ehca_reqs.c +++ b/drivers/infiniband/hw/ehca/ehca_reqs.c @@ -601,6 +601,9 @@ poll_cq_one_exit0: if (cqe_count > 0) hipz_update_feca(my_cq, cqe_count); + if ((wc->opcode & IB_WC_RECV) && (cqe->qp_token & EHCA_QP_TOKEN_SRQ)) + ret = ehca_srq_handle_wc(wc, cqe->qp_token); + return ret; } diff --git a/drivers/infiniband/hw/ehca/ehca_srq.c b/drivers/infiniband/hw/ehca/ehca_srq.c new file mode 100644 index 0000000..1e1574a --- /dev/null +++ b/drivers/infiniband/hw/ehca/ehca_srq.c @@ -0,0 +1,237 @@ +/* + * SRQ emulation for ehca. + * + * Author: Michael S. Tsirkin + * + * Copyright (c) 2007 Mellanox Technologies. All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. 
+ * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. 
+ */ + + +#include +#include +#include +#include "ehca_classes.h" + +#define EHCA_QPS_PER_SRQ 16 + +struct ehca_srq_cqe { + struct list_head list; + struct ib_qp *qp; +}; + +struct ehca_srq { + struct ib_srq ib_srq; + struct ib_srq_attr attr; + struct spinlock lock; + + struct ib_recv_wr *wrs; + struct ehca_srq_cqe *cqes; + + struct ib_recv_wr *first_polled; /* Polled or unused */ + struct ib_recv_wr *first_posted; /* Posted on SRQ but not on QP */ + + struct list_head polled_cqes; /* Polled */ + struct list_head free_cqes; /* Posted or unused */ +}; + +static int ehca_srq_repost(struct ehca_srq *srq) +{ + struct ib_recv_wr wr, *wrp, *bad_recv_wr; + struct ehca_srq_cqe *c, n; + unsigned long flags; + int rc = 0; + + spin_lock_irqsave(&srq->lock, flags); + + list_for_each_entry_safe(c, n, &srq->polled_cqes, list) { + wrp = srq->first_posted; + if (!wrp) + break; + memcpy(&wr, wrp, sizeof wr); + wr.next = NULL; + wr.wr_id = (u64)wrp; + rc = ib_post_recv(c->qp, &wr, &bad_recv_wr); + if (rc) + break; + + srq->first_posted = wrp->next; + wrp->next = NULL; + list_del(&c->list); + } + + spin_unlock_irqrestore(&srq->lock, flags); + return rc; +} + +int ehca_srq_handle_wc(struct ib_wc *wc, unsigned token) +{ + struct ehca_qp *qp; + struct ehca_srq *srq; + struct ehca_srq_cqe *cqe; + struct ib_recv_wr *wr; + + spin_lock_irqsave(&ehca_qp_idr_lock, flags); + qp = idr_find(&ehca_qp_idr, EHCA_QP_TOKEN(token)); + spin_unlock_irqrestore(&ehca_qp_idr_lock, flags); + + if (!qp) + return -EINVAL; + + wc->qp = &qp->ib_qp; + srq = container_of(qp->ib_qp.srq, *srq, ib_srq); + spin_lock_irqsave(&srq->lock, flags); + BUG_ON(list_empty(&srq->free_cqes)); + cqe = container_of(srq->free_cqes.next, typeof *cqe, list); + cqe->qp = &qp->ib_qp; + list_move(&cqe->list, &srq->polled_cqes); + wr = (void *)wc->wr_id; + wc->wr_id = wr->wr_id; + wr->next = srq->first_polled; + srq->first_polled = wr; + spin_unlock_irqrestore(&srq->lock, flags); + return 0; +} + +int ehca_post_srq_recv(struct ib_srq 
*ib_srq, struct ib_recv_wr *recv_wr, + struct ib_recv_wr **bad_recv_wr); +{ + struct ib_recv_wr *wr, *copy; + struct ehca_srq *srq; + + srq = container_of(ib_srq, *srq, ib_srq); + for (wr = recv_wr; wr; wr = wr->next) { + copy = srq->first_polled; + if (!copy) { + *bad_recv_wr = wr; + return -ENOMEM; + } + srq->first_polled = copy->next; + + memcpy(copy, wr, sizeof *copy); + if (wr->num_sge) + memcpy(copy->sg_list, wr->sg_list, + wr->num_sge * sizeof *copy->sg_list); + + copy->next = srq->first_posted; + srq->first_posted = copy; + } + + ehca_srq_repost(srq); + return 0; +} + +int ehca_srq_attach(struct ib_srq *ib_srq, struct ib_qp *qp) +{ + int i; + struct ehca_srq_cqe *cqe; + struct ehca_srq *srq; + + srq = container_of(ib_srq, *srq, ib_srq); + + spin_lock_irq(&srq->lock); + for (i = 0; i < srq->attr.max_wrs / EHCA_QPS_PER_SRQ; ++i) { + if (list_empty(&srq->free_cqes)) + break; + cqe = list_entry(srq->free_cqes.next, typeof *cqe, list); + cqe->qp = qp; + list_move_tail(&cqe->list, &srq->polled_cqes); + } + spin_unlock_irq(&srq->lock); + if (!i) + return -ENOMEM; + + return ehca_srq_repost(srq); +} + +struct ib_srq *ehca_create_srq(struct ib_pd *pd, + struct ib_srq_init_attr *srq_init_attr) +{ + struct ehca_srq *srq; + int i = 0; + + srq = kmalloc(*srq, GFP_KERNEL); + if (!srq) + return ERR_PTR(-ENOMEM); + + memcpy(&srq->attr, srq_init_attr, sizeof srq->attr); + spin_lock_init(&srq->lock); + INIT_LIST_HEAD(&srq->polled_cqes); + INIT_LIST_HEAD(&srq->free_cqes); + srq->first_posted = NULL; + srq->first_polled = NULL; + + srq->wrs = kmalloc(sizeof *srq->wrs * srq->attr.max_wrs, GFP_KERNEL); + srq->cqes = kmalloc(sizeof *srq->cqes * srq->attr.max_wrs, GFP_KERNEL); + if (!srq->wrs || !srq->cqes) + goto err_arrays; + + for(i = 0; i < srq->attr.max_wrs; ++i) { + srq->wrs[i] = kmalloc(sizeof srq->wrs[i], GFP_KERNEL); + if (!srq->wrs[i]) + goto err_wr; + srq->wrs[i]->sg_list = kmalloc(sizeof srq->wrs[i]->sg_list * + srq->attr.max_sge, GFP_KERNEL); + if 
(!srq->wrs[i]->sg_list) { + kfree(srq->wrs[i]); + goto err_wr; + } + list_add(&srq->cqes[i].list, &srq->free_cqes); + srq->wrs[i]->next = srq->first_polled; + srq->first_polled = srq->wrs[i]; + } + + return &srq->ib_srq; + +err_wr: + while(--i >= 0) { + kfree(srq->wrs[i]->sg_list); + kfree(srq->wrs[i]); + } + +err_arrays: + kfree(srq->wrs); + kfree(srq->cqes); + return ERR_PTR(-ENOMEM); +} + +int ehca_destroy_srq(struct ib_srq *ib_srq) +{ + struct ehca_srq *srq; + int i; + + srq = container_of(ib_srq, *srq, ib_srq); + for (i = 0; i < srq->attr.max_wrs; ++i) { + kfree(srq->wrs[i]->sg_list); + kfree(srq->wrs[i]); + } + kfree(srq->wrs); + kfree(srq->cqes); +} diff --git a/drivers/infiniband/hw/ehca/ehca_uverbs.c b/drivers/infiniband/hw/ehca/ehca_uverbs.c index 73db920..a44354c 100644 --- a/drivers/infiniband/hw/ehca/ehca_uverbs.c +++ b/drivers/infiniband/hw/ehca/ehca_uverbs.c @@ -289,7 +289,7 @@ int ehca_mmap(struct ib_ucontext *context, struct vm_area_struct *vma) case 2: /* QP */ spin_lock_irqsave(&ehca_qp_idr_lock, flags); - qp = idr_find(&ehca_qp_idr, idr_handle); + qp = idr_find(&ehca_qp_idr, RHCA_QP_TOKEN(idr_handle)); spin_unlock_irqrestore(&ehca_qp_idr_lock, flags); /* make sure this mmap really belongs to the authorized user */ -- MST From rdreier at cisco.com Wed Jun 13 10:29:07 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 13 Jun 2007 10:29:07 -0700 Subject: [ofa-general] [PATCH/RFC] IB/mlx4: Handle new FW requirement for send request prefetching In-Reply-To: <200706051602.14182.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Tue, 5 Jun 2007 16:02:14 +0300") References: <200706051602.14182.jackm@dev.mellanox.co.il> Message-ID: I just queued this patch to handle new FW up. Please let me know if it looks OK to you, and I will ask Linus to pull it. Thanks. 
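The patch that follows replaces wq->max with two quantities: wqe_cnt, the number of entries allocated, and max_post, the number a consumer may post. The difference between them is the prefetch headroom of 2 KB plus one WQE. The code below is a minimal userspace sketch of that arithmetic and of the stamping pattern, mirroring set_kernel_sq_size() and stamp_send_wqe() from the patch. roundup_pow_of_two_u32(), sq_size() and stamp_wqe() are hypothetical names, not driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's roundup_pow_of_two(). */
static uint32_t roundup_pow_of_two_u32(uint32_t n)
{
    uint32_t p = 1;

    assert(n > 0 && n <= (1u << 31));
    while (p < n)
        p <<= 1;
    return p;
}

struct sq_layout {
    int spare_wqes;    /* headroom kept invalid for HW prefetch */
    uint32_t wqe_cnt;  /* total WQEs allocated (power of two) */
    uint32_t max_post; /* WQEs the consumer may actually post */
};

/* Mirrors the sizing logic of set_kernel_sq_size() in the patch. */
static struct sq_layout sq_size(uint32_t max_send_wr, int wqe_shift)
{
    struct sq_layout l;

    l.spare_wqes = (2048 >> wqe_shift) + 1;          /* 2 KB + 1 WQE */
    l.wqe_cnt = roundup_pow_of_two_u32(max_send_wr + l.spare_wqes);
    l.max_post = l.wqe_cnt - l.spare_wqes;           /* reported back as max_send_wr */
    return l;
}

/*
 * Mirrors stamp_send_wqe(): write 0xffffffff into the first 4 bytes of
 * every 64-byte chunk of the WQE except the first (16 u32s per chunk).
 */
static void stamp_wqe(uint32_t *wqe, int wqe_shift)
{
    for (int i = 16; i < 1 << (wqe_shift - 2); i += 16)
        wqe[i] = 0xffffffff;
}
```

For example, with 64-byte WQEs (wqe_shift = 6) the headroom is (2048 >> 6) + 1 = 33 entries, so a request for 128 send WRs allocates roundup_pow_of_two(161) = 256 entries and reports max_send_wr = 223 back to the caller.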
commit f22332295cb218ad12db2b521a34553ff5790c34 Author: Roland Dreier Date: Wed Jun 13 10:26:43 2007 -0700 IB/mlx4: Handle new FW requirement for send request prefetching New ConnectX firmware introduces FW command interface revision 2, which requires that for each QP, a chunk of send queue entries (the "headroom") is kept marked as invalid, so that the HCA doesn't get confused if it prefetches entries that haven't been posted yet. Add code to the driver to do this, and also update the user ABI so that userspace can request that the prefetcher be turned off for userspace QPs (we just leave the prefetcher on for all kernel QPs). Marking send queue entries this way is OK for older firmware too, so we change the driver to allow FW command interface revisions 1 and 2. Based on a patch from Jack Morgenstein . Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 048c527..e940521 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -355,7 +355,7 @@ static int mlx4_ib_poll_one(struct mlx4_ib_cq *cq, wq = &(*cur_qp)->sq; wqe_ctr = be16_to_cpu(cqe->wqe_index); wq->tail += (u16) (wqe_ctr - (u16) wq->tail); - wc->wr_id = wq->wrid[wq->tail & (wq->max - 1)]; + wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)]; ++wq->tail; } else if ((*cur_qp)->ibqp.srq) { srq = to_msrq((*cur_qp)->ibqp.srq); @@ -364,7 +364,7 @@ static int mlx4_ib_poll_one(struct mlx4_ib_cq *cq, mlx4_ib_free_srq_wqe(srq, wqe_ctr); } else { wq = &(*cur_qp)->rq; - wc->wr_id = wq->wrid[wq->tail & (wq->max - 1)]; + wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)]; ++wq->tail; } diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 93dac71..24ccadd 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -95,7 +95,8 @@ struct mlx4_ib_mr { struct mlx4_ib_wq { u64 *wrid; spinlock_t lock; - int max; + int wqe_cnt; + int max_post; int max_gs; int 
offset; int wqe_shift; @@ -113,6 +114,7 @@ struct mlx4_ib_qp { u32 doorbell_qpn; __be32 sq_signal_bits; + int sq_spare_wqes; struct mlx4_ib_wq sq; struct ib_umem *umem; @@ -123,6 +125,7 @@ struct mlx4_ib_qp { u8 alt_port; u8 atomic_rd_en; u8 resp_depth; + u8 sq_no_prefetch; u8 state; }; diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index 4c15fa3..8fabe0d 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -109,6 +109,20 @@ static void *get_send_wqe(struct mlx4_ib_qp *qp, int n) return get_wqe(qp, qp->sq.offset + (n << qp->sq.wqe_shift)); } +/* + * Stamp a SQ WQE so that it is invalid if prefetched by marking the + * first four bytes of every 64 byte chunk with 0xffffffff, except for + * the very first chunk of the WQE. + */ +static void stamp_send_wqe(struct mlx4_ib_qp *qp, int n) +{ + u32 *wqe = get_send_wqe(qp, n); + int i; + + for (i = 16; i < 1 << (qp->sq.wqe_shift - 2); i += 16) + wqe[i] = 0xffffffff; +} + static void mlx4_ib_qp_event(struct mlx4_qp *qp, enum mlx4_event type) { struct ib_event event; @@ -201,18 +215,18 @@ static int set_rq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, if (cap->max_recv_wr) return -EINVAL; - qp->rq.max = qp->rq.max_gs = 0; + qp->rq.wqe_cnt = qp->rq.max_gs = 0; } else { /* HW requires >= 1 RQ entry with >= 1 gather entry */ if (is_user && (!cap->max_recv_wr || !cap->max_recv_sge)) return -EINVAL; - qp->rq.max = roundup_pow_of_two(max(1U, cap->max_recv_wr)); + qp->rq.wqe_cnt = roundup_pow_of_two(max(1U, cap->max_recv_wr)); qp->rq.max_gs = roundup_pow_of_two(max(1U, cap->max_recv_sge)); qp->rq.wqe_shift = ilog2(qp->rq.max_gs * sizeof (struct mlx4_wqe_data_seg)); } - cap->max_recv_wr = qp->rq.max; + cap->max_recv_wr = qp->rq.max_post = qp->rq.wqe_cnt; cap->max_recv_sge = qp->rq.max_gs; return 0; @@ -236,8 +250,6 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, cap->max_send_sge + 2 > dev->dev->caps.max_sq_sg) return -EINVAL; - 
qp->sq.max = cap->max_send_wr ? roundup_pow_of_two(cap->max_send_wr) : 1; - qp->sq.wqe_shift = ilog2(roundup_pow_of_two(max(cap->max_send_sge * sizeof (struct mlx4_wqe_data_seg), cap->max_inline_data + @@ -246,18 +258,25 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, qp->sq.max_gs = ((1 << qp->sq.wqe_shift) - send_wqe_overhead(type)) / sizeof (struct mlx4_wqe_data_seg); - qp->buf_size = (qp->rq.max << qp->rq.wqe_shift) + - (qp->sq.max << qp->sq.wqe_shift); + /* + * We need to leave 2 KB + 1 WQE of headroom in the SQ to + * allow HW to prefetch. + */ + qp->sq_spare_wqes = (2048 >> qp->sq.wqe_shift) + 1; + qp->sq.wqe_cnt = roundup_pow_of_two(cap->max_send_wr + qp->sq_spare_wqes); + + qp->buf_size = (qp->rq.wqe_cnt << qp->rq.wqe_shift) + + (qp->sq.wqe_cnt << qp->sq.wqe_shift); if (qp->rq.wqe_shift > qp->sq.wqe_shift) { qp->rq.offset = 0; - qp->sq.offset = qp->rq.max << qp->rq.wqe_shift; + qp->sq.offset = qp->rq.wqe_cnt << qp->rq.wqe_shift; } else { - qp->rq.offset = qp->sq.max << qp->sq.wqe_shift; + qp->rq.offset = qp->sq.wqe_cnt << qp->sq.wqe_shift; qp->sq.offset = 0; } - cap->max_send_wr = qp->sq.max; - cap->max_send_sge = qp->sq.max_gs; + cap->max_send_wr = qp->sq.max_post = qp->sq.wqe_cnt - qp->sq_spare_wqes; + cap->max_send_sge = qp->sq.max_gs; cap->max_inline_data = (1 << qp->sq.wqe_shift) - send_wqe_overhead(type) - sizeof (struct mlx4_wqe_inline_seg); @@ -267,11 +286,11 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, static int set_user_sq_size(struct mlx4_ib_qp *qp, struct mlx4_ib_create_qp *ucmd) { - qp->sq.max = 1 << ucmd->log_sq_bb_count; + qp->sq.wqe_cnt = 1 << ucmd->log_sq_bb_count; qp->sq.wqe_shift = ucmd->log_sq_stride; - qp->buf_size = (qp->rq.max << qp->rq.wqe_shift) + - (qp->sq.max << qp->sq.wqe_shift); + qp->buf_size = (qp->rq.wqe_cnt << qp->rq.wqe_shift) + + (qp->sq.wqe_cnt << qp->sq.wqe_shift); return 0; } @@ -307,6 +326,8 @@ static int create_qp_common(struct mlx4_ib_dev *dev, 
struct ib_pd *pd, goto err; } + qp->sq_no_prefetch = ucmd.sq_no_prefetch; + err = set_user_sq_size(qp, &ucmd); if (err) goto err; @@ -334,6 +355,8 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, goto err_mtt; } } else { + qp->sq_no_prefetch = 0; + err = set_kernel_sq_size(dev, &init_attr->cap, init_attr->qp_type, qp); if (err) goto err; @@ -360,8 +383,8 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, if (err) goto err_mtt; - qp->sq.wrid = kmalloc(qp->sq.max * sizeof (u64), GFP_KERNEL); - qp->rq.wrid = kmalloc(qp->rq.max * sizeof (u64), GFP_KERNEL); + qp->sq.wrid = kmalloc(qp->sq.wqe_cnt * sizeof (u64), GFP_KERNEL); + qp->rq.wrid = kmalloc(qp->rq.wqe_cnt * sizeof (u64), GFP_KERNEL); if (!qp->sq.wrid || !qp->rq.wrid) { err = -ENOMEM; @@ -743,14 +766,17 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, context->mtu_msgmax = (attr->path_mtu << 5) | 31; } - if (qp->rq.max) - context->rq_size_stride = ilog2(qp->rq.max) << 3; + if (qp->rq.wqe_cnt) + context->rq_size_stride = ilog2(qp->rq.wqe_cnt) << 3; context->rq_size_stride |= qp->rq.wqe_shift - 4; - if (qp->sq.max) - context->sq_size_stride = ilog2(qp->sq.max) << 3; + if (qp->sq.wqe_cnt) + context->sq_size_stride = ilog2(qp->sq.wqe_cnt) << 3; context->sq_size_stride |= qp->sq.wqe_shift - 4; + if (cur_state == IB_QPS_RESET && new_state == IB_QPS_INIT) + context->sq_size_stride |= !!qp->sq_no_prefetch << 7; + if (qp->ibqp.uobject) context->usr_page = cpu_to_be32(to_mucontext(ibqp->uobject->context)->uar.index); else @@ -884,16 +910,19 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, /* * Before passing a kernel QP to the HW, make sure that the - * ownership bits of the send queue are set so that the - * hardware doesn't start processing stale work requests. + * ownership bits of the send queue are set and the SQ + * headroom is stamped so that the hardware doesn't start + * processing stale work requests. 
*/ if (!ibqp->uobject && cur_state == IB_QPS_RESET && new_state == IB_QPS_INIT) { struct mlx4_wqe_ctrl_seg *ctrl; int i; - for (i = 0; i < qp->sq.max; ++i) { + for (i = 0; i < qp->sq.wqe_cnt; ++i) { ctrl = get_send_wqe(qp, i); ctrl->owner_opcode = cpu_to_be32(1 << 31); + + stamp_send_wqe(qp, i); } } @@ -1124,7 +1153,7 @@ static int mlx4_wq_overflow(struct mlx4_ib_wq *wq, int nreq, struct ib_cq *ib_cq struct mlx4_ib_cq *cq; cur = wq->head - wq->tail; - if (likely(cur + nreq < wq->max)) + if (likely(cur + nreq < wq->max_post)) return 0; cq = to_mcq(ib_cq); @@ -1132,7 +1161,7 @@ static int mlx4_wq_overflow(struct mlx4_ib_wq *wq, int nreq, struct ib_cq *ib_cq cur = wq->head - wq->tail; spin_unlock(&cq->lock); - return cur + nreq >= wq->max; + return cur + nreq >= wq->max_post; } int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, @@ -1165,8 +1194,8 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, goto out; } - ctrl = wqe = get_send_wqe(qp, ind & (qp->sq.max - 1)); - qp->sq.wrid[ind & (qp->sq.max - 1)] = wr->wr_id; + ctrl = wqe = get_send_wqe(qp, ind & (qp->sq.wqe_cnt - 1)); + qp->sq.wrid[ind & (qp->sq.wqe_cnt - 1)] = wr->wr_id; ctrl->srcrb_flags = (wr->send_flags & IB_SEND_SIGNALED ? @@ -1301,7 +1330,16 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, } ctrl->owner_opcode = mlx4_ib_opcode[wr->opcode] | - (ind & qp->sq.max ? cpu_to_be32(1 << 31) : 0); + (ind & qp->sq.wqe_cnt ? cpu_to_be32(1 << 31) : 0); + + /* + * We can improve latency by not stamping the last + * send queue WQE until after ringing the doorbell, so + * only stamp here if there are still more WQEs to post. + */ + if (wr->next) + stamp_send_wqe(qp, (ind + qp->sq_spare_wqes) & + (qp->sq.wqe_cnt - 1)); ++ind; } @@ -1324,6 +1362,9 @@ out: * and reach the HCA out of order. 
*/ mmiowb(); + + stamp_send_wqe(qp, (ind + qp->sq_spare_wqes - 1) & + (qp->sq.wqe_cnt - 1)); } spin_unlock_irqrestore(&qp->rq.lock, flags); @@ -1344,7 +1385,7 @@ int mlx4_ib_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr, spin_lock_irqsave(&qp->rq.lock, flags); - ind = qp->rq.head & (qp->rq.max - 1); + ind = qp->rq.head & (qp->rq.wqe_cnt - 1); for (nreq = 0; wr; ++nreq, wr = wr->next) { if (mlx4_wq_overflow(&qp->rq, nreq, qp->ibqp.send_cq)) { @@ -1375,7 +1416,7 @@ int mlx4_ib_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr, qp->rq.wrid[ind] = wr->wr_id; - ind = (ind + 1) & (qp->rq.max - 1); + ind = (ind + 1) & (qp->rq.wqe_cnt - 1); } out: diff --git a/drivers/infiniband/hw/mlx4/user.h b/drivers/infiniband/hw/mlx4/user.h index 88c72d5..e2d11be 100644 --- a/drivers/infiniband/hw/mlx4/user.h +++ b/drivers/infiniband/hw/mlx4/user.h @@ -39,7 +39,7 @@ * Increment this value if any changes that break userspace ABI * compatibility are made. */ -#define MLX4_IB_UVERBS_ABI_VERSION 2 +#define MLX4_IB_UVERBS_ABI_VERSION 3 /* * Make sure that all structs defined in this file remain laid out so @@ -87,9 +87,10 @@ struct mlx4_ib_create_srq_resp { struct mlx4_ib_create_qp { __u64 buf_addr; __u64 db_addr; - __u8 log_sq_bb_count; - __u8 log_sq_stride; - __u8 reserved[6]; + __u8 log_sq_bb_count; + __u8 log_sq_stride; + __u8 sq_no_prefetch; + __u8 reserved[5]; }; #endif /* MLX4_IB_USER_H */ diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c index e7ca118..1a7e52d 100644 --- a/drivers/net/mlx4/fw.c +++ b/drivers/net/mlx4/fw.c @@ -38,7 +38,8 @@ #include "icm.h" enum { - MLX4_COMMAND_INTERFACE_REV = 1 + MLX4_COMMAND_INTERFACE_MIN_REV = 1, + MLX4_COMMAND_INTERFACE_MAX_REV = 2, }; extern void __buggy_use_of_MLX4_GET(void); @@ -491,7 +492,8 @@ int mlx4_QUERY_FW(struct mlx4_dev *dev) ((fw_ver & 0x0000ffffull) << 16); MLX4_GET(cmd_if_rev, outbox, QUERY_FW_CMD_IF_REV_OFFSET); - if (cmd_if_rev != MLX4_COMMAND_INTERFACE_REV) { + if (cmd_if_rev < MLX4_COMMAND_INTERFACE_MIN_REV || 
+ cmd_if_rev > MLX4_COMMAND_INTERFACE_MAX_REV) { mlx4_err(dev, "Installed FW has unsupported " "command interface revision %d.\n", cmd_if_rev); @@ -499,8 +501,8 @@ int mlx4_QUERY_FW(struct mlx4_dev *dev) (int) (dev->caps.fw_ver >> 32), (int) (dev->caps.fw_ver >> 16) & 0xffff, (int) dev->caps.fw_ver & 0xffff); - mlx4_err(dev, "This driver version supports only revision %d.\n", - MLX4_COMMAND_INTERFACE_REV); + mlx4_err(dev, "This driver version supports only revisions %d to %d.\n", + MLX4_COMMAND_INTERFACE_MIN_REV, MLX4_COMMAND_INTERFACE_MAX_REV); err = -ENODEV; goto out; } From rdreier at cisco.com Wed Jun 13 10:34:39 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 13 Jun 2007 10:34:39 -0700 Subject: [ofa-general] [PATCH/RFC] libmlx4: Handle new FW requirement for send request prefetching In-Reply-To: (Roland Dreier's message of "Wed, 13 Jun 2007 10:29:07 -0700") References: <200706051602.14182.jackm@dev.mellanox.co.il> Message-ID: Similarly I just added this to libmlx4. The change to handle alignment for inline send segments will be a separate patch, and I'm still cleaning it up. Anyway, let me know if you see any problems with this. BTW, with FW 2.0.158, I am seeing the HCA FW crash after running ibv_srq_pingpong with default parameters. Not sure if this is a driver bug (I am using my latest kernel driver and libmlx4) or a firmware problem. commit 561da8d10e419ffb333fe6faf05004d9a3670e7a Author: Roland Dreier Date: Wed Jun 13 10:31:16 2007 -0700 Handle new FW requirement for send request prefetching New ConnectX firmware introduces FW command interface revision 2, which requires that for each QP, a chunk of send queue entries (the "headroom") is kept marked as invalid, so that the HCA doesn't get confused if it prefetches entries that haven't been posted yet. Add code to libmlx4 to do this. Also, handle the new kernel ABI that adds the sq_no_prefetch parameter to the create QP operation. 
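The headroom requirement in this patch pair has two parts. First, the SQ reserves (2048 >> wqe_shift) + 1 spare WQEs and rounds the total up to a power of two. Second, each WQE is stamped by writing 0xffffffff into the first 32-bit word of every 64-byte chunk except the first. Below is a minimal user-space sketch in C of both rules. It is a simplified illustration, not the mlx4/libmlx4 code: the names sq_sizes, calc_sq_sizes, round_up_pow2, stamp_wqe and next_stamp_index are hypothetical, and it assumes a flat SQ buffer in host memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the SQ sizing introduced by the patch. */
struct sq_sizes {
	int spare;     /* headroom WQEs kept invalid for HW prefetch */
	int wqe_cnt;   /* total WQEs allocated (power of two) */
	int max_post;  /* WQEs the consumer may actually post */
};

/* Round v up to the next power of two (v >= 1). */
static uint32_t round_up_pow2(uint32_t v)
{
	uint32_t r = 1;

	while (r < v)
		r <<= 1;
	return r;
}

/* "2 KB + 1 WQE" of headroom, following the comment in the patch. */
static struct sq_sizes calc_sq_sizes(int max_send_wr, int wqe_shift)
{
	struct sq_sizes s;

	s.spare    = (2048 >> wqe_shift) + 1;
	s.wqe_cnt  = (int) round_up_pow2((uint32_t) (max_send_wr + s.spare));
	s.max_post = s.wqe_cnt - s.spare;
	return s;
}

/*
 * Mark WQE n invalid for prefetch: write 0xffffffff into the first
 * 32-bit word of every 64-byte chunk except chunk 0.  A WQE is
 * (1 << wqe_shift) bytes, i.e. 1 << (wqe_shift - 2) 32-bit words.
 */
static void stamp_wqe(uint32_t *sq_buf, int wqe_shift, int n)
{
	uint32_t *wqe = sq_buf + ((size_t) n << (wqe_shift - 2));
	int i;

	for (i = 16; i < 1 << (wqe_shift - 2); i += 16)
		wqe[i] = 0xffffffff;
}

/* WQE to stamp after posting at ind, keeping `spare` entries ahead invalid. */
static int next_stamp_index(int ind, int spare, int wqe_cnt)
{
	return (ind + spare) & (wqe_cnt - 1);
}
```

Take 64-byte WQEs (wqe_shift = 6) and a request for 64 send WRs. This gives 33 spare entries, a 128-entry SQ and 95 postable WRs, so the max_send_wr reported back to the caller (sq.max_post) can exceed what was requested.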
We just hard-code sq_no_prefetch to 0 and always provide the full SQ headroom for now. Based on a patch from Jack Morgenstein . Signed-off-by: Roland Dreier diff --git a/src/cq.c b/src/cq.c index a1831ff..f3e3e3c 100644 --- a/src/cq.c +++ b/src/cq.c @@ -239,7 +239,7 @@ static int mlx4_poll_one(struct mlx4_cq *cq, wq = &(*cur_qp)->sq; wqe_index = ntohs(cqe->wqe_index); wq->tail += (uint16_t) (wqe_index - (uint16_t) wq->tail); - wc->wr_id = wq->wrid[wq->tail & (wq->max - 1)]; + wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)]; ++wq->tail; } else if ((*cur_qp)->ibv_qp.srq) { srq = to_msrq((*cur_qp)->ibv_qp.srq); @@ -248,7 +248,7 @@ static int mlx4_poll_one(struct mlx4_cq *cq, mlx4_free_srq_wqe(srq, wqe_index); } else { wq = &(*cur_qp)->rq; - wc->wr_id = wq->wrid[wq->tail & (wq->max - 1)]; + wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)]; ++wq->tail; } diff --git a/src/mlx4-abi.h b/src/mlx4-abi.h index 97f5dcd..20a40c9 100644 --- a/src/mlx4-abi.h +++ b/src/mlx4-abi.h @@ -36,7 +36,7 @@ #include #define MLX4_UVERBS_MIN_ABI_VERSION 2 -#define MLX4_UVERBS_MAX_ABI_VERSION 2 +#define MLX4_UVERBS_MAX_ABI_VERSION 3 struct mlx4_alloc_ucontext_resp { struct ibv_get_context_resp ibv_resp; @@ -86,7 +86,8 @@ struct mlx4_create_qp { __u64 db_addr; __u8 log_sq_bb_count; __u8 log_sq_stride; - __u8 reserved[6]; + __u8 sq_no_prefetch; /* was reserved in ABI 2 */ + __u8 reserved[5]; }; #endif /* MLX4_ABI_H */ diff --git a/src/mlx4.h b/src/mlx4.h index e29f456..3710a17 100644 --- a/src/mlx4.h +++ b/src/mlx4.h @@ -200,7 +200,8 @@ struct mlx4_srq { struct mlx4_wq { uint64_t *wrid; pthread_spinlock_t lock; - int max; + int wqe_cnt; + int max_post; unsigned head; unsigned tail; int max_gs; @@ -216,6 +217,7 @@ struct mlx4_qp { uint32_t doorbell_qpn; uint32_t sq_signal_bits; + int sq_spare_wqes; struct mlx4_wq sq; uint32_t *db; @@ -342,6 +344,8 @@ int mlx4_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr, struct ibv_send_wr **bad_wr); int mlx4_post_recv(struct ibv_qp *ibqp, struct 
ibv_recv_wr *wr, struct ibv_recv_wr **bad_wr); +void mlx4_calc_sq_wqe_size(struct ibv_qp_cap *cap, enum ibv_qp_type type, + struct mlx4_qp *qp); int mlx4_alloc_qp_buf(struct ibv_pd *pd, struct ibv_qp_cap *cap, enum ibv_qp_type type, struct mlx4_qp *qp); void mlx4_set_sq_sizes(struct mlx4_qp *qp, struct ibv_qp_cap *cap, diff --git a/src/qp.c b/src/qp.c index 7df3311..301f7cb 100644 --- a/src/qp.c +++ b/src/qp.c @@ -65,6 +65,20 @@ static void *get_send_wqe(struct mlx4_qp *qp, int n) return qp->buf.buf + qp->sq.offset + (n << qp->sq.wqe_shift); } +/* + * Stamp a SQ WQE so that it is invalid if prefetched by marking the + * first four bytes of every 64 byte chunk with 0xffffffff, except for + * the very first chunk of the WQE. + */ +static void stamp_send_wqe(struct mlx4_qp *qp, int n) +{ + uint32_t *wqe = get_send_wqe(qp, n); + int i; + + for (i = 16; i < 1 << (qp->sq.wqe_shift - 2); i += 16) + wqe[i] = 0xffffffff; +} + void mlx4_init_qp_indices(struct mlx4_qp *qp) { qp->sq.head = 0; @@ -78,9 +92,11 @@ void mlx4_qp_init_sq_ownership(struct mlx4_qp *qp) struct mlx4_wqe_ctrl_seg *ctrl; int i; - for (i = 0; i < qp->sq.max; ++i) { + for (i = 0; i < qp->sq.wqe_cnt; ++i) { ctrl = get_send_wqe(qp, i); ctrl->owner_opcode = htonl(1 << 31); + + stamp_send_wqe(qp, i); } } @@ -89,14 +105,14 @@ static int wq_overflow(struct mlx4_wq *wq, int nreq, struct mlx4_cq *cq) unsigned cur; cur = wq->head - wq->tail; - if (cur + nreq < wq->max) + if (cur + nreq < wq->max_post) return 0; pthread_spin_lock(&cq->lock); cur = wq->head - wq->tail; pthread_spin_unlock(&cq->lock); - return cur + nreq >= wq->max; + return cur + nreq >= wq->max_post; } int mlx4_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr, @@ -138,8 +154,8 @@ int mlx4_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr, goto out; } - ctrl = wqe = get_send_wqe(qp, ind & (qp->sq.max - 1)); - qp->sq.wrid[ind & (qp->sq.max - 1)] = wr->wr_id; + ctrl = wqe = get_send_wqe(qp, ind & (qp->sq.wqe_cnt - 1)); + qp->sq.wrid[ind & 
(qp->sq.wqe_cnt - 1)] = wr->wr_id; ctrl->srcrb_flags = (wr->send_flags & IBV_SEND_SIGNALED ? @@ -274,7 +290,16 @@ int mlx4_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr, wmb(); ctrl->owner_opcode = htonl(mlx4_ib_opcode[wr->opcode]) | - (ind & qp->sq.max ? htonl(1 << 31) : 0); + (ind & qp->sq.wqe_cnt ? htonl(1 << 31) : 0); + + /* + * We can improve latency by not stamping the last + * send queue WQE until after ringing the doorbell, so + * only stamp here if there are still more WQEs to post. + */ + if (wr->next) + stamp_send_wqe(qp, (ind + qp->sq_spare_wqes) & + (qp->sq.wqe_cnt - 1)); ++ind; } @@ -313,6 +338,10 @@ out: *(uint32_t *) (ctx->uar + MLX4_SEND_DOORBELL) = qp->doorbell_qpn; } + if (nreq) + stamp_send_wqe(qp, (ind + qp->sq_spare_wqes - 1) & + (qp->sq.wqe_cnt - 1)); + pthread_spin_unlock(&qp->sq.lock); return ret; @@ -332,7 +361,7 @@ int mlx4_post_recv(struct ibv_qp *ibqp, struct ibv_recv_wr *wr, /* XXX check that state is OK to post receive */ - ind = qp->rq.head & (qp->rq.max - 1); + ind = qp->rq.head & (qp->rq.wqe_cnt - 1); for (nreq = 0; wr; ++nreq, wr = wr->next) { if (wq_overflow(&qp->rq, nreq, to_mcq(qp->ibv_qp.recv_cq))) { @@ -363,7 +392,7 @@ int mlx4_post_recv(struct ibv_qp *ibqp, struct ibv_recv_wr *wr, qp->rq.wrid[ind] = wr->wr_id; - ind = (ind + 1) & (qp->rq.max - 1); + ind = (ind + 1) & (qp->rq.wqe_cnt - 1); } out: @@ -384,36 +413,17 @@ out: return ret; } -int mlx4_alloc_qp_buf(struct ibv_pd *pd, struct ibv_qp_cap *cap, - enum ibv_qp_type type, struct mlx4_qp *qp) +void mlx4_calc_sq_wqe_size(struct ibv_qp_cap *cap, enum ibv_qp_type type, + struct mlx4_qp *qp) { int size; int max_sq_sge; - qp->rq.max_gs = cap->max_recv_sge; max_sq_sge = align(cap->max_inline_data + sizeof (struct mlx4_wqe_inline_seg), sizeof (struct mlx4_wqe_data_seg)) / sizeof (struct mlx4_wqe_data_seg); if (max_sq_sge < cap->max_send_sge) max_sq_sge = cap->max_send_sge; - qp->sq.wrid = malloc(qp->sq.max * sizeof (uint64_t)); - if (!qp->sq.wrid) - return -1; - - if 
(qp->rq.max) { - qp->rq.wrid = malloc(qp->rq.max * sizeof (uint64_t)); - if (!qp->rq.wrid) { - free(qp->sq.wrid); - return -1; - } - } - - size = qp->rq.max_gs * sizeof (struct mlx4_wqe_data_seg); - - for (qp->rq.wqe_shift = 4; 1 << qp->rq.wqe_shift < size; - qp->rq.wqe_shift++) - ; /* nothing */ - size = max_sq_sge * sizeof (struct mlx4_wqe_data_seg); switch (type) { case IBV_QPT_UD: @@ -451,14 +461,37 @@ int mlx4_alloc_qp_buf(struct ibv_pd *pd, struct ibv_qp_cap *cap, for (qp->sq.wqe_shift = 6; 1 << qp->sq.wqe_shift < size; qp->sq.wqe_shift++) ; /* nothing */ +} + +int mlx4_alloc_qp_buf(struct ibv_pd *pd, struct ibv_qp_cap *cap, + enum ibv_qp_type type, struct mlx4_qp *qp) +{ + qp->rq.max_gs = cap->max_recv_sge; + + qp->sq.wrid = malloc(qp->sq.wqe_cnt * sizeof (uint64_t)); + if (!qp->sq.wrid) + return -1; + + if (qp->rq.wqe_cnt) { + qp->rq.wrid = malloc(qp->rq.wqe_cnt * sizeof (uint64_t)); + if (!qp->rq.wrid) { + free(qp->sq.wrid); + return -1; + } + } + + for (qp->rq.wqe_shift = 4; + 1 << qp->rq.wqe_shift < qp->rq.max_gs * sizeof (struct mlx4_wqe_data_seg); + qp->rq.wqe_shift++) + ; /* nothing */ - qp->buf_size = (qp->rq.max << qp->rq.wqe_shift) + - (qp->sq.max << qp->sq.wqe_shift); + qp->buf_size = (qp->rq.wqe_cnt << qp->rq.wqe_shift) + + (qp->sq.wqe_cnt << qp->sq.wqe_shift); if (qp->rq.wqe_shift > qp->sq.wqe_shift) { qp->rq.offset = 0; - qp->sq.offset = qp->rq.max << qp->rq.wqe_shift; + qp->sq.offset = qp->rq.wqe_cnt << qp->rq.wqe_shift; } else { - qp->rq.offset = qp->sq.max << qp->sq.wqe_shift; + qp->rq.offset = qp->sq.wqe_cnt << qp->sq.wqe_shift; qp->sq.offset = 0; } @@ -499,6 +532,8 @@ void mlx4_set_sq_sizes(struct mlx4_qp *qp, struct ibv_qp_cap *cap, cap->max_send_sge = qp->sq.max_gs; qp->max_inline_data = wqe_size - sizeof (struct mlx4_wqe_inline_seg); cap->max_inline_data = qp->max_inline_data; + qp->sq.max_post = qp->sq.wqe_cnt - qp->sq_spare_wqes; + cap->max_send_wr = qp->sq.max_post; } struct mlx4_qp *mlx4_find_qp(struct mlx4_context *ctx, uint32_t 
qpn) diff --git a/src/verbs.c b/src/verbs.c index 52ca0c8..2243b6c 100644 --- a/src/verbs.c +++ b/src/verbs.c @@ -355,11 +355,18 @@ struct ibv_qp *mlx4_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *attr) if (!qp) return NULL; - qp->sq.max = align_queue_size(attr->cap.max_send_wr); - qp->rq.max = align_queue_size(attr->cap.max_recv_wr); + mlx4_calc_sq_wqe_size(&attr->cap, attr->qp_type, qp); + + /* + * We need to leave 2 KB + 1 WQE of headroom in the SQ to + * allow HW to prefetch. + */ + qp->sq_spare_wqes = (2048 >> qp->sq.wqe_shift) + 1; + qp->sq.wqe_cnt = align_queue_size(attr->cap.max_send_wr + qp->sq_spare_wqes); + qp->rq.wqe_cnt = align_queue_size(attr->cap.max_recv_wr); if (attr->srq) - attr->cap.max_recv_wr = qp->rq.max = 0; + attr->cap.max_recv_wr = qp->rq.wqe_cnt = 0; else if (attr->cap.max_recv_sge < 1) attr->cap.max_recv_sge = 1; @@ -387,9 +394,10 @@ struct ibv_qp *mlx4_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *attr) cmd.db_addr = (uintptr_t) qp->db; cmd.log_sq_stride = qp->sq.wqe_shift; for (cmd.log_sq_bb_count = 0; - qp->sq.max > 1 << cmd.log_sq_bb_count; + qp->sq.wqe_cnt > 1 << cmd.log_sq_bb_count; ++cmd.log_sq_bb_count) ; /* nothing */ + cmd.sq_no_prefetch = 0; /* OK for ABI 2: just a reserved field */ memset(cmd.reserved, 0, sizeof cmd.reserved); ret = ibv_cmd_create_qp(pd, &qp->ibv_qp, attr, &cmd.ibv_cmd, sizeof cmd, @@ -401,8 +409,8 @@ struct ibv_qp *mlx4_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *attr) if (ret) goto err_destroy; - qp->rq.max = attr->cap.max_recv_wr; - qp->rq.max_gs = attr->cap.max_recv_sge; + qp->rq.wqe_cnt = qp->rq.max_post = attr->cap.max_recv_wr; + qp->rq.max_gs = attr->cap.max_recv_sge; mlx4_set_sq_sizes(qp, &attr->cap, attr->qp_type); qp->doorbell_qpn = htonl(qp->ibv_qp.qp_num << 8); @@ -422,7 +430,7 @@ err_rq_db: err_free: free(qp->sq.wrid); - if (qp->rq.max) + if (qp->rq.wqe_cnt) free(qp->rq.wrid); mlx4_free_buf(&qp->buf); @@ -527,7 +535,7 @@ int mlx4_destroy_qp(struct ibv_qp *ibqp) if 
(!ibqp->srq) mlx4_free_db(to_mctx(ibqp->context), MLX4_DB_TYPE_RQ, qp->db); free(qp->sq.wrid); - if (qp->rq.max) + if (qp->rq.wqe_cnt) free(qp->rq.wrid); mlx4_free_buf(&qp->buf); free(qp); From rdreier at cisco.com Wed Jun 13 10:37:27 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 13 Jun 2007 10:37:27 -0700 Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) In-Reply-To: <20070613163821.GB12277@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 13 Jun 2007 19:38:21 +0300") References: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com> <466F36C8.5010507@linux.vnet.ibm.com> <20070613163821.GB12277@mellanox.co.il> Message-ID: Not sure I follow how this code works. What happens if I attach 100 QPs to an SRQ and then post only 50 receives? - R. From mst at dev.mellanox.co.il Wed Jun 13 10:49:30 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 13 Jun 2007 20:49:30 +0300 Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) In-Reply-To: References: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com> <466F36C8.5010507@linux.vnet.ibm.com> <20070613163821.GB12277@mellanox.co.il> Message-ID: <20070613174930.GE12277@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) > > Not sure I follow how this code works. What happens if I attach 100 > QPs to an SRQ and then post only 50 receives? Thanks for asking. Note that this is not a full emulation, just close enough to make IPoIB CM work. The assumption I made is that you will post max_wrs receives (this is what IPoIB does). If this is what you do, each QP will get WR_PER_QP receives (it's a macro now; it can be made a module option or exposed in srq_attr). And if you try to attach more than max_wrs/WR_PER_QP QPs, create QP will fail.
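The admission rule described above fits in a few lines of C. This is a minimal sketch under the stated assumption that the consumer posts all max_wr receives. WR_PER_QP = 16 and the function names are hypothetical, because the draft ehca patch itself is not shown in this thread.

```c
/* Hypothetical per-QP share; the draft patch defines a WR_PER_QP macro,
 * but its value is not given in this thread. */
#define WR_PER_QP 16

/* Maximum number of QPs an emulated SRQ with max_wr receives can serve. */
static unsigned srq_emul_max_qps(unsigned max_wr)
{
	return max_wr / WR_PER_QP;
}

/* Nonzero if one more QP may attach; creating a QP beyond the limit fails. */
static int srq_emul_can_attach(unsigned max_wr, unsigned attached_qps)
{
	return attached_qps < srq_emul_max_qps(max_wr);
}
```

This rule only bounds how many QPs can attach. If only 50 receives are posted for 100 attached QPs (the scenario in the question), nothing in the rule guarantees every QP a receive.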
-- MST From sean.hefty at intel.com Wed Jun 13 11:02:47 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 13 Jun 2007 11:02:47 -0700 Subject: [ofa-general] Re: crash in ipoib In-Reply-To: <20070613084531.GG1975@mellanox.co.il> Message-ID: <000001c7ade5$0daebd20$4acc180a@amr.corp.intel.com> >This looks strange. Can you supply some more data please? >Which HCA are you running on? >What test are you running? >What should I do to reproduce this? >Further, could you supply the full oops? Woody will need to answer the test/config questions. The oops is only displayed on the screen, and the stack trace is about 50-75 calls long. The start of the oops gets pushed off the screen. (Can we be overrunning the stack?) I'm not at the systems today, but can probably get what else is available tomorrow. We have, I think, up to 16 systems running the tests, and we only see failures on specific nodes (which all happen to be the same type of system ). - Sean From rdreier at cisco.com Wed Jun 13 11:05:15 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 13 Jun 2007 11:05:15 -0700 Subject: [ofa-general] Re: crash in ipoib In-Reply-To: <000001c7ade5$0daebd20$4acc180a@amr.corp.intel.com> (Sean Hefty's message of "Wed, 13 Jun 2007 11:02:47 -0700") References: <000001c7ade5$0daebd20$4acc180a@amr.corp.intel.com> Message-ID: > Woody will need to answer the test/config questions. The oops is only displayed > on the screen, and the stack trace is about 50-75 calls long. The start of the > oops gets pushed off the screen. (Can we be overrunning the stack?) I'm not at > the systems today, but can probably get what else is available tomorrow. If you don't have a serial console, it might be worth trying to get netconsole working. It's usually pretty simple to set up (see Documentation/networking/netconsole.txt; you basically just need another system running netcat to capture the log messages). - R.
From mst at dev.mellanox.co.il Wed Jun 13 11:09:49 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 13 Jun 2007 21:09:49 +0300 Subject: [ofa-general] Re: Re: crash in ipoib In-Reply-To: <000001c7ade5$0daebd20$4acc180a@amr.corp.intel.com> References: <20070613084531.GG1975@mellanox.co.il> <000001c7ade5$0daebd20$4acc180a@amr.corp.intel.com> Message-ID: <20070613180949.GH12277@mellanox.co.il> > Quoting Sean Hefty : > Subject: RE: Re: crash in ipoib > > >This looks strange. Can you supply some more data please? > >Which HCA are you running on? > >What test are you running? > >What should I do to reproduce this? > >Further, could you supply the full oops? > > Woody will need to answer the test/config questions. The oops is only displayed > on the screen, and the stack trace is about 50-75 calls long. The start of the > oops gets pushed off the screen. (Can we be overrunning the stack?) I'm not at > the systems today, but can probably get what else is available tomorrow. Getting a serial console would be the thing to do then. If you are worried about stack overflow, build your kernel with stack instrumentation. It's quite likely that the real oops reason has scrolled off the screen; what you post here could be the result of follow-on memory corruption. > We have, I think, up to 16 systems running the tests, and we only see failures > on specific nodes (which all happen to be the same type of system > ). One thing to check is whether it's kernel-specific. What happens if you install a different kernel/OS there? Try RHEL5, or just build a 2.6.20 kernel there. Does it still happen? -- MST From mst at dev.mellanox.co.il Wed Jun 13 11:15:10 2007 From: mst at dev.mellanox.co.il (Michael S.
Tsirkin) Date: Wed, 13 Jun 2007 21:15:10 +0300 Subject: [ofa-general] Re: crash in ipoib In-Reply-To: References: <000001c7ade5$0daebd20$4acc180a@amr.corp.intel.com> Message-ID: <20070613181510.GI12277@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [ofa-general] Re: crash in ipoib > > > Woody will need to answer the test/config questions. The oops is only displayed > > on the screen, and the stack trace is about 50-75 calls long. The start of the > > oops gets pushed off the screen. (Can we be overrunning the stack?) I'm not at > > the systems today, but can probably get what else is available tomorrow. > > If you don't have serial console, it might be worth trying to get > netconsole working. It's usually pretty simple to set up (see > Documentation/networking/netconsole.txt, you basically just need > another system running netcat to capture the log messages). I don't think it's available on that ancient kernel. -- MST From rdreier at cisco.com Wed Jun 13 11:18:08 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 13 Jun 2007 11:18:08 -0700 Subject: [ofa-general] Re: crash in ipoib In-Reply-To: <20070613181510.GI12277@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 13 Jun 2007 21:15:10 +0300") References: <000001c7ade5$0daebd20$4acc180a@amr.corp.intel.com> <20070613181510.GI12277@mellanox.co.il> Message-ID: > I don't think it's available on that ancient kernel. I thought RHEL4 patched netconsole in... at least 'modinfo netconsole' seems to find something on a RHEL4 system we have around. - R. From mst at dev.mellanox.co.il Wed Jun 13 11:23:06 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 13 Jun 2007 21:23:06 +0300 Subject: [ofa-general] Re: crash in ipoib In-Reply-To: References: <000001c7ade5$0daebd20$4acc180a@amr.corp.intel.com> <20070613181510.GI12277@mellanox.co.il> Message-ID: <20070613182306.GJ12277@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [ofa-general] Re: crash in ipoib > > > I don't think it's available on that ancient kernel. > > I thought RHEL4 patched netconsole in... at least 'modinfo netconsole' > seems to find something on a RHEL4 system we have around. Cool, worth a try then. -- MST From robert.j.woodruff at intel.com Wed Jun 13 12:29:17 2007 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Wed, 13 Jun 2007 12:29:17 -0700 Subject: [ofa-general] RE: Re: crash in ipoib In-Reply-To: <20070613180949.GH12277@mellanox.co.il> Message-ID: We are running a RHEL4 2.6.9-42EL kernel on a Rocks install. The tests I run are IMB with Intel MPI over uDAPL and, at the same time, IMB over IPoIB. It usually takes at least one day, sometimes two, of running IMB in a loop with various numbers of processes per node (1, 2, and 4). It seems to fail randomly, not on the same node every time, so it is not feasible to connect a serial console to every node. It would also be hard for us to put in a new kernel, as this causes problems with Rocks. The systems are older 3.6 GHz Xeons on the Lindenhurst platform. I have not seen this error on any other kernel or system; I have tested RHEL5 and RHEL4-U5, though only on 2 nodes, and they do not seem to fail. We also have OFED 1.2 running on 64- and 256-node production application development clusters, and they have not reported any similar problems, but they are not running the same tests. I plan on loading OFED 1.2-rc5 today. Is there an easy way to build the IPoIB driver from the OFED installer so that it has debug enabled? woody -----Original Message----- From: Michael S.
Tsirkin [mailto:mst at dev.mellanox.co.il] Sent: Wednesday, June 13, 2007 11:10 AM To: Hefty, Sean Cc: 'Michael S. Tsirkin'; Sean Hefty; Woodruff, Robert J; 'Vladimir Sokolovsky'; general at lists.openfabrics.org Subject: Re: Re: crash in ipoib > Quoting Sean Hefty : > Subject: RE: Re: crash in ipoib > > >This looks strange. Can you supply some more data please? > >Which HCA are you running on? > >What test are you running? > >What should I do to reproduce this? > >Further, could you supply the full oops? > > Woody will need to answer the test/config questions. The oops is only displayed > on the screen, and the stack trace is about 50-75 calls long. The start of the > oops gets pushed off the screen. (Can we be overrunning the stack?) I'm not at > the systems today, but can probably get what else is available tomorrow. Getting a serial console would be the thing to do then. If you are worried about stack overflow, build your kernel with stack instrumentation. It's quite likely the real oops reason has scrolled off the screen, what you post here could be thre result of fullowing memory corruption. > We have, I think, up to 16 systems running the tests, and we only see failures > on specific nodes (which all happen to be the same type of system > ). One thing to try to check is whether it's kernel-specific. What happens if you install a different kernel/OS there? Try RHEL5 or just build 2.6.20 kernel there. Does it still happen? -- MST From rdreier at cisco.com Wed Jun 13 12:30:42 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 13 Jun 2007 12:30:42 -0700 Subject: [ofa-general] RE: Re: crash in ipoib In-Reply-To: (Robert J. Woodruff's message of "Wed, 13 Jun 2007 12:29:17 -0700") References: Message-ID: > I plan on loading OFED 1.2-rc5 today. Is there an easy way to build the > IPoIB driver from the OFED installer so that it has debug enabled ? I would hope that that is the way the installer builds it by default. 
From robert.j.woodruff at intel.com Wed Jun 13 12:48:46 2007 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Wed, 13 Jun 2007 12:48:46 -0700 Subject: [ofa-general] RE: Re: crash in ipoib In-Reply-To: Message-ID: Roland wrote: >I would hope that that is the way the installer builds it by default. I found an option in ofed.conf that allows additional parameters to be passed to the build. I added this and am rebuilding it now: OFA_KERNEL_PARAMS="--with-memtrack --with-ipoib_debug-mod" From rdreier at cisco.com Wed Jun 13 13:40:33 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 13 Jun 2007 13:40:33 -0700 Subject: [ofa-general] Re: [PATCH 2 of 2] mlx4: deal with ownership bit wraparound when cleaning cq In-Reply-To: <200706130836.25074.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Wed, 13 Jun 2007 08:36:24 +0300") References: <200706130836.25074.jackm@dev.mellanox.co.il> Message-ID: thanks, applied 1 & 2. From sweitzen at cisco.com Wed Jun 13 17:50:21 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 13 Jun 2007 17:50:21 -0700 Subject: [ofa-general] OFED 1.2 rc5 release In-Reply-To: <6C2C79E72C305246B504CBA17B5500C90156362A@mtlexch01.mtl.com> References: <43AA3CB3C1BF5A499F5AAD31CA5023AC06624A26@mtlexch01.mtl.com><6C2C79E72C305246B504CBA17B5500C9015634B7@mtlexch01.mtl.com> <6C2C79E72C305246B504CBA17B5500C90156362A@mtlexch01.mtl.com> Message-ID: I have created 1.2rc5 in bugzilla. Tziporet, I'm not sure how you created your "fixed in rc5" list, but some of the bugs on it are still open (for example, https://bugs.openfabrics.org/show_bug.cgi?id=577).
Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: general-bounces at lists.openfabrics.org > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of > Tziporet Koren > Sent: Wednesday, June 13, 2007 7:26 AM > To: ewg at lists.openfabrics.org > Cc: general at lists.openfabrics.org > Subject: [ofa-general] OFED 1.2 rc5 release > > Hi, > > OFED 1.2-RC5 is available on > http://www.openfabrics.org/builds/ofed-1.2/ > File: OFED-1.2-rc5.tgz > To get BUILD_ID run ofed_info > > Please report any issues in bugzilla https://bugs.openfabrics.org/ > > The GA release is expected next Wed (June 20) based on RC5 tests > > Tziporet & Vlad > > ============================================================== > ========== > > Release information: > > OS support: > Novell: > - SLES 9.0 SP3 > - SLES10 > - SLES10 SP1 RC5 > Redhat: > - Redhat EL4 up3, up4 and up5 > - Redhat EL5 > kernel.org: > - 2.6.20 > - 2.6.19 > > Note: Fedora C6 and SuSE Pro 10 are not part of the official list. > We keep the backport patches for these OSes and make sure OFED compile > and loaded properly but will not do full QA cycle. > > Systems: > * x86_64 > * x86 > * ia64 > * ppc64 > > Main changes from OFED-1.1-rc4: > =============================== > 1. Fixed 8 bugs (see attached for fixed issues) > 2. Added support for SLES10 SP1 RC5 (tvflash is disabled for now) > 3. Added support for iSER on RHEL 4 > 4. Updated documents - all owners please review to make sure docs of > your component is updated. > > See bugzilla for all open issues. > > Tasks that should be completed for the GA release: > 1. Complete all documentation (release notes, README, etc.) > 2. 
Run all QA tests on all platforms > From vlad at dev.mellanox.co.il Thu Jun 14 00:08:03 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Thu, 14 Jun 2007 10:08:03 +0300 Subject: [ofa-general] [GIT PULL] please pull tvflash.git In-Reply-To: References: Message-ID: <4670E953.9060703@dev.mellanox.co.il> Roland Dreier wrote: > Vlad, please pull from > > git://staging.openfabrics.org/~rdreier/tvflash.git > > to get tvflash updates that will fix problems building on SLES 10 SP1 > and Fedora 7 due to linking with libgz or libz > (https://bugs.openfabrics.org/show_bug.cgi?id=558). > > Thanks, > Roland > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > Done, Regards, Vladimir From kliteyn at dev.mellanox.co.il Thu Jun 14 01:19:57 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 14 Jun 2007 11:19:57 +0300 Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node guid files options for fat-tree Message-ID: <4670FA2D.7070708@dev.mellanox.co.il> Hi Hal. The following three patches are adding root and compute node guid files options for fat-tree routing, reading these files in fat-tree, and taking care of non-compute nodes when creating fat-tree order file. [1/3] Added two options: ftree_root_guid_file - file that contains list of root guids ftree_cn_guid_file - file that contains list of compute node guids For now, these options are exposed via options file only. [2/3] Fat-tree routing reads root guid file and compute node guid file, and creates map of roots and compute nodes (CNs) to be used later. [3/3] Non-CNs are treated as "dummies" when creating fat-tree order file, because they are not participating in the MPI all-to-all communication. 
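Because the two new options are exposed via the options file only, each one has to be unpacked under its own key name. A minimal sketch of key-matched unpacking, assuming a simplified options struct; ftree_opts, dup_str, unpack_charp and apply_option are hypothetical names that only imitate the style of __osm_subn_opts_unpack_charp(), not its actual implementation:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of the opensm options struct. */
struct ftree_opts {
	char *updn_guid_file;
	char *ftree_root_guid_file;
	char *ftree_cn_guid_file;
};

/* Portable strdup() replacement (strdup is POSIX, not ISO C). */
static char *dup_str(const char *s)
{
	size_t n = strlen(s) + 1;
	char *d = malloc(n);

	if (d)
		memcpy(d, s, n);
	return d;
}

/* Stores a copy of val in *dst only if the key read from the
 * options file matches this option's own name. */
static void unpack_charp(const char *name, const char *key,
			 const char *val, char **dst)
{
	if (strcmp(name, key) != 0)
		return;
	free(*dst);
	*dst = val ? dup_str(val) : NULL;
}

/* Each field is paired with its own key. Reusing "updn_guid_file"
 * for the fat-tree fields would copy the up/down file name into them
 * and ignore the ftree_* keys entirely. */
static void apply_option(struct ftree_opts *o, const char *key,
			 const char *val)
{
	assert(o != NULL && key != NULL);
	unpack_charp("updn_guid_file", key, val, &o->updn_guid_file);
	unpack_charp("ftree_root_guid_file", key, val, &o->ftree_root_guid_file);
	unpack_charp("ftree_cn_guid_file", key, val, &o->ftree_cn_guid_file);
}
```

Keys that belong to other options (for example sa_db_file) simply match none of the three names and leave the struct untouched.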
-- Yevgeny From kliteyn at dev.mellanox.co.il Thu Jun 14 01:20:06 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 14 Jun 2007 11:20:06 +0300 Subject: [ofa-general] PATCH [1/3] osm: adding root and compute node guid files options for fat-tree Message-ID: <4670FA36.6060303@dev.mellanox.co.il> Hi Hal, Added two options: ftree_root_guid_file - file that contains list of root guids for fat-tree routing ftree_cn_guid_file - file that contains list of compute node guids for fat-tree routing For now, these options are exposed via options file only. Signed-off-by: Yevgeny Kliteynik --- opensm/include/opensm/osm_subnet.h | 10 ++++++++++ opensm/opensm/osm_subnet.c | 22 ++++++++++++++++++++++ 2 files changed, 32 insertions(+), 0 deletions(-) diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h index c62128b..46d90d6 100644 --- a/opensm/include/opensm/osm_subnet.h +++ b/opensm/include/opensm/osm_subnet.h @@ -279,6 +279,8 @@ typedef struct _osm_subn_opt char * lid_matrix_dump_file; char * ucast_dump_file; char * updn_guid_file; + char * ftree_root_guid_file; + char * ftree_cn_guid_file; char * sa_db_file; boolean_t exit_on_fatal; boolean_t honor_guid2lid_file; @@ -455,6 +457,14 @@ typedef struct _osm_subn_opt * updn_guid_file * Pointer to name of the UPDN guid file given by User * +* ftree_root_guid_file +* Name of the file that contains list of root guids that +* will be used by fat-tree routing (provided by User) +* +* ftree_cn_guid_file +* Name of the file that contains list of compute node guids that +* will be used by fat-tree routing (provided by User) +* * sa_db_file * Name of the SA database file.
* diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index 736f49a..a39ada6 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -501,6 +501,8 @@ osm_subn_set_default_opt( p_opt->lid_matrix_dump_file = NULL; p_opt->ucast_dump_file = NULL; p_opt->updn_guid_file = NULL; + p_opt->ftree_root_guid_file = NULL; + p_opt->ftree_cn_guid_file = NULL; p_opt->sa_db_file = NULL; p_opt->exit_on_fatal = TRUE; p_opt->enable_quirks = FALSE; @@ -1326,6 +1328,14 @@ osm_subn_parse_conf_file( "updn_guid_file", p_key, p_val, &p_opts->updn_guid_file); + __osm_subn_opts_unpack_charp( + "ftree_root_guid_file", + p_key, p_val, &p_opts->ftree_root_guid_file); + + __osm_subn_opts_unpack_charp( + "ftree_cn_guid_file", + p_key, p_val, &p_opts->ftree_cn_guid_file); + __osm_subn_opts_unpack_charp( "sa_db_file", p_key, p_val, &p_opts->sa_db_file); @@ -1554,6 +1564,18 @@ osm_subn_write_conf_file( "# One guid in each line\n" "updn_guid_file %s\n\n", p_opts->updn_guid_file); + if (p_opts->ftree_root_guid_file) + fprintf( opts_file, + "# The file holding the fat-tree root node guids\n" + "# One guid in each line\n" + "ftree_root_guid_file %s\n\n", + p_opts->ftree_root_guid_file); + if (p_opts->ftree_cn_guid_file) + fprintf( opts_file, + "# The file holding the fat-tree compute node guids\n" + "# One guid in each line\n" + "ftree_cn_guid_file %s\n\n", + p_opts->ftree_cn_guid_file); if (p_opts->sa_db_file) fprintf( opts_file, "# SA database file name\n" -- 1.5.1.4 From kliteyn at dev.mellanox.co.il Thu Jun 14 01:20:19 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 14 Jun 2007 11:20:19 +0300 Subject: [ofa-general] [PATCH 3/3] osm: adding root and compute node guid files options for fat-tree Message-ID: <4670FA43.9090904@dev.mellanox.co.il> Hi Hal, Non-CNs are treated as "dummies" when creating fat-tree order file, because they are not participating in the MPI all-to-all communication.
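Patch 3/3 keeps a slot for every HCA below a leaf switch but writes non-compute nodes as dummy entries, so positions in the ordering file stay aligned. A simplified sketch of that per-leaf loop, writing into a caller-supplied buffer instead of the ordering file; struct hca_entry and dump_leaf_ordering are hypothetical names, and the patch's separate padding with extra dummy HCAs is left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified view of one HCA below a leaf switch. */
struct hca_entry {
	uint16_t base_lid;	/* host byte order */
	const char *desc;	/* node description */
	bool is_cn;		/* compute node? */
};

/* Writes one ordering line per HCA into buf. Compute nodes get
 * "0x<lid>\t<desc>"; non-compute nodes keep their slot but are written
 * as "0xFFFF\tDUMMY". Returns the number of characters written, or -1
 * if buf is too small. */
static int dump_leaf_ordering(const struct hca_entry *hcas, size_t n,
			      char *buf, size_t len)
{
	size_t used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		int w;

		if (hcas[i].is_cn)
			w = snprintf(buf + used, len - used, "0x%x\t%s\n",
				     (unsigned)hcas[i].base_lid, hcas[i].desc);
		else
			w = snprintf(buf + used, len - used, "0xFFFF\tDUMMY\n");
		if (w < 0 || (size_t)w >= len - used)
			return -1;	/* output would be truncated */
		used += (size_t)w;
	}
	return (int)used;
}
```

An I/O node with is_cn false still occupies its line, which is what lets an MPI all-to-all pattern derived from the file skip it without shifting the other ranks.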
-- Yevgeny Signed-off-by: Yevgeny Kliteynik --- opensm/opensm/osm_ucast_ftree.c | 10 +++++++--- 1 files changed, 7 insertions(+), 3 deletions(-) diff --git a/opensm/opensm/osm_ucast_ftree.c b/opensm/opensm/osm_ucast_ftree.c index b1ee0ca..d3ff45f 100644 --- a/opensm/opensm/osm_ucast_ftree.c +++ b/opensm/opensm/osm_ucast_ftree.c @@ -1373,9 +1373,13 @@ __osm_ftree_fabric_dump_hca_ordering( p_group = p_sw->down_port_groups[j]; p_hca = p_group->remote_hca_or_sw.remote_hca; - fprintf(p_hca_ordering_file,"0x%x\t%s\n", - cl_ntoh16(p_group->remote_base_lid), - p_hca->p_osm_node->print_desc); + /* treat non-compute nodes as dummy */ + if (p_hca->is_cn) + fprintf(p_hca_ordering_file,"0x%x\t%s\n", + cl_ntoh16(p_group->remote_base_lid), + p_hca->p_osm_node->print_desc); + else + fprintf(p_hca_ordering_file,"0xFFFF\tDUMMY\n"); } /* now print dummy HCAs */ -- 1.5.1.4 From kliteyn at dev.mellanox.co.il Thu Jun 14 01:20:13 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 14 Jun 2007 11:20:13 +0300 Subject: [ofa-general] PATCH [2/3] osm: adding root and compute node guid files options for fat-tree Message-ID: <4670FA3D.3090500@dev.mellanox.co.il> Hi Hal. Fat-tree routing reads root guid file and compute node guid file, and creates map of roots and compute nodes (CNs) to be used later. 
--Yevgeny Signed-off-by: Yevgeny Kliteynik --- opensm/opensm/osm_ucast_ftree.c | 232 +++++++++++++++++++++++++++++++++++++++ 1 files changed, 232 insertions(+), 0 deletions(-) diff --git a/opensm/opensm/osm_ucast_ftree.c b/opensm/opensm/osm_ucast_ftree.c index 1730ef2..b1ee0ca 100644 --- a/opensm/opensm/osm_ucast_ftree.c +++ b/opensm/opensm/osm_ucast_ftree.c @@ -119,6 +119,17 @@ typedef struct { /*************************************************** ** + ** ftree_guid_tbl_element_t definition + ** + ***************************************************/ + +typedef struct { + cl_map_item_t map_item; + uint64_t guid; +} ftree_guid_tbl_element_t; + +/*************************************************** + ** ** ftree_fwd_tbl_t definition ** ***************************************************/ @@ -182,6 +193,7 @@ typedef struct ftree_sw_t_ ftree_port_group_t ** up_port_groups; uint8_t up_port_groups_num; ftree_fwd_tbl_t lft_buf; + boolean_t is_root; } ftree_sw_t; /*************************************************** @@ -195,6 +207,7 @@ typedef struct ftree_hca_t_ { osm_node_t * p_osm_node; ftree_port_group_t ** up_port_groups; uint16_t up_port_groups_num; + boolean_t is_cn; } ftree_hca_t; /*************************************************** @@ -209,6 +222,8 @@ typedef struct ftree_fabric_t_ cl_qmap_t hca_tbl; cl_qmap_t sw_tbl; cl_qmap_t sw_by_tuple_tbl; + cl_qmap_t cn_guids_tbl; + cl_qmap_t root_guids_tbl; uint8_t tree_rank; ftree_sw_t ** leaf_switches; uint32_t leaf_switches_num; @@ -393,6 +408,36 @@ __osm_ftree_sw_tbl_element_destroy( /*************************************************** ** + ** ftree_guid_tbl_element_t functions + ** + ***************************************************/ + +static ftree_guid_tbl_element_t * +__osm_ftree_guid_tbl_element_create( + IN uint64_t guid) +{ + ftree_guid_tbl_element_t * p_element = + (ftree_guid_tbl_element_t *) malloc(sizeof(ftree_guid_tbl_element_t)); + if (!p_element) + return NULL; + + memset(p_element, 
0,sizeof(ftree_guid_tbl_element_t)); + p_element->guid = guid; + return p_element; +} + +/***************************************************/ + +static void +__osm_ftree_guid_tbl_element_destroy( + IN ftree_guid_tbl_element_t * p_element) +{ + if (p_element) + free(p_element); +} + +/*************************************************** + ** ** ftree_port_t functions ** ***************************************************/ @@ -607,6 +652,9 @@ __osm_ftree_sw_create( p_sw->lft_buf = (ftree_fwd_tbl_t)cl_pool_get(&p_ftree->sw_fwd_tbl_pool); memset(p_sw->lft_buf, OSM_NO_PATH, FTREE_FWD_TBL_LEN); + /* by default the switch is not root */ + p_sw->is_root = FALSE; + return p_sw; } /* __osm_ftree_sw_create() */ @@ -810,6 +858,10 @@ __osm_ftree_hca_create( if (!p_hca->up_port_groups) return NULL; p_hca->up_port_groups_num = 0; + + /* by default every CA is treated as compute node */ + p_hca->is_cn = TRUE; + return p_hca; } @@ -934,6 +986,9 @@ __osm_ftree_fabric_create() cl_qmap_init(&p_ftree->sw_tbl); cl_qmap_init(&p_ftree->sw_by_tuple_tbl); + cl_qmap_init(&p_ftree->cn_guids_tbl); + cl_qmap_init(&p_ftree->root_guids_tbl); + status = cl_pool_init( &p_ftree->sw_fwd_tbl_pool, 8, /* min pool size */ 0, /* max pool size - unlimited */ @@ -960,6 +1015,8 @@ __osm_ftree_fabric_clear(ftree_fabric_t * p_ftree) ftree_sw_t * p_next_sw; ftree_sw_tbl_element_t * p_element; ftree_sw_tbl_element_t * p_next_element; + ftree_guid_tbl_element_t * p_guid_element; + ftree_guid_tbl_element_t * p_next_guid_element; if (!p_ftree) return; @@ -1000,6 +1057,28 @@ __osm_ftree_fabric_clear(ftree_fabric_t * p_ftree) } cl_qmap_remove_all(&p_ftree->sw_by_tuple_tbl); + /* remove all the elements of root_guids_tbl */ + + p_next_guid_element = (ftree_guid_tbl_element_t *)cl_qmap_head(&p_ftree->root_guids_tbl); + while( p_next_guid_element != (ftree_guid_tbl_element_t *)cl_qmap_end(&p_ftree->root_guids_tbl) ) + { + p_guid_element = p_next_guid_element; + p_next_guid_element = (ftree_guid_tbl_element_t 
*)cl_qmap_next(&p_guid_element->map_item ); + __osm_ftree_guid_tbl_element_destroy(p_guid_element); + } + cl_qmap_remove_all(&p_ftree->root_guids_tbl); + + /* remove all the elements of cn_guids_tbl */ + + p_next_guid_element = (ftree_guid_tbl_element_t *)cl_qmap_head(&p_ftree->cn_guids_tbl); + while( p_next_guid_element != (ftree_guid_tbl_element_t *)cl_qmap_end(&p_ftree->cn_guids_tbl) ) + { + p_guid_element = p_next_guid_element; + p_next_guid_element = (ftree_guid_tbl_element_t *)cl_qmap_next(&p_guid_element->map_item ); + __osm_ftree_guid_tbl_element_destroy(p_guid_element); + } + cl_qmap_remove_all(&p_ftree->cn_guids_tbl); + /* free the leaf switches array */ if ((p_ftree->leaf_switches_num > 0) && (p_ftree->leaf_switches)) free(p_ftree->leaf_switches); @@ -1048,6 +1127,16 @@ __osm_ftree_fabric_add_hca(ftree_fabric_t * p_ftree, osm_node_t * p_osm_node) CL_ASSERT(osm_node_get_type(p_osm_node) == IB_NODE_TYPE_CA); + /* if a user has supplied CN guids list, and this CA's guid + is not there, then the CA should be marked as non-CN */ + if ( (!cl_is_qmap_empty(&p_ftree->cn_guids_tbl)) && + (cl_qmap_get(&p_ftree->cn_guids_tbl, + cl_ntoh64(osm_node_get_node_guid(p_hca->p_osm_node))) == + cl_qmap_end(&p_ftree->cn_guids_tbl)) ) + { + p_hca->is_cn = FALSE; + } + cl_qmap_insert(&p_ftree->hca_tbl, p_osm_node->node_info.node_guid, &p_hca->map_item); @@ -1062,6 +1151,16 @@ __osm_ftree_fabric_add_sw(ftree_fabric_t * p_ftree, osm_switch_t * p_osm_sw) CL_ASSERT(osm_node_get_type(p_osm_sw->p_node) == IB_NODE_TYPE_SWITCH); + /* if a user has supplied root guids list, and this switch's guid + *is* there, then the switch should be marked as root */ + if ( (!cl_is_qmap_empty(&p_ftree->root_guids_tbl)) && + (cl_qmap_get(&p_ftree->root_guids_tbl, + cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node))) != + cl_qmap_end(&p_ftree->root_guids_tbl)) ) + { + p_sw->is_root = TRUE; + } + cl_qmap_insert(&p_ftree->sw_tbl, p_osm_sw->p_node->node_info.node_guid, &p_sw->map_item); @@ -2907,6 
+3006,127 @@ __osm_ftree_fabric_populate_ports( /*************************************************** ***************************************************/ +static int +__osm_ftree_convert_list2qmap( + cl_list_t * p_guid_list, + cl_qmap_t * p_map ) +{ + uint64_t * p_guid; + + if ( !p_map ) + return -1; + + if ( !p_guid_list || !cl_list_count(p_guid_list) ) + return 0; + + while ( (p_guid = (uint64_t*)cl_list_remove_head(p_guid_list)) ) + { + cl_qmap_insert( p_map, + *p_guid, + &(__osm_ftree_guid_tbl_element_create(*p_guid)->map_item) ); + free(p_guid); + } + + CL_ASSERT(cl_is_list_empty(p_guid_list)); + + return 0; +} /* __osm_ftree_convert_list2qmap() */ + +/*************************************************** + ***************************************************/ + +static int +__osm_ftree_fabric_read_guid_files( + IN ftree_fabric_t * p_ftree) +{ + cl_list_t guid_list; + ftree_guid_tbl_element_t * p_guid_element; + ftree_guid_tbl_element_t * p_next_guid_element; + int status = 0; + + OSM_LOG_ENTER(&p_ftree->p_osm->log, __osm_ftree_fabric_read_guid_files); + + cl_list_construct( &guid_list ); + cl_list_init( &guid_list, 10 ); + + p_ftree->p_osm->subn.opt.ftree_root_guid_file = "/tmp/ftree.root.guids"; + p_ftree->p_osm->subn.opt.ftree_cn_guid_file = "/tmp/ftree.cn.guids"; + + if (p_ftree->p_osm->subn.opt.ftree_root_guid_file) + { + osm_log( &p_ftree->p_osm->log, OSM_LOG_DEBUG, + "__osm_ftree_read_guid_files: " + "Fetching root nodes from file %s\n", + p_ftree->p_osm->subn.opt.ftree_root_guid_file ); + + if ( osm_ucast_mgr_read_guid_file( &p_ftree->p_osm->sm.ucast_mgr, + p_ftree->p_osm->subn.opt.ftree_root_guid_file, + &guid_list ) || + __osm_ftree_convert_list2qmap( &guid_list, + &p_ftree->root_guids_tbl ) ) + { + status = -1; + goto Exit; + } + + if (osm_log_is_active(&p_ftree->p_osm->log,OSM_LOG_DEBUG)) + { + p_next_guid_element = (ftree_guid_tbl_element_t *)cl_qmap_head(&p_ftree->root_guids_tbl); + while( p_next_guid_element != (ftree_guid_tbl_element_t 
*)cl_qmap_end(&p_ftree->root_guids_tbl) ) + { + p_guid_element = p_next_guid_element; + p_next_guid_element = (ftree_guid_tbl_element_t *)cl_qmap_next(&p_guid_element->map_item ); + osm_log( &p_ftree->p_osm->log, OSM_LOG_DEBUG, + "__osm_ftree_fabric_read_guid_files: " + "root guid 0x%016" PRIx64 "\n", + p_guid_element->guid ); + } + } + } + CL_ASSERT(cl_is_list_empty(&guid_list)); + + if (p_ftree->p_osm->subn.opt.ftree_cn_guid_file) + { + osm_log( &p_ftree->p_osm->log, OSM_LOG_DEBUG, + "__osm_ftree_read_guid_files: " + "Fetching compute nodes from file %s\n", + p_ftree->p_osm->subn.opt.ftree_cn_guid_file ); + + if ( osm_ucast_mgr_read_guid_file( &p_ftree->p_osm->sm.ucast_mgr, + p_ftree->p_osm->subn.opt.ftree_cn_guid_file, + &guid_list ) || + __osm_ftree_convert_list2qmap( &guid_list, + &p_ftree->cn_guids_tbl ) ) + { + status = -1; + goto Exit; + } + + if (osm_log_is_active(&p_ftree->p_osm->log,OSM_LOG_DEBUG)) + { + p_next_guid_element = (ftree_guid_tbl_element_t *)cl_qmap_head(&p_ftree->cn_guids_tbl); + while( p_next_guid_element != (ftree_guid_tbl_element_t *)cl_qmap_end(&p_ftree->cn_guids_tbl) ) + { + p_guid_element = p_next_guid_element; + p_next_guid_element = (ftree_guid_tbl_element_t *)cl_qmap_next(&p_guid_element->map_item ); + osm_log( &p_ftree->p_osm->log, OSM_LOG_DEBUG, + "__osm_ftree_fabric_read_guid_files: " + "compute node guid 0x%016" PRIx64 "\n", + p_guid_element->guid ); + } + } + } + CL_ASSERT(cl_is_list_empty(&guid_list)); + + Exit: + OSM_LOG_EXIT(&p_ftree->p_osm->log); + cl_list_destroy(&guid_list); + return status; +} /*__osm_ftree_fabric_read_guid_files() */ + +/*************************************************** + ***************************************************/ + static int __osm_ftree_construct_fabric( IN void * context) @@ -2947,6 +3167,18 @@ __osm_ftree_construct_fabric( goto Exit; } + osm_log(&p_ftree->p_osm->log, OSM_LOG_VERBOSE, + "__osm_ftree_construct_fabric: " + "Reading guid files provided by user\n"); + if 
(__osm_ftree_fabric_read_guid_files(p_ftree) != 0) + { + osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, + "Failed reading guid files - " + "falling back to default routing\n"); + status = -1; + goto Exit; + } + osm_log(&p_ftree->p_osm->log, OSM_LOG_VERBOSE,"__osm_ftree_construct_fabric: \n" " |----------------------------------------|\n" " |- Starting FatTree fabric construction -|\n" -- 1.5.1.4 From kliteyn at dev.mellanox.co.il Thu Jun 14 01:25:21 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 14 Jun 2007 11:25:21 +0300 Subject: [ofa-general] [PATCH] osm: bugfix - if fat-tree failed, osm should fall back to default routing Message-ID: <4670FB71.5090406@dev.mellanox.co.il> Hi Hal, When fat-tree fails to populate all the data structures, it should return error and let osm fall back to default routing. Signed-off-by: Yevgeny Kliteynik --- opensm/opensm/osm_ucast_ftree.c | 6 +++++- 1 files changed, 5 insertions(+), 1 deletions(-) diff --git a/opensm/opensm/osm_ucast_ftree.c b/opensm/opensm/osm_ucast_ftree.c index d3ff45f..2236734 100644 --- a/opensm/opensm/osm_ucast_ftree.c +++ b/opensm/opensm/osm_ucast_ftree.c @@ -3302,11 +3302,15 @@ __osm_ftree_do_routing( IN void * context) { ftree_fabric_t * p_ftree = context; + int status = 0; OSM_LOG_ENTER(&p_ftree->p_osm->log, __osm_ftree_do_routing); if (!p_ftree->fabric_built) + { + status = -1; goto Exit; + } osm_log(&p_ftree->p_osm->log, OSM_LOG_VERBOSE,"__osm_ftree_do_routing: " "Starting FatTree routing\n"); @@ -3330,7 +3334,7 @@ __osm_ftree_do_routing( Exit: OSM_LOG_EXIT(&p_ftree->p_osm->log); - return 0; + return status; } /*************************************************** -- 1.5.1.4 From vlad at lists.openfabrics.org Thu Jun 14 02:43:53 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Thu, 14 Jun 2007 02:43:53 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070614-0200 daily build status Message-ID: <20070614094353.E1D6AE6086C@openfabrics.org> This email was generated 
automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.12 Passed on ppc64 with linux-2.6.15 Passed on x86_64 with linux-2.6.19 Passed on ia64 with linux-2.6.12 Passed on powerpc with linux-2.6.18 Passed on ia64 with linux-2.6.13 Passed on x86_64 with linux-2.6.20 Passed on powerpc with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.14 Passed on x86_64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.13 Passed on powerpc with linux-2.6.13 Passed on x86_64 with linux-2.6.12 Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on ia64 with linux-2.6.15 Passed on x86_64 with linux-2.6.14 Passed on powerpc with linux-2.6.14 Passed on ia64 with linux-2.6.17 Passed on powerpc with linux-2.6.15 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.17 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.15 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.12 Passed on ia64 with linux-2.6.16 Passed on ppc64 with linux-2.6.16 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.13 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.14 Passed on ia64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on ia64 with 
linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-34.ELsmp Failed: From sashak at voltaire.com Thu Jun 14 04:37:57 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 14 Jun 2007 14:37:57 +0300 Subject: [ofa-general] [PATCH] opensm/osm_helper.c: fixing PortInfo CapMask printing Message-ID: <20070614113757.GA5908@sashak.voltaire.com> When PortInfo:CapMask is zero, an uninitialized local buffer (garbage) is printed. Here is the fix. Signed-off-by: Sasha Khapyorsky --- opensm/opensm/osm_helper.c | 19 ++++++++++--------- 1 files changed, 10 insertions(+), 9 deletions(-) diff --git a/opensm/opensm/osm_helper.c b/opensm/opensm/osm_helper.c index 724ecdf..2b35bdd 100644 --- a/opensm/opensm/osm_helper.c +++ b/opensm/opensm/osm_helper.c @@ -546,9 +546,6 @@ osm_dbg_get_capabilities_str( uint32_t total_len = 0; char *p_local = p_buf; - if( !p_pi->capability_mask ) - return; - strcpy( p_local, "Capability Mask:\n" ); p_local += strlen( p_local ); @@ -839,9 +836,11 @@ osm_dump_port_info( ); /* show the capabilities mask */ - osm_dbg_get_capabilities_str( buf, BUF_SIZE, "\t\t\t\t", p_pi ); - - osm_log( p_log, log_level, "%s", buf ); + if( p_pi->capability_mask ) + { + osm_dbg_get_capabilities_str( buf, BUF_SIZE, "\t\t\t\t", p_pi ); + osm_log( p_log, log_level, "%s", buf ); + } } } @@ -936,9 +935,11 @@ osm_dump_portinfo_record( ); /* show the capabilities mask */ - osm_dbg_get_capabilities_str( buf, BUF_SIZE, "\t\t\t\t", p_pi ); - - osm_log( p_log, log_level, "%s", buf ); + if( p_pi->capability_mask ) + { + osm_dbg_get_capabilities_str( buf, BUF_SIZE, "\t\t\t\t", p_pi ); + osm_log( p_log, log_level, "%s", buf ); + } } } -- 1.5.2.1.137.g426c From sashak at voltaire.com Thu Jun 14 05:15:01 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 14 Jun 2007 15:15:01 +0300 Subject: [ofa-general] PATCH [0/3] osm: adding
root and compute node guid files options for fat-tree In-Reply-To: <4670FA2D.7070708@dev.mellanox.co.il> References: <4670FA2D.7070708@dev.mellanox.co.il> Message-ID: <20070614121501.GC5908@sashak.voltaire.com> Hi Yevgeny, On 11:19 Thu 14 Jun , Yevgeny Kliteynik wrote: > > The following three patches are adding root and compute node guid files > options for fat-tree routing, Is there any reason to not share root guids file option with up/down? Also the way how root guids are handled (in both up/down and ftree) doesn't look very optimal - guids are loaded to dynamic list, the list is converted to map, this map is matched and root nodes are marked as roots. Isn't it would be easy just to mark root nodes during file parsing? Sasha From kliteyn at dev.mellanox.co.il Thu Jun 14 05:36:15 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 14 Jun 2007 15:36:15 +0300 Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node guid files options for fat-tree In-Reply-To: <20070614121501.GC5908@sashak.voltaire.com> References: <4670FA2D.7070708@dev.mellanox.co.il> <20070614121501.GC5908@sashak.voltaire.com> Message-ID: <4671363F.6060600@dev.mellanox.co.il> Sasha Khapyorsky wrote: > Hi Yevgeny, > > On 11:19 Thu 14 Jun , Yevgeny Kliteynik wrote: >> The following three patches are adding root and compute node guid files >> options for fat-tree routing, > > Is there any reason to not share root guids file option with up/down? There are two new options for fat-tree: roots and compute nodes (CN). These two will be very "tightly coupled" and would have more implication on the routing than in case of up/dn roots. For instance, having root file but not CN file means that the topology doesn't have to be pure fat-tree, but all the CAs are considered CNs and have to be on the same level of the tree. And there is similar implication of all the combinations of these two options. 
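The defaults that patch 2/3 applies can be written down compactly: without a CN file every CA is a compute node, with one only the listed CAs are, and a switch is a root only if a root file lists it. A minimal sketch of those rules, assuming the GUIDs have already been loaded into sorted arrays and are looked up with bsearch() (a swapped-in stand-in for the cl_qmap tables the patch uses); all helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison of two uint64_t GUIDs for qsort()/bsearch(). */
static int cmp_guid(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/* True if guid appears in the sorted array; an empty list matches nothing. */
static bool guid_listed(const uint64_t *sorted, size_t n, uint64_t guid)
{
	assert(sorted != NULL || n == 0);
	return n > 0 &&
	       bsearch(&guid, sorted, n, sizeof(*sorted), cmp_guid) != NULL;
}

/* No CN list: every CA is a compute node. With a list: only listed CAs. */
static bool ca_is_cn(const uint64_t *cns, size_t n_cns, uint64_t ca_guid)
{
	return n_cns == 0 || guid_listed(cns, n_cns, ca_guid);
}

/* A switch is a root only when a root list exists and names it. */
static bool sw_is_root(const uint64_t *roots, size_t n_roots, uint64_t sw_guid)
{
	return guid_listed(roots, n_roots, sw_guid);
}
```

The asymmetry is the point: an empty CN list means "everything", while an empty root list means "nothing", which is why the two files interact differently with the topology checks.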
Because of this coupling I wanted to differentiate these two options from the up/dn roots. Thoughts? > Also the way how root guids are handled (in both up/down and ftree) > doesn't look very optimal - guids are loaded to dynamic list, the list > is converted to map, this map is matched and root nodes are marked as > roots. Isn't it would be easy just to mark root nodes during file > parsing? The only thing you can save here is converting list to map: You have to parse the guids file anyway, and you have to build all the fat-tree data structures anyway. So if you parse the file and fill the map right away instead of filling the list first, you will save the list2map conversion. But then up/dn and fat-tree can't use the same function to parse the guid file, and since the list2map conversion is not a big deal (we're talking about list of roots, which is couple of hundreds of guids at max), I prefer to leave it and not to use separate parsing functions for up/dn and fat-tree. BTW, since we're on this subject, how about removing the list2array conversion in the same place in up/dn routing? -- Yevgeny > Sasha > From kliteyn at dev.mellanox.co.il Thu Jun 14 06:16:55 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 14 Jun 2007 16:16:55 +0300 Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node guid files options for fat-tree In-Reply-To: <4671363F.6060600@dev.mellanox.co.il> References: <4670FA2D.7070708@dev.mellanox.co.il> <20070614121501.GC5908@sashak.voltaire.com> <4671363F.6060600@dev.mellanox.co.il> Message-ID: <46713FC7.3030104@dev.mellanox.co.il> Hi Sasha, Yevgeny Kliteynik wrote: > Sasha Khapyorsky wrote: >> Hi Yevgeny, >> >> On 11:19 Thu 14 Jun , Yevgeny Kliteynik wrote: >>> The following three patches are adding root and compute node guid files >>> options for fat-tree routing, >> >> Is there any reason to not share root guids file option with up/down? > > There are two new options for fat-tree: roots and compute nodes (CN). 
> These two will be very "tightly coupled" and would have more implication > on the routing than in case of up/dn roots. For instance, having root > file but not CN file means that the topology doesn't have to be pure > fat-tree, > but all the CAs are considered CNs and have to be on the same level of > the tree. > And there is similar implication of all the combinations of these two > options. > > Because of this coupling I wanted to differentiate these two options from > the up/dn roots. > > Thoughts? > >> Also the way how root guids are handled (in both up/down and ftree) >> doesn't look very optimal - guids are loaded to dynamic list, the list >> is converted to map, this map is matched and root nodes are marked as >> roots. Isn't it would be easy just to mark root nodes during file >> parsing? > > The only thing you can save here is converting list to map: > You have to parse the guids file anyway, and you have to build all the > fat-tree data structures anyway. So if you parse the file and fill the > map right away instead of filling the list first, you will save the > list2map conversion. > But then up/dn and fat-tree can't use the same function to parse the > guid file, > and since the list2map conversion is not a big deal (we're talking about > list > of roots, which is couple of hundreds of guids at max), I prefer > to leave it and not to use separate parsing functions for up/dn and fat-tree. Actually, I can do something else here: - parse guid file into list - populate fat-tree switches and CAs - scan guid list, and for each guid mark the matching node in the fat-tree maps Sounds OK? -- Yevgeny > BTW, since we're on this subject, how about removing the list2array > conversion > in the same place in up/dn routing? 
> > -- Yevgeny > >> Sasha >> From sashak at voltaire.com Thu Jun 14 06:45:19 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 14 Jun 2007 16:45:19 +0300 Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node guid files options for fat-tree In-Reply-To: <4671363F.6060600@dev.mellanox.co.il> References: <4670FA2D.7070708@dev.mellanox.co.il> <20070614121501.GC5908@sashak.voltaire.com> <4671363F.6060600@dev.mellanox.co.il> Message-ID: <20070614134519.GD5908@sashak.voltaire.com> On 15:36 Thu 14 Jun , Yevgeny Kliteynik wrote: > Sasha Khapyorsky wrote: > > Hi Yevgeny, > > On 11:19 Thu 14 Jun , Yevgeny Kliteynik wrote: > >> The following three patches are adding root and compute node guid files > >> options for fat-tree routing, > > Is there any reason to not share root guids file option with up/down? > > There are two new options for fat-tree: roots and compute nodes (CN). > These two will be very "tightly coupled" and would have more implication > on the routing than in case of up/dn roots. For instance, having root > file but not CN file means that the topology doesn't have to be pure > fat-tree, > but all the CAs are considered CNs and have to be on the same level of the > tree. > And there is similar implication of all the combinations of these two > options. > > Because of this coupling I wanted to differentiate these two options from > the up/dn roots. > > Thoughts? I still don't have a strong opinion about two options versus a common one. Hypothetically, if some day we implement routing engine chains (so a failed algorithm falls back to the next one in the chain and not just to the default), separate options could be useful.
> > Also the way how root guids are handled (in both up/down and ftree) > > doesn't look very optimal - guids are loaded to dynamic list, the list > > is converted to map, this map is matched and root nodes are marked as > > roots. Isn't it would be easy just to mark root nodes during file parsing? > > The only thing you can save here is converting list to map: I don't think the root guids map is needed - you can just set the is_root field for switch nodes by the guid(s) specified in the file, since you already have a switch-by-guid map. > You have to parse the guids file anyway, and you have to build all the > fat-tree data structures anyway. So if you parse the file and fill the > map right away instead of filling the list first, you will save the list2map > conversion. > But then up/dn and fat-tree can't use the same function to parse the guid > file, > and since the list2map conversion is not a big deal (we're talking about > list > of roots, which is couple of hundreds of guids at max), I prefer to leave it > and not to use separate parsing functions for up/dn and fat-tree. You can pass a custom callback to the common parser. > BTW, since we're on this subject, how about removing the list2array > conversion > in the same place in up/dn routing? Sure, similar junk should be cleaned up in up/down too (and my original complaint was about both root guids users).
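The callback approach suggested here can be sketched as a parser that hands each GUID straight to the caller, so no intermediate list or map is built. This is a hypothetical illustration, not the opensm parser: for_each_guid, mark_root and the switch table are assumed names, and a linear scan stands in for the existing switch-by-guid map:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Called once for every GUID read from the file. */
typedef void (*guid_cb_t)(uint64_t guid, void *ctx);

/* Reads one hex GUID per line ("0x" prefix optional), skipping blank
 * lines and '#' comments, and passes each GUID to cb. Returns 0 on
 * success or -1 if a line does not start with a hex number.
 * Lines are assumed to fit in the 128-byte buffer. */
static int for_each_guid(FILE *f, guid_cb_t cb, void *ctx)
{
	char line[128];

	while (fgets(line, sizeof line, f)) {
		char *p = line;
		char *end;
		uint64_t guid;

		while (*p == ' ' || *p == '\t')
			p++;
		if (*p == '\n' || *p == '\r' || *p == '\0' || *p == '#')
			continue;
		guid = strtoull(p, &end, 16);
		if (end == p)
			return -1;
		cb(guid, ctx);
	}
	return 0;
}

/* Hypothetical switch record; is_root mirrors ftree_sw_t::is_root. */
struct sw_rec {
	uint64_t guid;
	bool is_root;
};

struct sw_table {
	struct sw_rec *sws;
	size_t n;
};

/* Callback: marks the matching switch as root; unknown GUIDs are ignored. */
static void mark_root(uint64_t guid, void *ctx)
{
	struct sw_table *t = ctx;

	assert(t != NULL);
	for (size_t i = 0; i < t->n; i++)
		if (t->sws[i].guid == guid)
			t->sws[i].is_root = true;
}
```

Passing a different callback, for example one that sets a compute-node flag on listed CAs, would let the up/down and fat-tree code share the same parser.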
Sasha From sashak at voltaire.com Thu Jun 14 06:57:17 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 14 Jun 2007 16:57:17 +0300 Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node guid files options for fat-tree In-Reply-To: <46713FC7.3030104@dev.mellanox.co.il> References: <4670FA2D.7070708@dev.mellanox.co.il> <20070614121501.GC5908@sashak.voltaire.com> <4671363F.6060600@dev.mellanox.co.il> <46713FC7.3030104@dev.mellanox.co.il> Message-ID: <20070614135717.GE5908@sashak.voltaire.com> On 16:16 Thu 14 Jun , Yevgeny Kliteynik wrote: > Hi Sasha, > > Yevgeny Kliteynik wrote: > > Sasha Khapyorsky wrote: > >> Hi Yevgeny, > >> > >> On 11:19 Thu 14 Jun , Yevgeny Kliteynik wrote: > >>> The following three patches are adding root and compute node guid files > >>> options for fat-tree routing, > >> > >> Is there any reason to not share root guids file option with up/down? > > There are two new options for fat-tree: roots and compute nodes (CN). > > These two will be very "tightly coupled" and would have more implication > > on the routing than in case of up/dn roots. For instance, having root > > file but not CN file means that the topology doesn't have to be pure > > fat-tree, > > but all the CAs are considered CNs and have to be on the same level of the > > tree. > > And there is similar implication of all the combinations of these two > > options. > > Because of this coupling I wanted to differentiate these two options from > > the up/dn roots. > > Thoughts? > >> Also the way how root guids are handled (in both up/down and ftree) > >> doesn't look very optimal - guids are loaded to dynamic list, the list > >> is converted to map, this map is matched and root nodes are marked as > >> roots. Isn't it would be easy just to mark root nodes during file parsing? > > The only thing you can save here is converting list to map: > > You have to parse the guids file anyway, and you have to build all the > > fat-tree data structures anyway. 
> > So if you parse the file and fill the
> > map right away instead of filling the list first, you will save the
> > list2map conversion.
> > But then up/dn and fat-tree can't use the same function to parse the guid
> > file,
> > and since the list2map conversion is not a big deal (we're talking about
> > list > of roots, which is couple of hundreds of guids at max), I prefer to
> > leave it and not to use separate parsing functions for up/dn and fat-tree.
>
> Actually, I can do something else here:
> - parse guid file into list
> - populate fat-tree switches and CAs
> - scan guid list, and for each guid mark the matching node in the fat-tree
> maps
>
> Sounds OK?

Yes, much better.

Also there could be something like:
- populate fat-tree switches and CAs
- parse guid file, and for each guid mark the matching node (with custom callback)

But with your proposition it is not needed to touch the parser (and
up/down :)).

Sasha

From kliteyn at dev.mellanox.co.il Thu Jun 14 06:54:35 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Thu, 14 Jun 2007 16:54:35 +0300
Subject: [ofa-general] PATCH [2/3] osm: adding root and compute node guid files options for fat-tree
In-Reply-To: <4670FA3D.3090500@dev.mellanox.co.il>
References: <4670FA3D.3090500@dev.mellanox.co.il>
Message-ID: <4671489B.1070808@dev.mellanox.co.il>

Hi Hal,

Yevgeny Kliteynik wrote:
> Hi Hal.
>
> Fat-tree routing reads root guid file and compute node guid file,
> and creates map of roots and compute nodes (CNs) to be used later.
> > --Yevgeny > > Signed-off-by: Yevgeny Kliteynik > --- > opensm/opensm/osm_ucast_ftree.c | 232 > +++++++++++++++++++++++++++++++++++++++ > 1 files changed, 232 insertions(+), 0 deletions(-) > > diff --git a/opensm/opensm/osm_ucast_ftree.c > b/opensm/opensm/osm_ucast_ftree.c > index 1730ef2..b1ee0ca 100644 > --- a/opensm/opensm/osm_ucast_ftree.c > +++ b/opensm/opensm/osm_ucast_ftree.c > @@ -119,6 +119,17 @@ typedef struct { > > /*************************************************** > ** > + ** ftree_guid_tbl_element_t definition > + ** > + ***************************************************/ > + > +typedef struct { > + cl_map_item_t map_item; > + uint64_t guid; > +} ftree_guid_tbl_element_t; > + > +/*************************************************** > + ** > ** ftree_fwd_tbl_t definition > ** > ***************************************************/ > @@ -182,6 +193,7 @@ typedef struct ftree_sw_t_ > ftree_port_group_t ** up_port_groups; > uint8_t up_port_groups_num; > ftree_fwd_tbl_t lft_buf; > + boolean_t is_root; > } ftree_sw_t; > > /*************************************************** > @@ -195,6 +207,7 @@ typedef struct ftree_hca_t_ { > osm_node_t * p_osm_node; > ftree_port_group_t ** up_port_groups; > uint16_t up_port_groups_num; > + boolean_t is_cn; > } ftree_hca_t; > > /*************************************************** > @@ -209,6 +222,8 @@ typedef struct ftree_fabric_t_ > cl_qmap_t hca_tbl; > cl_qmap_t sw_tbl; > cl_qmap_t sw_by_tuple_tbl; > + cl_qmap_t cn_guids_tbl; > + cl_qmap_t root_guids_tbl; > uint8_t tree_rank; > ftree_sw_t ** leaf_switches; > uint32_t leaf_switches_num; > @@ -393,6 +408,36 @@ __osm_ftree_sw_tbl_element_destroy( > > /*************************************************** > ** > + ** ftree_guid_tbl_element_t functions > + ** > + ***************************************************/ > + > +static ftree_guid_tbl_element_t * > +__osm_ftree_guid_tbl_element_create( > + IN uint64_t guid) > +{ > + ftree_guid_tbl_element_t * p_element = + > 
(ftree_guid_tbl_element_t *) malloc(sizeof(ftree_guid_tbl_element_t)); > + if (!p_element) > + return NULL; > + > + memset(p_element, 0,sizeof(ftree_guid_tbl_element_t)); > + p_element->guid = guid; > + return p_element; > +} > + > +/***************************************************/ > + > +static void > +__osm_ftree_guid_tbl_element_destroy( > + IN ftree_guid_tbl_element_t * p_element) > +{ > + if (p_element) > + free(p_element); > +} > + > +/*************************************************** > + ** > ** ftree_port_t functions > ** > ***************************************************/ > @@ -607,6 +652,9 @@ __osm_ftree_sw_create( > p_sw->lft_buf = (ftree_fwd_tbl_t)cl_pool_get(&p_ftree->sw_fwd_tbl_pool); > memset(p_sw->lft_buf, OSM_NO_PATH, FTREE_FWD_TBL_LEN); > > + /* by default the switch is not root */ > + p_sw->is_root = FALSE; > + > return p_sw; > } /* __osm_ftree_sw_create() */ > > @@ -810,6 +858,10 @@ __osm_ftree_hca_create( > if (!p_hca->up_port_groups) > return NULL; > p_hca->up_port_groups_num = 0; > + > + /* by default every CA is treated as compute node */ > + p_hca->is_cn = TRUE; > + > return p_hca; > } > > @@ -934,6 +986,9 @@ __osm_ftree_fabric_create() > cl_qmap_init(&p_ftree->sw_tbl); > cl_qmap_init(&p_ftree->sw_by_tuple_tbl); > > + cl_qmap_init(&p_ftree->cn_guids_tbl); > + cl_qmap_init(&p_ftree->root_guids_tbl); > + > status = cl_pool_init( &p_ftree->sw_fwd_tbl_pool, > 8, /* min pool size */ > 0, /* max pool size - > unlimited */ > @@ -960,6 +1015,8 @@ __osm_ftree_fabric_clear(ftree_fabric_t * p_ftree) > ftree_sw_t * p_next_sw; > ftree_sw_tbl_element_t * p_element; > ftree_sw_tbl_element_t * p_next_element; > + ftree_guid_tbl_element_t * p_guid_element; > + ftree_guid_tbl_element_t * p_next_guid_element; > > if (!p_ftree) > return; > @@ -1000,6 +1057,28 @@ __osm_ftree_fabric_clear(ftree_fabric_t * p_ftree) > } > cl_qmap_remove_all(&p_ftree->sw_by_tuple_tbl); > > + /* remove all the elements of root_guids_tbl */ > + > + p_next_guid_element = 
(ftree_guid_tbl_element_t > *)cl_qmap_head(&p_ftree->root_guids_tbl); > + while( p_next_guid_element != (ftree_guid_tbl_element_t > *)cl_qmap_end(&p_ftree->root_guids_tbl) ) > + { > + p_guid_element = p_next_guid_element; > + p_next_guid_element = (ftree_guid_tbl_element_t > *)cl_qmap_next(&p_guid_element->map_item ); > + __osm_ftree_guid_tbl_element_destroy(p_guid_element); > + } > + cl_qmap_remove_all(&p_ftree->root_guids_tbl); > + > + /* remove all the elements of cn_guids_tbl */ > + > + p_next_guid_element = (ftree_guid_tbl_element_t > *)cl_qmap_head(&p_ftree->cn_guids_tbl); > + while( p_next_guid_element != (ftree_guid_tbl_element_t > *)cl_qmap_end(&p_ftree->cn_guids_tbl) ) > + { > + p_guid_element = p_next_guid_element; > + p_next_guid_element = (ftree_guid_tbl_element_t > *)cl_qmap_next(&p_guid_element->map_item ); > + __osm_ftree_guid_tbl_element_destroy(p_guid_element); > + } > + cl_qmap_remove_all(&p_ftree->cn_guids_tbl); > + > /* free the leaf switches array */ > if ((p_ftree->leaf_switches_num > 0) && (p_ftree->leaf_switches)) > free(p_ftree->leaf_switches); > @@ -1048,6 +1127,16 @@ __osm_ftree_fabric_add_hca(ftree_fabric_t * > p_ftree, osm_node_t * p_osm_node) > > CL_ASSERT(osm_node_get_type(p_osm_node) == IB_NODE_TYPE_CA); > > + /* if a user has supplied CN guids list, and this CA's guid + > is not there, then the CA should be marked as non-CN */ > + if ( (!cl_is_qmap_empty(&p_ftree->cn_guids_tbl)) && + > (cl_qmap_get(&p_ftree->cn_guids_tbl, > + > cl_ntoh64(osm_node_get_node_guid(p_hca->p_osm_node))) == > + cl_qmap_end(&p_ftree->cn_guids_tbl)) ) > + { > + p_hca->is_cn = FALSE; > + } > + > cl_qmap_insert(&p_ftree->hca_tbl, > p_osm_node->node_info.node_guid, > &p_hca->map_item); > @@ -1062,6 +1151,16 @@ __osm_ftree_fabric_add_sw(ftree_fabric_t * > p_ftree, osm_switch_t * p_osm_sw) > > CL_ASSERT(osm_node_get_type(p_osm_sw->p_node) == IB_NODE_TYPE_SWITCH); > > + /* if a user has supplied root guids list, and this switch's guid > + *is* there, then the 
switch should be marked as root */ > + if ( (!cl_is_qmap_empty(&p_ftree->root_guids_tbl)) && + > (cl_qmap_get(&p_ftree->root_guids_tbl, > + > cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node))) != > + cl_qmap_end(&p_ftree->root_guids_tbl)) ) > + { > + p_sw->is_root = TRUE; > + } > + > cl_qmap_insert(&p_ftree->sw_tbl, > p_osm_sw->p_node->node_info.node_guid, > &p_sw->map_item); > @@ -2907,6 +3006,127 @@ __osm_ftree_fabric_populate_ports( > /*************************************************** > ***************************************************/ > > +static int > +__osm_ftree_convert_list2qmap( > + cl_list_t * p_guid_list, > + cl_qmap_t * p_map ) > +{ > + uint64_t * p_guid; > + > + if ( !p_map ) > + return -1; > + > + if ( !p_guid_list || !cl_list_count(p_guid_list) ) > + return 0; > + > + while ( (p_guid = (uint64_t*)cl_list_remove_head(p_guid_list)) ) > + { > + cl_qmap_insert( p_map, + *p_guid, > + > &(__osm_ftree_guid_tbl_element_create(*p_guid)->map_item) ); > + free(p_guid); > + } > + > + CL_ASSERT(cl_is_list_empty(p_guid_list)); > + > + return 0; > +} /* __osm_ftree_convert_list2qmap() */ > + > +/*************************************************** > + ***************************************************/ > + > +static int > +__osm_ftree_fabric_read_guid_files( > + IN ftree_fabric_t * p_ftree) > +{ > + cl_list_t guid_list; > + ftree_guid_tbl_element_t * p_guid_element; > + ftree_guid_tbl_element_t * p_next_guid_element; > + int status = 0; > + > + OSM_LOG_ENTER(&p_ftree->p_osm->log, __osm_ftree_fabric_read_guid_files); > + > + cl_list_construct( &guid_list ); > + cl_list_init( &guid_list, 10 ); > + > + p_ftree->p_osm->subn.opt.ftree_root_guid_file = "/tmp/ftree.root.guids"; > + p_ftree->p_osm->subn.opt.ftree_cn_guid_file = "/tmp/ftree.cn.guids"; These two lines are, of course, a mistake :) -- Yevgeny > + > + if (p_ftree->p_osm->subn.opt.ftree_root_guid_file) > + { > + osm_log( &p_ftree->p_osm->log, OSM_LOG_DEBUG, > + "__osm_ftree_read_guid_files: " > + 
"Fetching root nodes from file %s\n", > + p_ftree->p_osm->subn.opt.ftree_root_guid_file ); > + > + if ( osm_ucast_mgr_read_guid_file( &p_ftree->p_osm->sm.ucast_mgr, > + > p_ftree->p_osm->subn.opt.ftree_root_guid_file, > + &guid_list ) || > + __osm_ftree_convert_list2qmap( &guid_list, > + &p_ftree->root_guids_tbl ) ) > + { > + status = -1; > + goto Exit; > + } > + > + if (osm_log_is_active(&p_ftree->p_osm->log,OSM_LOG_DEBUG)) > + { > + p_next_guid_element = (ftree_guid_tbl_element_t > *)cl_qmap_head(&p_ftree->root_guids_tbl); > + while( p_next_guid_element != (ftree_guid_tbl_element_t > *)cl_qmap_end(&p_ftree->root_guids_tbl) ) > + { > + p_guid_element = p_next_guid_element; > + p_next_guid_element = (ftree_guid_tbl_element_t > *)cl_qmap_next(&p_guid_element->map_item ); > + osm_log( &p_ftree->p_osm->log, OSM_LOG_DEBUG, > + "__osm_ftree_fabric_read_guid_files: " > + "root guid 0x%016" PRIx64 "\n", > + p_guid_element->guid ); > + } > + } > + } > + CL_ASSERT(cl_is_list_empty(&guid_list)); > + > + if (p_ftree->p_osm->subn.opt.ftree_cn_guid_file) > + { > + osm_log( &p_ftree->p_osm->log, OSM_LOG_DEBUG, > + "__osm_ftree_read_guid_files: " > + "Fetching compute nodes from file %s\n", > + p_ftree->p_osm->subn.opt.ftree_cn_guid_file ); > + > + if ( osm_ucast_mgr_read_guid_file( &p_ftree->p_osm->sm.ucast_mgr, > + > p_ftree->p_osm->subn.opt.ftree_cn_guid_file, > + &guid_list ) || > + __osm_ftree_convert_list2qmap( &guid_list, > + &p_ftree->cn_guids_tbl ) ) > + { > + status = -1; > + goto Exit; > + } > + > + if (osm_log_is_active(&p_ftree->p_osm->log,OSM_LOG_DEBUG)) > + { > + p_next_guid_element = (ftree_guid_tbl_element_t > *)cl_qmap_head(&p_ftree->cn_guids_tbl); > + while( p_next_guid_element != (ftree_guid_tbl_element_t > *)cl_qmap_end(&p_ftree->cn_guids_tbl) ) > + { > + p_guid_element = p_next_guid_element; > + p_next_guid_element = (ftree_guid_tbl_element_t > *)cl_qmap_next(&p_guid_element->map_item ); > + osm_log( &p_ftree->p_osm->log, OSM_LOG_DEBUG, > + 
"__osm_ftree_fabric_read_guid_files: " > + "compute node guid 0x%016" PRIx64 "\n", > + p_guid_element->guid ); > + } > + } > + } > + CL_ASSERT(cl_is_list_empty(&guid_list)); > + > + Exit: > + OSM_LOG_EXIT(&p_ftree->p_osm->log); > + cl_list_destroy(&guid_list); > + return status; > +} /*__osm_ftree_fabric_read_guid_files() */ > + > +/*************************************************** > + ***************************************************/ > + > static int __osm_ftree_construct_fabric( > IN void * context) > @@ -2947,6 +3167,18 @@ __osm_ftree_construct_fabric( > goto Exit; > } > > + osm_log(&p_ftree->p_osm->log, OSM_LOG_VERBOSE, > + "__osm_ftree_construct_fabric: " > + "Reading guid files provided by user\n"); > + if (__osm_ftree_fabric_read_guid_files(p_ftree) != 0) > + { > + osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS, > + "Failed reading guid files - " > + "falling back to default routing\n"); > + status = -1; > + goto Exit; > + } > + > osm_log(&p_ftree->p_osm->log, > OSM_LOG_VERBOSE,"__osm_ftree_construct_fabric: \n" > " > |----------------------------------------|\n" > " |- Starting FatTree fabric > construction -|\n" From kliteyn at dev.mellanox.co.il Thu Jun 14 07:00:06 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 14 Jun 2007 17:00:06 +0300 Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node guid files options for fat-tree In-Reply-To: <20070614135717.GE5908@sashak.voltaire.com> References: <4670FA2D.7070708@dev.mellanox.co.il> <20070614121501.GC5908@sashak.voltaire.com> <4671363F.6060600@dev.mellanox.co.il> <46713FC7.3030104@dev.mellanox.co.il> <20070614135717.GE5908@sashak.voltaire.com> Message-ID: <467149E6.80606@dev.mellanox.co.il> Sasha Khapyorsky wrote: > On 16:16 Thu 14 Jun , Yevgeny Kliteynik wrote: >> Hi Sasha, >> >> Yevgeny Kliteynik wrote: >>> Sasha Khapyorsky wrote: >>>> Hi Yevgeny, >>>> >>>> On 11:19 Thu 14 Jun , Yevgeny Kliteynik wrote: >>>>> The following three patches are adding root and compute node 
guid files
>>>>> options for fat-tree routing,
>>>> Is there any reason to not share root guids file option with up/down?
>>> There are two new options for fat-tree: roots and compute nodes (CN).
>>> These two will be very "tightly coupled" and would have more implication
>>> on the routing than in case of up/dn roots. For instance, having root
>>> file but not CN file means that the topology doesn't have to be pure
>>> fat-tree,
>>> but all the CAs are considered CNs and have to be on the same level of the
>>> tree.
>>> And there is similar implication of all the combinations of these two
>>> options.
>>> Because of this coupling I wanted to differentiate these two options from
>>> the up/dn roots.
>>> Thoughts?
>>>> Also the way how root guids are handled (in both up/down and ftree)
>>>> doesn't look very optimal - guids are loaded to dynamic list, the list
>>>> is converted to map, this map is matched and root nodes are marked as
>>>> roots. Isn't it would be easy just to mark root nodes during file parsing?
>>> The only thing you can save here is converting list to map:
>>> You have to parse the guids file anyway, and you have to build all the
>>> fat-tree data structures anyway. So if you parse the file and fill the
>>> map right away instead of filling the list first, you will save the
>>> list2map conversion.
>>> But then up/dn and fat-tree can't use the same function to parse the guid
>>> file,
>>> and since the list2map conversion is not a big deal (we're talking about
>>> list > of roots, which is couple of hundreds of guids at max), I prefer to
>>> leave it and not to use separate parsing functions for up/dn and fat-tree.
>> Actually, I can do something else here:
>> - parse guid file into list
>> - populate fat-tree switches and CAs
>> - scan guid list, and for each guid mark the matching node in the fat-tree
>> maps
>>
>> Sounds OK?
>
> Yes, much better.
>
> Also there could be something like:
> - populate fat-tree switches and CAs
> - parse guid file, and for each guid mark the matching node (with
> custom callback)
>
> But with your proposition it is not needed to touch the parser (and
> up/down :)).

OK, I'll rewrite it as I've described it.
What about the rest of the patches?

--
Yevgeny

> Sasha
>

From sashak at voltaire.com Thu Jun 14 07:31:34 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Thu, 14 Jun 2007 17:31:34 +0300
Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node guid files options for fat-tree
In-Reply-To: <467149E6.80606@dev.mellanox.co.il>
References: <4670FA2D.7070708@dev.mellanox.co.il> <20070614121501.GC5908@sashak.voltaire.com> <4671363F.6060600@dev.mellanox.co.il> <46713FC7.3030104@dev.mellanox.co.il> <20070614135717.GE5908@sashak.voltaire.com> <467149E6.80606@dev.mellanox.co.il>
Message-ID: <20070614143134.GF5908@sashak.voltaire.com>

On 17:00 Thu 14 Jun , Yevgeny Kliteynik wrote:
> >> Actually, I can do something else here:
> >> - parse guid file into list
> >> - populate fat-tree switches and CAs
> >> - scan guid list, and for each guid mark the matching node in the
> >> fat-tree maps
> >>
> >> Sounds OK?
> > Yes, much better.
> > Also there could be something like:
> > - populate fat-tree switches and CAs
> > - parse guid file, and for each guid mark the matching node (with
> > custom callback)
> > But with your proposition it is not needed to touch the parser (and
> > up/down :)).
>
> OK, I'll rewrite it as I've described it.
> What about the rest of the patches?

Basically looks fine. Just small nits: there are trailing white spaces
(you can use 'git-diff --color' in order to see it or apply the patch
with 'git-am --whitespace=...'), it is helpful to have descriptive per
patch subjects in emails (git-am gets this as patch summary) -
git-format-patch is useful there.
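The plan agreed on in this exchange (parse the guid file into a list, populate the fat-tree switches and CAs, then walk the list once and mark the matching nodes) can be sketched as below. This assumes the parsed GUIDs already sit in a plain array standing in for the cl_list, and that switches and CAs live in GUID-sorted arrays rather than cl_qmaps. The compute-node rule follows the comments in the [2/3] patch: every CA is a CN by default, and a non-empty CN list demotes the CAs it does not name. mark_roots(), mark_cns(), sw_t and hca_t are hypothetical names, not OpenSM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for ftree_sw_t / ftree_hca_t: only the fields used here.
 * The GUID must be the first member (see cmp_guid below). */
typedef struct {
	uint64_t guid;
	bool is_root;
} sw_t;

typedef struct {
	uint64_t guid;
	bool is_cn;
} hca_t;

/* bsearch() comparator for any array whose elements start with a
 * uint64_t GUID; a struct pointer may be read via its first member. */
static int cmp_guid(const void *key, const void *elem)
{
	uint64_t k = *(const uint64_t *)key;
	uint64_t g = *(const uint64_t *)elem;

	return (k > g) - (k < g);
}

/* Last step for roots: walk the parsed GUID list once and flag every
 * matching switch. sw[] must be sorted by GUID. Returns how many
 * switches were newly marked. */
static size_t mark_roots(sw_t *sw, size_t nsw, const uint64_t *guids, size_t n)
{
	size_t i, marked = 0;

	assert(guids != NULL || n == 0);
	if (nsw == 0)
		return 0;
	for (i = 0; i < n; i++) {
		sw_t *s = bsearch(&guids[i], sw, nsw, sizeof(*sw), cmp_guid);

		if (s && !s->is_root) {
			s->is_root = true;
			marked++;
		}
	}
	return marked;
}

/* Last step for compute nodes: every CA starts out as a CN; a non-empty
 * CN list keeps only the CAs it names. hca[] must be sorted by GUID. */
static void mark_cns(hca_t *hca, size_t nhca, const uint64_t *guids, size_t n)
{
	size_t i;

	assert(guids != NULL || n == 0);
	if (n == 0 || nhca == 0)
		return;		/* no CN list: keep the default */
	for (i = 0; i < nhca; i++)
		hca[i].is_cn = false;
	for (i = 0; i < n; i++) {
		hca_t *h = bsearch(&guids[i], hca, nhca, sizeof(*hca), cmp_guid);

		if (h)
			h->is_cn = true;
	}
}
```

Because marking happens after the tables are populated, the parser and up/down stay untouched, which is the advantage Sasha points out for this variant.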
Sasha From mshefty at ichips.intel.com Thu Jun 14 09:20:54 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 14 Jun 2007 09:20:54 -0700 Subject: [ofa-general] crash in ipoib In-Reply-To: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com> References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com> Message-ID: <46716AE6.9050804@ichips.intel.com> Here's the capture from the network console <5> [...network console startup...] <5> Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP: <5> <4>Warning: kfree_skb on hard IRQ ffffffff802bb055 <5> Warning: kfree_skb on hard IRQ ffffffff802bb055 <5> Warning: kfree_skb on hard IRQ ffffffff802bb055 <5> Warning: kfree_skb on hard IRQ ffffffff802bb055 <5> {:ib_ipoib:ipoib_cm_handle_rx_wc+378} <5> PML4 dcc2f067 PGD 102087067 PMD 0 <5> Oops: 0002 [1] SMP <5> CPU 1 <5> Modules linked in: netconsole det(U) nfs lockd nfs_acl autofs4 i2c_dev i2c_core sunrpc rdma_ucm(U) ib_vnic(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_local_sa(U) ib_ipath(U) ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables dm_mirror dm_mod button battery ac joydev uhci_hcd ehci_hcd hw_random ib_mthca(U) ib_ipoib(U) ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) md5 ipv6 e1000(U) ahci ext3 jbd ata_piix libata sd_mod scsi_mod <5> Pid: 1584, comm: ib_cm/1 Tainted: PF 2.6.9-42.ELsmp <5> RIP: 0010:[] {:ib_ipoib:ipoib_cm_handle_rx_wc+378} <5> RSP: 0018:0000010005d7b940 EFLAGS: 00010046 <5> RAX: 0000000000000000 RBX: 000001010d3a8e00 RCX: 0000000000000000 <5> RDX: 000001010d3a8e10 RSI: 00000101191b3990 RDI: 00000101191b3380 <5> RBP: 000001011302b680 R08: 0000000000000010 R09: 0000010119301e00 <5> R10: 000000000000001f R11: 00000000000000e4 R12: 0000000000000206 <5> R13: 00000101191b3380 R14: 00000101191b3000 R15: 0000000000000030 <5> FS: 0000000000000000(0000) GS:ffffffff804e5100(0000) knlGS:0000000000000000 <5> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b <5> CR2: 0000000000000008 CR3: 
0000000005d68000 CR4: 00000000000006e0 <5> Process ib_cm/1 (pid: 1584, threadinfo 0000010119c14000, task 000001011a9c5030) <5> Stack: 0000000000000206 0000000000000030 0000000000000206 0000010110e8fb00 <5> 00000101191b37b8 0000000000000206 00000000dc62401c 0000000400000206 <5> 0100000082000001 000000041a9121c0 <5> Call Trace: {:ib_ipoib:ipoib_ib_completion+144} <5> {:ib_mthca:mthca_tavor_interrupt+95} <5> {:ib_mthca:mthca_eq_int+221} {do_IRQ+266} <5> {:ib_mthca:mthca_tavor_interrupt+95} <5> {handle_IRQ_event+41} {do_IRQ+197} <5> {ret_from_intr+0} {csum_partial+1209} <5> {skb_checksum+308} {:ip_conntrack:tcp_error+312} <5> {:ip_conntrack:ip_conntrack_in+163} <5> {ip_local_deliver_finish+0} {nf_hook_slow+184} <5> {nf_iterate+82} {ip_rcv_finish+0} <5> {nf_hook_slow+116} {ip_rcv_finish+0} <5> {ip_rcv+1119} {netif_receive_skb+791} <5> {process_backlog+136} {net_rx_action+203} <5> {__do_softirq+88} {do_softirq+49} <5> {do_IRQ+328} {ret_from_intr+0} <5> {_spin_unlock_irqrestore+47} <5> {:ib_cm:ib_send_cm_rep+812} {:ib_ipoib:ipoib_cm_rx_handler+821} <5> {:ib_ipoib:ipoib_cm_rx_event_handler+0} <5> {:ib_core:ib_find_cached_pkey+192} <5> {:ib_cm:cm_process_work+101} {:ib_cm:cm_req_handler+2398} <5> {:ib_cm:cm_work_handler+0} {:ib_cm:cm_work_handler+46} <5> {worker_thread+419} {default_wake_function+0} <5> {__wake_up_common+67} {default_wake_function+0} <5> {keventd_create_kthread+0} {worker_thread+0} <5> {keventd_create_kthread+0} {kthread+200} <5> {child_rip+8} {keventd_create_kthread+0} <5> {kthread+0} {child_rip+0} <5> <5> <5> Code: 48 89 48 08 48 89 01 49 8b 86 90 09 00 00 48 89 50 08 48 89 <5> RIP {:ib_ipoib:ipoib_cm_handle_rx_wc+378} RSP <0000010005d7b940> <5> CR2: 0000000000000008 <5> <0>Kernel panic - not syncing: Oops <5> Badness in panic at kernel/panic.c:118 <5> <5> Call Trace: {panic+527} {__mod_timer+293} <5> {complement_pos+12} {vgacon_cursor+213} <5> {vgacon_cursor+0} {bust_spinlocks+62} <5> {oops_end+65} {do_page_fault+1204} <5> 
{:ib_mthca:mthca_tavor_post_srq_recv+839} <5> {:ib_ipoib:ipoib_cm_post_receive+119} <5> {cache_alloc_refill+390} {error_exit+0} <5> {:ib_ipoib:ipoib_cm_handle_rx_wc+378} <5> {:ib_ipoib:ipoib_ib_completion+144} <5> {:ib_mthca:mthca_tavor_interrupt+95} <5> {:ib_mthca:mthca_eq_int+221} {do_IRQ+266} <5> {:ib_mthca:mthca_tavor_interrupt+95} <5> {handle_IRQ_event+41} {do_IRQ+197} <5> {ret_from_intr+0} {csum_partial+1209} <5> {skb_checksum+308} {:ip_conntrack:tcp_error+312} <5> {:ip_conntrack:ip_conntrack_in+163} <5> {ip_local_deliver_finish+0} {nf_hook_slow+184} <5> {nf_iterate+82} {ip_rcv_finish+0} <5> {nf_hook_slow+116} {ip_rcv_finish+0} <5> {ip_rcv+1119} {netif_receive_skb+791} <5> {process_backlog+136} {net_rx_action+203} <5> {__do_softirq+88} {do_softirq+49} <5> {do_IRQ+328} {ret_from_intr+0} <5> {_spin_unlock_irqrestore+47} <5> {:ib_cm:ib_send_cm_rep+812} {:ib_ipoib:ipoib_cm_rx_handler+821} <5> {:ib_ipoib:ipoib_cm_rx_event_handler+0} <5> {:ib_core:ib_find_cached_pkey+192} <5> {:ib_cm:cm_process_work+101} {:ib_cm:cm_req_handler+2398} <5> {:ib_cm:cm_work_handler+0} {:ib_cm:cm_work_handler+46} <5> {worker_thread+419} {default_wake_function+0} <5> {__wake_up_common+67} {default_wake_function+0} <5> {keventd_create_kthread+0} {worker_thread+0} <5> {keventd_create_kthread+0} {kthread+200} <5> {child_rip+8} {keventd_create_kthread+0} <5> {kthread+0} {child_rip+0} <5> <5> Badness in i8042_panic_blink at drivers/input/serio/i8042.c:987 <5> <5> Call Trace: {i8042_panic_blink+238} {panic+445} <5> {__mod_timer+293} {complement_pos+12} <5> {vgacon_cursor+213} {vgacon_cursor+0} <5> {bust_spinlocks+62} {oops_end+65} <5> {do_page_fault+1204} {:ib_mthca:mthca_tavor_post_srq_recv+839} <5> {:ib_ipoib:ipoib_cm_post_receive+119} <5> {cache_alloc_refill+390} {error_exit+0} <5> {:ib_ipoib:ipoib_cm_handle_rx_wc+378} <5> {:ib_ipoib:ipoib_ib_completion+144} <5> {:ib_mthca:mthca_tavor_interrupt+95} <5> {:ib_mthca:mthca_eq_int+221} {do_IRQ+266} <5> {:ib_mthca:mthca_tavor_interrupt+95} <5> 
{handle_IRQ_event+41} {do_IRQ+197} <5> {ret_from_intr+0} {csum_partial+1209} <5> {skb_checksum+308} {:ip_conntrack:tcp_error+312} <5> {:ip_conntrack:ip_conntrack_in+163} <5> {ip_local_deliver_finish+0} {nf_hook_slow+184} <5> {nf_iterate+82} {ip_rcv_finish+0} <5> {nf_hook_slow+116} {ip_rcv_finish+0} <5> {ip_rcv+1119} {netif_receive_skb+791} <5> {process_backlog+136} {net_rx_action+203} <5> {__do_softirq+88} {do_softirq+49} <5> {do_IRQ+328} {ret_from_intr+0} <5> {_spin_unlock_irqrestore+47} <5> {:ib_cm:ib_send_cm_rep+812} {:ib_ipoib:ipoib_cm_rx_handler+821} <5> {:ib_ipoib:ipoib_cm_rx_event_handler+0} <5> {:ib_core:ib_find_cached_pkey+192} <5> {:ib_cm:cm_process_work+101} {:ib_cm:cm_req_handler+2398} <5> {:ib_cm:cm_work_handler+0} {:ib_cm:cm_work_handler+46} <5> {worker_thread+419} {default_wake_function+0} <5> {__wake_up_common+67} {default_wake_function+0} <5> {keventd_create_kthread+0} {worker_thread+0} <5> {keventd_create_kthread+0} {kthread+200} <5> {child_rip+8} {keventd_create_kthread+0} <5> {kthread+0} {child_rip+0} <5> <5> Badness in i8042_panic_blink at drivers/input/serio/i8042.c:990 <5> <5> Call Trace: {i8042_panic_blink+384} {panic+445} <5> {__mod_timer+293} {complement_pos+12} <5> {vgacon_cursor+213} {vgacon_cursor+0} <5> {bust_spinlocks+62} {oops_end+65} <5> {do_page_fault+1204} {:ib_mthca:mthca_tavor_post_srq_recv+839} <5> {:ib_ipoib:ipoib_cm_post_receive+119} <5> {cache_alloc_refill+390} {error_exit+0} <5> {:ib_ipoib:ipoib_cm_handle_rx_wc+378} <5> {:ib_ipoib:ipoib_ib_completion+144} <5> {:ib_mthca:mthca_tavor_interrupt+95} <5> {:ib_mthca:mthca_eq_int+221} {do_IRQ+266} <5> {:ib_mthca:mthca_tavor_interrupt+95} <5> {handle_IRQ_event+41} {do_IRQ+197} <5> {ret_from_intr+0} {csum_partial+1209} <5> {skb_checksum+308} {:ip_conntrack:tcp_error+312} <5> {:ip_conntrack:ip_conntrack_in+163} <5> {ip_local_deliver_finish+0} {nf_hook_slow+184} <5> {nf_iterate+82} {ip_rcv_finish+0} <5> {nf_hook_slow+116} {ip_rcv_finish+0} <5> {ip_rcv+1119} {netif_receive_skb+791} <5> 
{process_backlog+136} {net_rx_action+203} <5> {__do_softirq+88} {do_softirq+49} <5> {do_IRQ+328} {ret_from_intr+0} <5> {_spin_unlock_irqrestore+47} <5> {:ib_cm:ib_send_cm_rep+812} {:ib_ipoib:ipoib_cm_rx_handler+821} <5> {:ib_ipoib:ipoib_cm_rx_event_handler+0} <5> {:ib_core:ib_find_cached_pkey+192} <5> {:ib_cm:cm_process_work+101} {:ib_cm:cm_req_handler+2398} <5> {:ib_cm:cm_work_handler+0} {:ib_cm:cm_work_handler+46} <5> {worker_thread+419} {default_wake_function+0} <5> {__wake_up_common+67} {default_wake_function+0} <5> {keventd_create_kthread+0} {worker_thread+0} <5> {keventd_create_kthread+0} {kthread+200} <5> {child_rip+8} {keventd_create_kthread+0} <5> {kthread+0} {child_rip+0} <5> <5> Badness in i8042_panic_blink at drivers/input/serio/i8042.c:992 <5> <5> Call Trace: {i8042_panic_blink+485} {panic+445} <5> {__mod_timer+293} {complement_pos+12} <5> {vgacon_cursor+213} {vgacon_cursor+0} <5> {bust_spinlocks+62} {oops_end+65} <5> {do_page_fault+1204} {:ib_mthca:mthca_tavor_post_srq_recv+839} <5> {:ib_ipoib:ipoib_cm_post_receive+119} <5> {cache_alloc_refill+390} {error_exit+0} <5> {:ib_ipoib:ipoib_cm_handle_rx_wc+378} <5> {:ib_ipoib:ipoib_ib_completion+144} <5> {:ib_mthca:mthca_tavor_interrupt+95} <5> {:ib_mthca:mthca_eq_int+221} {do_IRQ+266} <5> {:ib_mthca:mthca_tavor_interrupt+95} <5> {handle_IRQ_event+41} {do_IRQ+197} <5> {ret_from_intr+0} {csum_partial+1209} <5> {skb_checksum+308} {:ip_conntrack:tcp_error+312} <5> {:ip_conntrack:ip_conntrack_in+163} <5> {ip_local_deliver_finish+0} {nf_hook_slow+184} <5> {nf_iterate+82} {ip_rcv_finish+0} <5> {nf_hook_slow+116} {ip_rcv_finish+0} <5> {ip_rcv+1119} {netif_receive_skb+791} <5> {process_backlog+136} {net_rx_action+203} <5> {__do_softirq+88} {do_softirq+49} <5> {do_IRQ+328} {ret_from_intr+0} <5> {_spin_unlock_irqrestore+47} <5> {:ib_cm:ib_send_cm_rep+812} {:ib_ipoib:ipoib_cm_rx_handler+821} <5> {:ib_ipoib:ipoib_cm_rx_event_handler+0} <5> {:ib_core:ib_find_cached_pkey+192} <5> {:ib_cm:cm_process_work+101} 
{:ib_cm:cm_req_handler+2398} <5> {:ib_cm:cm_work_handler+0} {:ib_cm:cm_work_handler+46} <5> {worker_thread+419} {default_wake_function+0} <5> {__wake_up_common+67} {default_wake_function+0} <5> {keventd_create_kthread+0} {worker_thread+0} <5> {keventd_create_kthread+0} {kthread+200} <5> {child_rip+8} {keventd_create_kthread+0} <5> {kthread+0} {child_rip+0} <5>

From mshefty at ichips.intel.com Thu Jun 14 09:39:25 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 14 Jun 2007 09:39:25 -0700
Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM)
In-Reply-To: <20070613174930.GE12277@mellanox.co.il>
References: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com> <466F36C8.5010507@linux.vnet.ibm.com> <20070613163821.GB12277@mellanox.co.il> <20070613174930.GE12277@mellanox.co.il>
Message-ID: <46716F3D.7050206@ichips.intel.com>

> Note this is not a full emulation, just close enough to make IPoIB CM work.

If the emulation is only enough for IPoIB, then I think it belongs in
IPoIB, and not in every HCA driver.

- Sean

From andrey.slepuhin at t-platforms.ru Thu Jun 14 09:41:06 2007
From: andrey.slepuhin at t-platforms.ru (Andrey Slepuhin)
Date: Thu, 14 Jun 2007 20:41:06 +0400
Subject: [ofa-general] Problems with mlx4
In-Reply-To:
References: <467005B9.8070708@t-platforms.ru>
Message-ID: <46716FA2.7020805@t-platforms.ru>

Hi Roland,

I upgraded the switch FW to version 1.0 and applied your latest mlx4
patches, but I'm still in the same situation - the link is down. What
else can go wrong?

Thanks,
Andrey

Roland Dreier wrote:
> > I just setup a test cluster using ConnectX cards, but I can not get
> > link up.
>
> Most likely you need to update your switch FW. You need Anafa2 FW
> version 1.0 to negotiate a DDR link with ConnectX.
>
> BTW what firmware version do you have on your HCAs? You probably want
You probably want > to update to 2.0.156 (the mlx4 driver won't work with 2.0.158 for a > day or two still) so that you don't have to monkey around with > hard-coding your switch ports to DDR only. > > - R. > From rdreier at cisco.com Thu Jun 14 09:44:29 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 14 Jun 2007 09:44:29 -0700 Subject: [ofa-general] Problems with mlx4 In-Reply-To: <46716FA2.7020805@t-platforms.ru> (Andrey Slepuhin's message of "Thu, 14 Jun 2007 20:41:06 +0400") References: <467005B9.8070708@t-platforms.ru> <46716FA2.7020805@t-platforms.ru> Message-ID: > I upgraded the switch FW to version 1.0 and applied your latest mlx4 > patches, but I'm still in the same situation - the link is down. What > else can go wrong? Please read my whole email, especially this part: BTW what firmware version do you have on your HCAs? You probably want to update to 2.0.156 (the mlx4 driver won't work with 2.0.158 for a day or two still) so that you don't have to monkey around with hard-coding your switch ports to DDR only. From rdreier at cisco.com Thu Jun 14 09:48:10 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 14 Jun 2007 09:48:10 -0700 Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) In-Reply-To: <46716F3D.7050206@ichips.intel.com> (Sean Hefty's message of "Thu, 14 Jun 2007 09:39:25 -0700") References: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com> <466F36C8.5010507@linux.vnet.ibm.com> <20070613163821.GB12277@mellanox.co.il> <20070613174930.GE12277@mellanox.co.il> <46716F3D.7050206@ichips.intel.com> Message-ID: > > Note this is not a full emulation, just close enough to make IPoIB CM work. > If the emulation is only enough for IPoIB, then I think it belongs in > IPoIB, and not in every HCA driver. I was thinking the same thing. Otherwise you're just setting a booby trap for someone who tries to use SRQ for something else. 
However it may be a good approach to put an abstraction layer in IPoIB
so that the CM code can use an SRQ-like interface to both HCAs that
support SRQ and HCAs that don't.

- R.

From mshefty at ichips.intel.com Thu Jun 14 10:01:31 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 14 Jun 2007 10:01:31 -0700
Subject: [ofa-general] crash in ipoib
In-Reply-To: <46716AE6.9050804@ichips.intel.com>
References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com> <46716AE6.9050804@ichips.intel.com>
Message-ID: <4671746B.2050903@ichips.intel.com>

I don't know if this is the issue, but here's the code from
ipoib_cm_req_handler():

	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
	if (ret) {
		ipoib_warn(priv, "failed to send REP: %d\n", ret);
		goto err_rep;
	}

	cm_id->context = p;
	p->jiffies = jiffies;
	p->state = IPOIB_CM_RX_LIVE;
	spin_lock_irq(&priv->lock);
	if (list_empty(&priv->cm.passive_ids))
		queue_delayed_work(ipoib_workqueue, &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
	list_add(&p->list, &priv->cm.passive_ids);
	spin_unlock_irq(&priv->lock);

Note that once the REP is sent, the QP is connected. Data can be
received, we can have events, we can be disconnected, whatever... but
we're not yet on the passive_ids list.

- Sean

From mst at dev.mellanox.co.il Thu Jun 14 10:35:23 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Jun 2007 20:35:23 +0300
Subject: [ofa-general] crash in ipoib
In-Reply-To: <46716AE6.9050804@ichips.intel.com>
References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com> <46716AE6.9050804@ichips.intel.com>
Message-ID: <20070614173522.GA29561@mellanox.co.il>

> Quoting Sean Hefty :
> Subject: Re: [ofa-general] crash in ipoib
>
> Here's the capture from the network console

Aha, cool.

> <5> [...network console startup...]
> <5> Unable to handle kernel NULL pointer dereference at 0000000000000008 > RIP: > <5> <4>Warning: kfree_skb on hard IRQ ffffffff802bb055 > <5> Warning: kfree_skb on hard IRQ ffffffff802bb055 > <5> Warning: kfree_skb on hard IRQ ffffffff802bb055 > <5> Warning: kfree_skb on hard IRQ ffffffff802bb055 Weird stuff, it looks like we are freeing an skb with a destructor. Where does ffffffff802bb055 point to? Since 2.6.12 we'd get a proper stack dump for this, but in 2.6.9 need to decode it manually. > <5> {:ib_ipoib:ipoib_cm_handle_rx_wc+378} > <5> PML4 dcc2f067 PGD 102087067 PMD 0 > <5> Oops: 0002 [1] SMP > <5> CPU 1 > <5> Modules linked in: netconsole det(U) nfs lockd nfs_acl autofs4 > i2c_dev i2c_core sunrpc rdma_ucm(U) ib_vnic(U) ib_sdp(U) rdma_cm(U) > iw_cm(U) ib_addr(U) ib_local_sa(U) ib_ipath(U) ipt_REJECT ipt_state > ip_conntrack iptable_filter ip_tables dm_mirror dm_mod button battery ac > joydev uhci_hcd ehci_hcd hw_random ib_mthca(U) ib_ipoib(U) ib_umad(U) > ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) md5 ipv6 > e1000(U) ahci ext3 jbd ata_piix libata sd_mod scsi_mod > <5> Pid: 1584, comm: ib_cm/1 Tainted: PF 2.6.9-42.ELsmp > <5> RIP: 0010:[] > {:ib_ipoib:ipoib_cm_handle_rx_wc+378} > <5> RSP: 0018:0000010005d7b940 EFLAGS: 00010046 > <5> RAX: 0000000000000000 RBX: 000001010d3a8e00 RCX: 0000000000000000 > <5> RDX: 000001010d3a8e10 RSI: 00000101191b3990 RDI: 00000101191b3380 > <5> RBP: 000001011302b680 R08: 0000000000000010 R09: 0000010119301e00 > <5> R10: 000000000000001f R11: 00000000000000e4 R12: 0000000000000206 > <5> R13: 00000101191b3380 R14: 00000101191b3000 R15: 0000000000000030 > <5> FS: 0000000000000000(0000) GS:ffffffff804e5100(0000) > knlGS:0000000000000000 > <5> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > <5> CR2: 0000000000000008 CR3: 0000000005d68000 CR4: 00000000000006e0 > <5> Process ib_cm/1 (pid: 1584, threadinfo 0000010119c14000, task > 000001011a9c5030) > <5> Stack: 0000000000000206 0000000000000030 0000000000000206 > 
0000010110e8fb00 > <5> 00000101191b37b8 0000000000000206 00000000dc62401c > 0000000400000206 > <5> 0100000082000001 000000041a9121c0 Where does :ib_ipoib:ipoib_cm_handle_rx_wc+378 point to on your system? -- MST From xma at us.ibm.com Thu Jun 14 10:38:54 2007 From: xma at us.ibm.com (Shirley Ma) Date: Thu, 14 Jun 2007 10:38:54 -0700 Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) In-Reply-To: Message-ID: > > > Note this is not a full emulation, just close enough to make > IPoIB CM work. > > > If the emulation is only enough for IPoIB, then I think it belongs in > > IPoIB, and not in every HCA driver. > > I was thinking the same thing. Otherwise you're just setting a booby > trap for someone who tries to use SRQ for something else. > > However it may be a good approach to put an abstraction layer in IPoIB > so that the CM code can use an SRQ-like interface to both HCAs that > support SRQ and HCAs that don't. > > - R. That's an interesting point. How to explore different HCAs' hardware features in ULPs is definitely worth thinking about deeply. Thanks Shirley From mshefty at ichips.intel.com Thu Jun 14 10:47:08 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 14 Jun 2007 10:47:08 -0700 Subject: [ofa-general] crash in ipoib In-Reply-To: <20070614173522.GA29561@mellanox.co.il> References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com> <46716AE6.9050804@ichips.intel.com> <20070614173522.GA29561@mellanox.co.il> Message-ID: <46717F1C.3010604@ichips.intel.com> > Where does :ib_ipoib:ipoib_cm_handle_rx_wc+378 point to on your system?
It points to list_move below: if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) { p = wc->qp->qp_context; if (p && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) { spin_lock_irqsave(&priv->lock, flags); p->jiffies = jiffies; /* Move this entry to list head, but do not re-add it * if it has been moved out of list. */ if (p->state == IPOIB_CM_RX_LIVE) >>> list_move(&p->list, priv->cm.passive_ids); spin_unlock_irqrestore(&priv->lock, flags); } } There appears to be a race in ipoib_cm_req_handler(): it sets the ipoib_cm_rx state outside of a lock, and before the item is added to a list. I think this could cause the list_move() call above to oops. I think ipoib_cm_req_handler() needs changes, but I'm not sure if this is enough (patch below has line wrap issues...): @@ -291,16 +291,16 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, st if (ret) goto err_modify; + cm_id->context = p; ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn); if (ret) { ipoib_warn(priv, "failed to send REP: %d\n", ret); goto err_rep; } - cm_id->context = p; p->jiffies = jiffies; - p->state = IPOIB_CM_RX_LIVE; spin_lock_irq(&priv->lock); + p->state = IPOIB_CM_RX_LIVE; if (list_empty(&priv->cm.passive_ids)) queue_delayed_work(ipoib_workqueue, &priv->cm.stale_task, IPOIB_CM_RX_DELAY); - Sean From mst at dev.mellanox.co.il Thu Jun 14 10:50:30 2007 From: mst at dev.mellanox.co.il (Michael S.
Tsirkin) Date: Thu, 14 Jun 2007 20:50:30 +0300 Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) In-Reply-To: References: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com> <466F36C8.5010507@linux.vnet.ibm.com> <20070613163821.GB12277@mellanox.co.il> <20070613174930.GE12277@mellanox.co.il> <46716F3D.7050206@ichips.intel.com> Message-ID: <20070614175030.GB29561@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) > > > > Note this is not a full emulation, just close enough to make IPoIB CM work. > > > If the emulation is only enough for IPoIB, then I think it belongs in > > IPoIB, and not in every HCA driver. "every HCA driver" is an exaggeration: 1. ehca is the only one that does not support SRQ in hardware 2. emulation (and ipoib nosrq patches, too) work by assuming only a small number of connections and a huge amount of memory. This is true for systems where ehca is used but not in the general case. > I was thinking the same thing. Otherwise you're just setting a booby > trap for someone who tries to use SRQ for something else. The emulation is quite close IMO - most likely it will just work, but if not, we can just document the limitations. In case a ULP wants to avoid using the emulation, we could have an "SRQ is emulated" bit to distinguish between these. > However it may be a good approach to put an abstraction layer in IPoIB > so that the CM code can use an SRQ-like interface to both HCAs that > support SRQ and HCAs that don't. 2 issues with this: 1. I think other ULPs can benefit from this emulation too. 2. The emulation does need help from hardware (e.g. I use a qp token in CQE for QP lookups and SRQ detection). Implementing it on top of existing verbs can be done only if the verbs interface is extended.
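Roland's abstraction-layer suggestion (quoted above) can be sketched in userspace C: the CM receive code calls one "post receive" entry point, and a backend chosen once at setup time either posts to a shared receive queue or to the connection's own QP. This is only a model under stated assumptions: `hca_caps`, `rx_backend`, `select_backend`, and the stub post functions are hypothetical names, not IPoIB or verbs API, and the stubs just count posts instead of calling `ib_post_srq_recv()` or `ib_post_recv()`.

```c
/* Hypothetical capability summary; a real driver would consult the
 * device attributes (e.g. max_srq) instead. */
struct hca_caps {
	int max_srq; /* 0 means the HCA has no SRQ support */
};

/* Hypothetical per-connection state: counters stand in for real work requests. */
struct rx_conn {
	int qp_posted; /* receives posted directly to this connection's QP */
};

struct rx_shared {
	int srq_posted; /* receives posted to the single shared receive queue */
};

/* The SRQ-like interface the CM receive code would program against. */
struct rx_backend {
	const char *name;
	int (*post_recv)(struct rx_shared *shared, struct rx_conn *conn, int id);
};

static int post_to_srq(struct rx_shared *shared, struct rx_conn *conn, int id)
{
	(void)conn; /* the buffer is not tied to any one connection */
	(void)id;
	shared->srq_posted++;
	return 0;
}

static int post_to_qp(struct rx_shared *shared, struct rx_conn *conn, int id)
{
	(void)shared; /* no shared queue: each QP owns its receive ring */
	(void)id;
	conn->qp_posted++;
	return 0;
}

static const struct rx_backend srq_backend = { "srq", post_to_srq };
static const struct rx_backend nosrq_backend = { "nosrq", post_to_qp };

/* Choose the backend once, at device setup time. */
static const struct rx_backend *select_backend(const struct hca_caps *caps)
{
	return caps->max_srq > 0 ? &srq_backend : &nosrq_backend;
}
```

With this split the no-SRQ path lives in the ULP, so any HCA can exercise it by forcing `nosrq_backend`. The sketch does not address MST's second point: his emulation relies on hardware help, such as a QP token in the CQE.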
-- MST From ahubbe at iol.unh.edu Thu Jun 14 10:56:43 2007 From: ahubbe at iol.unh.edu (Allen Hubbe) Date: Thu, 14 Jun 2007 13:56:43 -0400 (EDT) Subject: [ofa-general] memory leak in librdmacm, libibverbs Message-ID: I found a memory leak that is present in at least librdmacm and libibverbs. The libraries allow a user to get a device list, and later free the device list. In freeing the device list, the devices in the list are not freed, causing a memory leak. It would not be wise to free all the devices in the list, either, because the user very likely wants to continue using one of the devices that was returned in the list. I think the intent of the methods was for the list to live the life of the program, but that might not be the way it gets used. I included a short example program; when run on a machine with devices present, it will consume all available memory.
---------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <rdma/rdma_cma.h>

int main(void)
{
	struct ibv_context **ibv_devices;
	int num_devices;

	ibv_devices = rdma_get_devices(&num_devices);
	if (ibv_devices == NULL) {
		printf("no devices found, exiting\n");
		exit(1);
	} else {
		/* Repeatedly get and free the device list. */
		while (1) {
			rdma_free_devices(ibv_devices);
			ibv_devices = rdma_get_devices(NULL);
		}
	}
	return 0;
}
---------------------------------------------------------------
From rdreier at cisco.com Thu Jun 14 11:12:14 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 14 Jun 2007 11:12:14 -0700 Subject: [ofa-general] memory leak in librdmacm, libibverbs In-Reply-To: (Allen Hubbe's message of "Thu, 14 Jun 2007 13:56:43 -0400 (EDT)") References: Message-ID: > I found a memory leak that is present in at least librdmacm and > libibverbs. The libraries allow a user to get a device list, and later > free the device list. In freeing the device list, the devices in the list > are not freed, causing a memory leak.
It would not be wise to free all > the devices in the list, either, because the user very likely wants to > continue using one of the devices that was returned in the list. I think > the intent of the methods was for the list to live the life of the > program, but that might not be the way it gets used. I don't see it. Both rdma_get_devices() and ibv_get_device_list() don't allocate anything beyond the list they return to the caller. The device structures are just allocated once when the libraries discover the devices. And rdma_free_devices() and ibv_free_device_list() both free exactly what the corresponding get function allocated. > I included a short example program, run on a machine with devices > present it will consume all available memory. I ran this program on a system where rdma_get_devices() reports 1 device found, and the memory used by the process does not increase after startup, even after running for a few minutes. - R. From rdreier at cisco.com Thu Jun 14 11:24:37 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 14 Jun 2007 11:24:37 -0700 Subject: [ofa-general] memory leak in librdmacm, libibverbs In-Reply-To: (Roland Dreier's message of "Thu, 14 Jun 2007 11:12:14 -0700") References: Message-ID: Please don't Cc: iwarplab at iol.unh.edu if I'm going to get a bounce about a subscribers-only list when I reply to your email. - R. From mst at dev.mellanox.co.il Thu Jun 14 11:44:45 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Thu, 14 Jun 2007 21:44:45 +0300 Subject: [ofa-general] crash in ipoib In-Reply-To: <46717F1C.3010604@ichips.intel.com> References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com> <46716AE6.9050804@ichips.intel.com> <20070614173522.GA29561@mellanox.co.il> <46717F1C.3010604@ichips.intel.com> Message-ID: <20070614184445.GC29561@mellanox.co.il> > Quoting Sean Hefty : > Subject: Re: [ofa-general] crash in ipoib > > >Where does :ib_ipoib:ipoib_cm_handle_rx_wc+378 point to on your system? 
> > It points to list_move below: > > if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) { > p = wc->qp->qp_context; > if (p && time_after_eq(jiffies, p->jiffies + > IPOIB_CM_RX_UPDATE_TIME)) { > spin_lock_irqsave(&priv->lock, flags); > p->jiffies = jiffies; > /* Move this entry to list head, but do not > re-add it > * if it has been moved out of list. */ > if (p->state == IPOIB_CM_RX_LIVE) > >>> list_move(&p->list, > priv->cm.passive_ids); > spin_unlock_irqrestore(&priv->lock, flags); > } > } > > There appears to be a race in ipoib_cm_req_handler() setting the > ipoib_cm_rx state outside of a lock, and before the item it added to a > list. I think this could cause list_move() call above to oops. Hmm, yes, looks like you are right. > I think > ipoib_cm_req_handler() needs changes, but I'm not sure if this is enough > (patch below has line wrap issues...): > > @@ -291,16 +291,16 @@ static int ipoib_cm_req_handler(struct ib_cm_id > *cm_id, st > if (ret) > goto err_modify; > > + cm_id->context = p; > ret = ipoib_cm_send_rep(dev, cm_id, p->qp, > &event->param.req_rcvd, psn); > if (ret) { > ipoib_warn(priv, "failed to send REP: %d\n", ret); > goto err_rep; > } > > - cm_id->context = p; > p->jiffies = jiffies; > - p->state = IPOIB_CM_RX_LIVE; > spin_lock_irq(&priv->lock); > + p->state = IPOIB_CM_RX_LIVE; > if (list_empty(&priv->cm.passive_ids)) > queue_delayed_work(ipoib_workqueue, > &priv->cm.stale_task, > IPOIB_CM_RX_DELAY); I'm not sure this is enough. Maybe the following is needed? Can you test it?
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index 076a0bb..2509bb8 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -320,12 +320,6 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even if (ret) goto err_modify; - ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn); - if (ret) { - ipoib_warn(priv, "failed to send REP: %d\n", ret); - goto err_rep; - } - cm_id->context = p; p->jiffies = jiffies; p->state = IPOIB_CM_RX_LIVE; @@ -335,6 +329,13 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even &priv->cm.stale_task, IPOIB_CM_RX_DELAY); list_add(&p->list, &priv->cm.passive_ids); spin_unlock_irq(&priv->lock); + + ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn); + if (ret) { + /* TODO: error handling is wrong here */ + ipoib_warn(priv, "failed to send REP: %d\n", ret); + goto err_rep; + } return 0; err_rep: -- MST From andrey.slepuhin at t-platforms.ru Thu Jun 14 11:45:47 2007 From: andrey.slepuhin at t-platforms.ru (Andrey Slepuhin) Date: Thu, 14 Jun 2007 22:45:47 +0400 Subject: [ofa-general] Problems with mlx4 In-Reply-To: References: <467005B9.8070708@t-platforms.ru> <46716FA2.7020805@t-platforms.ru> Message-ID: <46718CDB.2050809@t-platforms.ru> Aha, just got latest firmware tools from Mellanox with ConnectX support and realized that the firmware was 2.0.147... After upgrading (but to 2.0.158 - that's the only firmware revision I got from Mellanox) the link was initialized, so I started to build the userspace... Thanks, Roland! Best regards, Andrey Roland Dreier wrote: > > I upgraded the switch FW to version 1.0 and applied your latest mlx4 > > patches, but I'm still in the same situation - the link is down. What > > else can go wrong? > > Please read my whole email, especially this part: > > BTW what firmware version do you have on your HCAs? 
You probably want > to update to 2.0.156 (the mlx4 driver won't work with 2.0.158 for a > day or two still) so that you don't have to monkey around with > hard-coding your switch ports to DDR only. > From mst at dev.mellanox.co.il Thu Jun 14 12:08:37 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Thu, 14 Jun 2007 22:08:37 +0300 Subject: [ofa-general] crash in ipoib In-Reply-To: <20070614184445.GC29561@mellanox.co.il> References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com> <46716AE6.9050804@ichips.intel.com> <20070614173522.GA29561@mellanox.co.il> <46717F1C.3010604@ichips.intel.com> <20070614184445.GC29561@mellanox.co.il> Message-ID: <20070614190837.GA2207@mellanox.co.il> > I'm not sure this is enough. Maybe the following is needed? > Can you test it? And here's a version with error handling fixed. Sean, does this solve your crash? ---> Move RX to passive_list before sending a REP. Signed-off-by: Michael S. Tsirkin --- diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index 076a0bb..2be2c76 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -320,12 +320,6 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even if (ret) goto err_modify; - ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn); - if (ret) { - ipoib_warn(priv, "failed to send REP: %d\n", ret); - goto err_rep; - } - cm_id->context = p; p->jiffies = jiffies; p->state = IPOIB_CM_RX_LIVE; @@ -335,6 +329,13 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even &priv->cm.stale_task, IPOIB_CM_RX_DELAY); list_add(&p->list, &priv->cm.passive_ids); spin_unlock_irq(&priv->lock); + + ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn); + if (ret) { + ipoib_warn(priv, "failed to send REP: %d\n", ret); + if (ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE)) + ipoib_warn(priv, "unable to move qp to 
error state\n"); + } return 0; err_rep: -- MST From rdreier at cisco.com Thu Jun 14 12:14:00 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 14 Jun 2007 12:14:00 -0700 Subject: [ofa-general] crash in ipoib In-Reply-To: <20070614190837.GA2207@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 14 Jun 2007 22:08:37 +0300") References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com> <46716AE6.9050804@ichips.intel.com> <20070614173522.GA29561@mellanox.co.il> <46717F1C.3010604@ichips.intel.com> <20070614184445.GC29561@mellanox.co.il> <20070614190837.GA2207@mellanox.co.il> Message-ID: > + ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn); > + if (ret) { > + ipoib_warn(priv, "failed to send REP: %d\n", ret); > + if (ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE)) > + ipoib_warn(priv, "unable to move qp to error state\n"); > + } So if sending a rep fails, this leaves p on the passive_ids list with state IPOIB_CM_RX_LIVE. Does it ever get cleaned up? The old code used to destroy the qp and free p if sending a REP failed. - R. From mshefty at ichips.intel.com Thu Jun 14 12:35:27 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 14 Jun 2007 12:35:27 -0700 Subject: [ofa-general] crash in ipoib In-Reply-To: <20070614190837.GA2207@mellanox.co.il> References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com> <46716AE6.9050804@ichips.intel.com> <20070614173522.GA29561@mellanox.co.il> <46717F1C.3010604@ichips.intel.com> <20070614184445.GC29561@mellanox.co.il> <20070614190837.GA2207@mellanox.co.il> Message-ID: <4671987F.3080403@ichips.intel.com> > And here's a version with error handling fixed. > Sean, does this solve your crash? We'll test a patch once we can agree on it. It can take up to a day for us to hit this issue though. We had created the following to try, which leaves the error handling the same. Which approach do you prefer? 
@@ -291,16 +291,17 @@ static int ipoib_cm_req_handler(struct ib_cm_id if (ret) goto err_modify; + cm_id->context = p; + spin_lock_irq(&priv->lock); ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn); if (ret) { + spin_unlock_irq(&priv->lock); ipoib_warn(priv, "failed to send REP: %d\n", ret); goto err_rep; } - cm_id->context = p; p->jiffies = jiffies; p->state = IPOIB_CM_RX_LIVE; - spin_lock_irq(&priv->lock); if (list_empty(&priv->cm.passive_ids)) queue_delayed_work(ipoib_workqueue, &priv->cm.stale_task, IPOIB_CM_RX_DELAY); - Sean From mst at dev.mellanox.co.il Thu Jun 14 13:15:38 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Thu, 14 Jun 2007 23:15:38 +0300 Subject: [ofa-general] crash in ipoib In-Reply-To: References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com> <46716AE6.9050804@ichips.intel.com> <20070614173522.GA29561@mellanox.co.il> <46717F1C.3010604@ichips.intel.com> <20070614184445.GC29561@mellanox.co.il> <20070614190837.GA2207@mellanox.co.il> Message-ID: <20070614201538.GB2207@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [ofa-general] crash in ipoib > > > + ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn); > > + if (ret) { > > + ipoib_warn(priv, "failed to send REP: %d\n", ret); > > + if (ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE)) > > + ipoib_warn(priv, "unable to move qp to error state\n"); > > + } > > So if sending a rep fails, this leaves p on the passive_ids list with > state IPOIB_CM_RX_LIVE. Does it ever get cleaned up? Yes, in the usual way: upon the last wqe reached event. > The old code used to destroy the qp and free p if sending a REP failed. This was really a wrong thing to do - destroying QP connected to srq must be done with the draining procedure, in case the remote violates the protocol and sends us packets for this QPN. -- MST From mst at dev.mellanox.co.il Thu Jun 14 13:20:07 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 14 Jun 2007 23:20:07 +0300 Subject: [ofa-general] crash in ipoib In-Reply-To: <4671987F.3080403@ichips.intel.com> References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com> <46716AE6.9050804@ichips.intel.com> <20070614173522.GA29561@mellanox.co.il> <46717F1C.3010604@ichips.intel.com> <20070614184445.GC29561@mellanox.co.il> <20070614190837.GA2207@mellanox.co.il> <4671987F.3080403@ichips.intel.com> Message-ID: <20070614202006.GC2207@mellanox.co.il> > Quoting Sean Hefty : > Subject: Re: [ofa-general] crash in ipoib > > >And here's a version with error handling fixed. > >Sean, does this solve your crash? > > We'll test a patch once we can agree on it. It can take up to a day for > us to hit this issue though. > > We had created the following to try, which leaves the error handling the > same. Which approach do you prefer? > > @@ -291,16 +291,17 @@ static int ipoib_cm_req_handler(struct ib_cm_id > if (ret) > goto err_modify; > > + cm_id->context = p; > + spin_lock_irq(&priv->lock); > ret = ipoib_cm_send_rep(dev, cm_id, p->qp, > &event->param.req_rcvd, psn); > if (ret) { > + spin_unlock_irq(&priv->lock); > ipoib_warn(priv, "failed to send REP: %d\n", ret); > goto err_rep; > } > > - cm_id->context = p; > p->jiffies = jiffies; > p->state = IPOIB_CM_RX_LIVE; > - spin_lock_irq(&priv->lock); > if (list_empty(&priv->cm.passive_ids)) > queue_delayed_work(ipoib_workqueue, > &priv->cm.stale_task, > IPOIB_CM_RX_DELAY); > I think my patch is more correct, but just for the sake of testing yours should be sufficient as well. 
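The ordering MST's patch enforces can be modelled in userspace C. The request handler publishes the connection (state LIVE plus list insertion) under the lock before anything can make it reachable, that is, before the REP is sent. The completion path checks the state under the same lock before moving the entry. This is a simplified sketch: `rx_entry`, `req_handler`, `send_rep`, and `handle_rx_completion` are hypothetical stand-ins for `ipoib_cm_rx`, `ipoib_cm_req_handler()`, `ipoib_cm_send_rep()`, and `ipoib_cm_handle_rx_wc()`, and a pthread mutex replaces the kernel spinlock.

```c
#include <pthread.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head->prev = head;
}

static void list_del_entry(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static void list_add_head(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Moving a node that was never linked follows garbage (or NULL) pointers,
 * which is the kind of oops the thread is chasing. */
static void list_move_head(struct list_node *n, struct list_node *head)
{
	list_del_entry(n);
	list_add_head(n, head);
}

enum rx_state { RX_NEW, RX_LIVE };

struct rx_entry {
	struct list_node node;
	enum rx_state state;
};

struct priv {
	pthread_mutex_t lock;
	struct list_node passive_ids;
};

/* Stand-in for sending the REP: once it is sent, the peer may send data
 * and receive completions for the QP may arrive at any time. */
static int send_rep(struct rx_entry *p)
{
	(void)p;
	return 0;
}

static int req_handler(struct priv *priv, struct rx_entry *p)
{
	/* Publish the entry first, under the lock... */
	pthread_mutex_lock(&priv->lock);
	p->state = RX_LIVE;
	list_add_head(&p->node, &priv->passive_ids);
	pthread_mutex_unlock(&priv->lock);

	/* ...and only then make the connection reachable. */
	return send_rep(p);
}

/* Completion path: only LIVE (and therefore linked) entries are moved. */
static int handle_rx_completion(struct priv *priv, struct rx_entry *p)
{
	int moved = 0;

	pthread_mutex_lock(&priv->lock);
	if (p->state == RX_LIVE) {
		list_move_head(&p->node, &priv->passive_ids);
		moved = 1;
	}
	pthread_mutex_unlock(&priv->lock);
	return moved;
}
```

Because the state check and the list manipulation share one lock, a completion can never observe LIVE on an entry that is not yet linked. The sketch leaves out the REP-failure cleanup that Roland and MST discuss.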
-- MST From pradeeps at linux.vnet.ibm.com Thu Jun 14 15:46:25 2007 From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana) Date: Thu, 14 Jun 2007 15:46:25 -0700 Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) In-Reply-To: <20070614175030.GB29561@mellanox.co.il> References: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com> <466F36C8.5010507@linux.vnet.ibm.com> <20070613163821.GB12277@mellanox.co.il> <20070613174930.GE12277@mellanox.co.il> <46716F3D.7050206@ichips.intel.com> <20070614175030.GB29561@mellanox.co.il> Message-ID: <4671C541.4040503@linux.vnet.ibm.com> Michael S. Tsirkin wrote: >> Quoting Roland Dreier : >> Subject: Re: [ofa-general] Re: [PATCH draft,?untested] ehca srq emulation (for IPoIB CM) >> >> > > Note this is not a full emulation, just close enough to make IPoIB CM work. >> >> > If the emulation is only enough for IPoIB, then I think it belongs in >> > IPoIB, and not in every HCA driver. > > "every HCA driver" is an exagerration: > 1. ehca is the only one that does not support SRQ in hardware > 2. emulation (and ipoib nosrq patches, too) work by assuming only a > small number of connections and a huge amount of memory. > This is true for systems where ehca is used but not in the general case > Pushing the changes into the driver is a potential maintenance nightmare. How does one keep changes across layers in sync? That was the reason I strived to use common code in the NOSRQ case; at least as much as possible and all of it in IPoIB. In the emulation approach by apportioning off WRs across QPs, we will be sacrificing performance by dropping packets or returning an RNR on a really busy QP. As I see it, the alternative is to allocate a really big SRQ, even when there are very few QPs and wasting a lot of the unused WRs. Thus even with a small number of heavily used connections and huge amounts of memory we will not be able to derive the performance benefits that connected mode can potentially offer. 
>> I was thinking the same thing. Otherwise you're just setting a booby >> trap for someone who tries to use SRQ for something else. > > The emulation is quite close IMO - most likely it will just work, > but if not, we can just document the limitations. > > In case a ULP wants to avoid using the emulation, we could have a "SRQ is > emulated bit" to distinguish between these. > >> However it may be a good approach to put an abstraction layer in IPoIB >> so that the CM code can use an SRQ-like interface to both HCAs that >> support SRQ and HCAs that don't. > > 2 issues with this: > > 1. I think other ULPs can benefit from this emulation too. > 2. The emulation does need help from hardware (e.g. I use a qp token > in CQE for QP lookups and SRQ detection). > Implementing it on top of exiting verbs can be done > only if verbs interface is extended. Pradeep From friedman at ucla.edu Thu Jun 14 21:31:35 2007 From: friedman at ucla.edu (Scott A. Friedman) Date: Thu, 14 Jun 2007 21:31:35 -0700 Subject: [ofa-general] iWarp cxgb3 firmware Message-ID: <46721627.9000907@ucla.edu> Hi Is anyone using the cxgb3 module in rc4 or rc5? If so, where are you getting the correct firmware that it seems to want (4.2)? Chelsio is only distributing v4.1 on their web site. I would like to know since my iWarp nodes are currently stuck at rc3, whose cxgb3 needs version 4.0 Do these firmware versions make significant changes? Thanks, Scott From mst at dev.mellanox.co.il Thu Jun 14 22:18:46 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Fri, 15 Jun 2007 08:18:46 +0300 Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) In-Reply-To: <4671C541.4040503@linux.vnet.ibm.com> References: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com> <466F36C8.5010507@linux.vnet.ibm.com> <20070613163821.GB12277@mellanox.co.il> <20070613174930.GE12277@mellanox.co.il> <46716F3D.7050206@ichips.intel.com> <20070614175030.GB29561@mellanox.co.il> <4671C541.4040503@linux.vnet.ibm.com> Message-ID: <20070615051846.GG2207@mellanox.co.il> > Pushing the changes into the driver is a potential maintenance > nightmare. How does one keep changes across layers in sync? We have different definitions of "across layers": in my code everything is kept inside ehca. I call it a maintenance nightmare when there's code in IPoIB that only ehca owners can test. > That was the reason I strived to use common code in the NOSRQ case; at > least as much as possible and all of it in IPoIB. And you ended up with a bigger patch. > In the emulation approach by apportioning off WRs across QPs, we will be > sacrificing performance by dropping packets or returning an RNR on a > really busy QP. As I see it, the alternative is to allocate a really big > SRQ, even when there are very few QPs and wasting a lot of the unused WRs. As I said, there are obvious performance optimisations to implement. We can later add code in IPoIB that, for very large SRQ size, will post WRs on demand. But at least that will be common code that everyone can test.
-- MST From vlad at lists.openfabrics.org Fri Jun 15 02:42:19 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Fri, 15 Jun 2007 02:42:19 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070615-0200 daily build status Message-ID: <20070615094219.3411CE6080B@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.12 Passed on ia64 with linux-2.6.12 Passed on x86_64 with linux-2.6.20 Passed on ppc64 with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on ia64 with linux-2.6.18 Passed on x86_64 with linux-2.6.18 Passed on ia64 with linux-2.6.13 Passed on powerpc with linux-2.6.19 Passed on ia64 with linux-2.6.14 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.17 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.16 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.13 Passed on ia64 with linux-2.6.15 Passed on ppc64 with linux-2.6.18 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.15 Passed on ia64 with linux-2.6.17 Passed on x86_64 with linux-2.6.19 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.13 Passed on x86_64 with linux-2.6.21.1 Passed on ppc64 with linux-2.6.12 Passed on powerpc with linux-2.6.16 Passed on powerpc with linux-2.6.12 Passed on powerpc with linux-2.6.14 Passed on x86_64 with linux-2.6.15 Passed on 
ppc64 with linux-2.6.17 Passed on x86_64 with linux-2.6.14 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.14 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ia64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.9-34.ELsmp Failed: From hanafim.ctr at asc.hpc.mil Fri Jun 15 06:52:04 2007 From: hanafim.ctr at asc.hpc.mil (MAHMOUD HANAFI) Date: Fri, 15 Jun 2007 09:52:04 -0400 Subject: [ofa-general] OFED SRP Frame/MTU tunning Message-ID: <46729984.3070101@asc.hpc.mil> All, I would like to configure SRP to use a frame size of 1K; it defaults to 2K. Is this an option that can be set/configured? Thanks, -- Mahmoud Hanafi Senior System Administrator ASC/MSRC www.asc.hpc.mil 2435 5th Street WPAFB, OHIO 45433 (937) 255-1536 From chas at cmf.nrl.navy.mil Fri Jun 15 07:33:29 2007 From: chas at cmf.nrl.navy.mil (chas williams - CONTRACTOR) Date: Fri, 15 Jun 2007 10:33:29 -0400 Subject: [ofa-general] OFED SRP Frame/MTU tunning In-Reply-To: <46729984.3070101@asc.hpc.mil> Message-ID: <200706151433.l5FEXT0X032144@cmf.nrl.navy.mil> In message <46729984.3070101 at asc.hpc.mil>, MAHMOUD HANAFI writes: >I would like to configure SRP to use frame size of 1k,it defaults to 2K. Is this an options that can be set/configured? Apply this patch. I should have made this a per-login item, though.
--- a/drivers/infiniband/ulp/srp/ib_srp.c.orig 2006-12-21 14:15:33.728164124 -0500 +++ b/drivers/infiniband/ulp/srp/ib_srp.c 2006-12-21 15:26:44.234250010 -0500 @@ -83,6 +83,10 @@ MODULE_PARM_DESC(mellanox_workarounds, "Enable workarounds for Mellanox SRP target bugs if != 0"); +static int tavor_quirk = 0; +module_param_named(tavor_quirk, tavor_quirk, int, 0644); +MODULE_PARM_DESC(tavor_quirk, "Tavor performance quirk: limit MTU to 1K if > 0"); + static const u8 mellanox_oui[3] = { 0x00, 0x02, 0xc9 }; static void srp_add_one(struct ib_device *device); @@ -256,8 +260,14 @@ target->status = status; if (status) printk(KERN_ERR PFX "Got failed path rec status %d\n", status); - else + else { target->path = *pathrec; + if (tavor_quirk) { + if (target->path.mtu > IB_MTU_1024) + target->path.mtu = IB_MTU_1024; + } + } + complete(&target->done); } From jsquyres at cisco.com Fri Jun 15 08:11:20 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Fri, 15 Jun 2007 11:11:20 -0400 Subject: [ofa-general] http://git.openfabrics.org/ Message-ID: I notice that http://git.openfabrics.org/ shows the main OFA web site, but http://git.openfabrics.org/git/ shows all the git repositories. Can a redirect be installed such that http://git.openfabrics.org/ is automatically sent to http://git.openfabrics.org/git/? I think that would be a little more intuitive. Thanks! -- Jeff Squyres Cisco Systems From swise at opengridcomputing.com Fri Jun 15 08:27:51 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 15 Jun 2007 10:27:51 -0500 Subject: [ofa-general] iWarp cxgb3 firmware In-Reply-To: <46721627.9000907@ucla.edu> References: <46721627.9000907@ucla.edu> Message-ID: <4672AFF7.1020203@opengridcomputing.com> Scott A. Friedman wrote: > Hi > > Is anyone using the cxgb3 module in rc4 or rc5? If so, where are you > getting the correct firmware that it seems to want (4.2)? Chelsio is > only distributing v4.1 on their web site. 
I would like to know since my > iWarp nodes are currently stuck at rc3, whose cxgb3 needs version 4.0 > > Do these firmware versions make significant changes? > Unfortunately, yes, they do. -rc4 and beyond requires firmware version 4.2 to fix some streaming mode->rdma mode connection transition bugs. And the interface between the driver and firmware changed, which is why the requirement is there. -rc4 and beyond _will not_ work with anything less than 4.2. I pushed the changes into -rc4 to get them in before ofed-1.2 ships as this was a critical bug. The 4.2 firmware will be available this week from Chelsio. Contact your Chelsio rep to get it. Perhaps you can get a pre-release version today... Steve. From mshefty at ichips.intel.com Fri Jun 15 08:49:55 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 15 Jun 2007 08:49:55 -0700 Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) In-Reply-To: <20070615051846.GG2207@mellanox.co.il> References: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com> <466F36C8.5010507@linux.vnet.ibm.com> <20070613163821.GB12277@mellanox.co.il> <20070613174930.GE12277@mellanox.co.il> <46716F3D.7050206@ichips.intel.com> <20070614175030.GB29561@mellanox.co.il> <4671C541.4040503@linux.vnet.ibm.com> <20070615051846.GG2207@mellanox.co.il> Message-ID: <4672B523.50502@ichips.intel.com> > We have different definitions of "across layers": in my code everything is kept > inside ehca. I call it a maintenance nightmare when there's code in IPoIB that > only ehca owners can test. I disagree with the concept of adding this code into the lower level driver. Posting a receive buffer onto a QP after it gets a receive completion is something the ULP can and should do. SRQ support is optional. There's no reason why the no-SRQ code in IPoIB can't be tested on all HCAs. It's the SRQ code that requires specific hardware.
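A rough way to see the scalability trade-off argued over in this thread: without an SRQ, every connected QP needs its own ring of posted receives, so pinned receive memory grows with the number of peers, while an SRQ bounds it by the SRQ depth. The sketch below is plain arithmetic under assumed parameters; the ring depth, SRQ depth and 64 KiB buffer size are illustrative values, not IPoIB CM defaults, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative arithmetic only: all parameters are assumptions,
 * not IPoIB CM defaults. */

/* No SRQ: every connected QP keeps its own ring of posted receives,
 * so pinned receive memory grows linearly with the number of peers. */
static uint64_t per_qp_rx_bytes(uint64_t peers, uint64_t ring_depth,
                                uint64_t buf_size)
{
    return peers * ring_depth * buf_size;
}

/* With an SRQ: all QPs draw from one shared pool, so the total is
 * bounded by the SRQ depth regardless of how many peers connect. */
static uint64_t srq_rx_bytes(uint64_t srq_depth, uint64_t buf_size)
{
    return srq_depth * buf_size;
}

/* Inverse of per_qp_rx_bytes(): how many peers fit in a memory budget. */
static uint64_t peers_for_budget(uint64_t budget, uint64_t ring_depth,
                                 uint64_t buf_size)
{
    uint64_t per_peer = ring_depth * buf_size;

    return per_peer ? budget / per_peer : 0;
}
```

With 32 peers, a 256-entry ring per QP and 64 KiB buffers, per_qp_rx_bytes() gives 512 MiB, while a 1024-entry SRQ with the same buffers needs 64 MiB; peers_for_budget() inverts the first formula to ask how many peers fit in a given budget.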
- Sean From mst at dev.mellanox.co.il Fri Jun 15 09:07:09 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Fri, 15 Jun 2007 19:07:09 +0300 Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) In-Reply-To: <4672B523.50502@ichips.intel.com> References: <466F36C8.5010507@linux.vnet.ibm.com> <20070613163821.GB12277@mellanox.co.il> <20070613174930.GE12277@mellanox.co.il> <46716F3D.7050206@ichips.intel.com> <20070614175030.GB29561@mellanox.co.il> <4671C541.4040503@linux.vnet.ibm.com> <20070615051846.GG2207@mellanox.co.il> <4672B523.50502@ichips.intel.com> Message-ID: <20070615160709.GK2207@mellanox.co.il> > Quoting Sean Hefty : > Subject: Re: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) > > >We have different definitions of "across layers": in my code everything is > >kept > >inside ehca. I call it a maintenance nightmare when there's code in IPoIB > >that > >only ehca owners can test. > > I disagree with the concept of adding this code into the lower level > driver. Posting a receive buffer onto a QP after it gets a receive > completion is something the ULP can and should do. > > SRQ support is optional. There's no reason why the no-SRQ code in IPoIB > can't be tested on all HCAs. It's the SRQ code that requires specific > hardware. Basically, I think that because of the lack of SW-level flow control, IPoIB CM without SRQ generally does not make sense because of the scalability problems. However, the argument for adding this protocol revolves around the claim that ehca (the only low level driver without SRQ that we have) is used on systems with huge amounts of memory and a small number of nodes. -- MST From mst at dev.mellanox.co.il Fri Jun 15 09:09:24 2007 From: mst at dev.mellanox.co.il (Michael S.
Tsirkin) Date: Fri, 15 Jun 2007 19:09:24 +0300 Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) In-Reply-To: References: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com> <466F36C8.5010507@linux.vnet.ibm.com> <20070613163821.GB12277@mellanox.co.il> <20070613174930.GE12277@mellanox.co.il> <46716F3D.7050206@ichips.intel.com> Message-ID: <20070615160924.GL2207@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) > > > > Note this is not a full emulation, just close enough to make IPoIB CM work. > > > If the emulation is only enough for IPoIB, then I think it belongs in > > IPoIB, and not in every HCA driver. > > I was thinking the same thing. Otherwise you're just setting a booby > trap for someone who tries to use SRQ for something else. Would adding "wrs per qp" in the srq attr structure solve this? > However it may be a good approach to put an abstraction layer in IPoIB > so that the CM code can use an SRQ-like interface to both HCAs that > support SRQ and HCAs that don't. If you are thinking about something like what was done to solve the ipath DMA problems, I'm for it. This will likely require minor extensions to the verbs API, like the DMA change did. -- MST From Kapil.Dukle at med.ge.com Fri Jun 15 09:21:16 2007 From: Kapil.Dukle at med.ge.com (Dukle, Kapil (GE Healthcare)) Date: Fri, 15 Jun 2007 12:21:16 -0400 Subject: [ofa-general] Infiniband data transfer across different IB drivers Message-ID: Hi, I am currently experimenting with Infiniband data transfers across two servers with different operating systems and IB drivers. Server A runs VxWorks 5.5 and uses Mellanox IB driver modules and the VAPI interface. Server B runs Linux 2.6.x and uses OFED 1.0 drivers and the OFED Verbs API. Problem: I have written code (that makes the respective Verbs calls) to set up queue pairs and initialize them with the destination queue pair number and lid.
The connection type is IBV_QPT_RC (Reliable Connection). The traces seem to confirm that the destination qpn, lid values are correct. The next thing I try to do is to post send requests on Server A, and receive requests on Server B. I then check the respective completion queues for any events. The problem is that I do NOT see any completion events on the receive completion queue for Server B. Questions: - Are these two drivers (Mellanox VAPI and OFED) compatible with each other in the first place? - Is it possible to verify the two queue pairs are indeed "connected" to each other? - Can I enable some debug mechanism at the driver level to see what the send/receive requests translate to, and what the underlying errors could be (if any)? Here is some information about the network that may help: [root at ServerB ~]# ps -elf | grep opensm 4 S root 2695 1 0 32 - - 14738 stext Jun14 ? 00:00:00 /usr/local/ofed/bin/opensm -t 200 -g 0 0 S root 12030 11992 0 76 0 - 13981 pipe_w 11:18 pts/1 00:00:00 grep opensm [root at ServerB ~]# sminfo sminfo: sm lid 0x1 sm guid 0x2c90200212251, activity count 40926 priority 1 state SMINFO_MASTER 3 [root at ServerB ~]# ibnetdiscover -v [1] {0002c90200212250} DR path [0][1] -> new remote ca {00d01c000001010a} portnum 2 lid 0x2-0x2 "ServerA HCA-1 (Topspin HCA)" [2] {00d01c000001010a} # # Topology file: generated on Fri Jun 15 11:05:52 2007 # # Max of 1 hops discovered # Initiated from node 0002c90200212250 port 0002c90200212251 vendid=0xd01c devid=0x5a44 sysimgguid=0xd01c000001010a caguid=0xd01c000001010a Ca 2 "H-00d01c000001010a" # ServerA HCA-1 (Topspin HCA) [2] "H-0002c90200212250"[1] # lid 2 lmc 0 vendid=0x2c9 devid=0x5a44 sysimgguid=0x2c90200212253 caguid=0x2c90200212250 Ca 2 "H-0002c90200212250" # ServerB HCA-1 [1] "H-00d01c000001010a"[2] # lid 1 lmc 0 [root at ServerB ~]# ibcheckstate -v # Checking Ca: nodeguid 0x00d01c000001010a Node check lid 2: OK Port check lid 2 port 2: OK # Checking Ca: nodeguid 0x0002c90200212250 Node check lid 
1: OK Port check lid 1 port 1: OK ## Summary: 2 nodes checked, 0 bad nodes found ## 2 ports checked, 0 ports with bad state found [root at ServerB ~]# ibnodes -v Ca : 0x00d01c000001010a ports 2 "ServerA HCA-1 (Topspin HCA)" Ca : 0x0002c90200212250 ports 2 "ServerB HCA-1" Please let me know if you need any other information. Thanks in advance, Kapil -------------- next part -------------- An HTML attachment was scrubbed... URL: From mshefty at ichips.intel.com Fri Jun 15 09:28:19 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 15 Jun 2007 09:28:19 -0700 Subject: [ofa-general] crash in ipoib In-Reply-To: <20070614190837.GA2207@mellanox.co.il> References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com> <46716AE6.9050804@ichips.intel.com> <20070614173522.GA29561@mellanox.co.il> <46717F1C.3010604@ichips.intel.com> <20070614184445.GC29561@mellanox.co.il> <20070614190837.GA2207@mellanox.co.il> Message-ID: <4672BE23.3050809@ichips.intel.com> > And here's a version with error handling fixed. > Sean, does this solve your crash? We've been running this patch since yesterday and haven't seen any crashes. We'll continue testing this over the week-end. - Sean From sean.hefty at intel.com Fri Jun 15 09:34:55 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 15 Jun 2007 09:34:55 -0700 Subject: [ofa-general] [PATCH] for-2.6.23 ib/umad: add partition support Message-ID: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com> In order to support multiple partitions, user_mad needs to handle different pkey's. PKeys must be specified by the user when sending and receiving MADs. This bumps the ABI. Signed-off-by: Sean Hefty --- If there are no objections, I will queue this patch for 2.6.23, and request a pull when 2.6.23 is closer. 
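To see what the ABI bump changes in the records exchanged through the umad device, here is a sketch of the two header layouts, written to mirror the libibumad structures from the userspace patch later in this thread (the type names here are hypothetical, and a C99 flexible array member stands in for `data[0]`). The eight new bytes go at the end of the address block, so everything before `pkey_index` keeps its ABI 5 offset and the header grows from 56 to 64 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address block as libibumad lays it out for ABI 5. */
struct mad_addr_abi5 {
    uint32_t qpn;
    uint32_t qkey;
    uint16_t lid;
    uint8_t  sl;
    uint8_t  path_bits;
    uint8_t  grh_present;
    uint8_t  gid_index;
    uint8_t  hop_limit;
    uint8_t  traffic_class;
    uint8_t  gid[16];
    uint32_t flow_label;
};

/* ABI 6: identical prefix, then pkey_index and 6 reserved bytes. */
struct mad_addr_abi6 {
    uint32_t qpn;
    uint32_t qkey;
    uint16_t lid;
    uint8_t  sl;
    uint8_t  path_bits;
    uint8_t  grh_present;
    uint8_t  gid_index;
    uint8_t  hop_limit;
    uint8_t  traffic_class;
    uint8_t  gid[16];
    uint32_t flow_label;
    uint16_t pkey_index;
    uint8_t  reserved[6];
};

struct user_mad_abi5 {
    uint32_t agent_id;
    uint32_t status;
    uint32_t timeout_ms;
    uint32_t retries;
    uint32_t length;
    struct mad_addr_abi5 addr;
    uint8_t  data[];            /* MAD payload follows the header */
};

struct user_mad_abi6 {
    uint32_t agent_id;
    uint32_t status;
    uint32_t timeout_ms;
    uint32_t retries;
    uint32_t length;
    struct mad_addr_abi6 addr;
    uint8_t  data[];            /* MAD payload follows the header */
};

/* Compile-time checks of the header sizes. */
_Static_assert(sizeof(struct user_mad_abi5) == 56, "ABI 5 header is 56 bytes");
_Static_assert(sizeof(struct user_mad_abi6) == 64, "ABI 6 header is 64 bytes");
```

Because the ABI 6 header is the ABI 5 header plus a tail, a library can talk to an ABI 5 kernel by copying the common prefix into the shorter record, which is the approach the libibumad patch takes in write_abi_5() and read_abi_5().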
drivers/infiniband/core/user_mad.c | 5 +++-- include/rdma/ib_user_mad.h | 4 +++- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c index d97ded2..b0128fa 100644 --- a/drivers/infiniband/core/user_mad.c +++ b/drivers/infiniband/core/user_mad.c @@ -228,6 +228,7 @@ static void recv_handler(struct ib_mad_agent *agent, packet->mad.hdr.lid = cpu_to_be16(mad_recv_wc->wc->slid); packet->mad.hdr.sl = mad_recv_wc->wc->sl; packet->mad.hdr.path_bits = mad_recv_wc->wc->dlid_path_bits; + packet->mad.hdr.pkey_index = mad_recv_wc->wc->pkey_index; packet->mad.hdr.grh_present = !!(mad_recv_wc->wc->wc_flags & IB_WC_GRH); if (packet->mad.hdr.grh_present) { struct ib_ah_attr ah_attr; @@ -503,8 +504,8 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf, data_len = count - sizeof (struct ib_user_mad) - hdr_len; packet->msg = ib_create_send_mad(agent, be32_to_cpu(packet->mad.hdr.qpn), - 0, rmpp_active, hdr_len, - data_len, GFP_KERNEL); + packet->mad.hdr.pkey_index, rmpp_active, + hdr_len, data_len, GFP_KERNEL); if (IS_ERR(packet->msg)) { ret = PTR_ERR(packet->msg); goto err_ah; diff --git a/include/rdma/ib_user_mad.h b/include/rdma/ib_user_mad.h index d66b15e..e7bf6fa 100644 --- a/include/rdma/ib_user_mad.h +++ b/include/rdma/ib_user_mad.h @@ -43,7 +43,7 @@ * Increment this value if any changes that break userspace ABI * compatibility are made. */ -#define IB_USER_MAD_ABI_VERSION 5 +#define IB_USER_MAD_ABI_VERSION 6 /* * Make sure that all structs defined in this file remain laid out so @@ -88,6 +88,8 @@ struct ib_user_mad_hdr { __u8 traffic_class; __u8 gid[16]; __be32 flow_label; + __u16 pkey_index; + __u8 reserved[6]; }; /** -------------- next part -------------- A non-text attachment was scrubbed... 
Name: winmail.dat Type: application/ms-tnef Size: 4370 bytes Desc: not available URL: From halr at voltaire.com Fri Jun 15 09:36:41 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 15 Jun 2007 12:36:41 -0400 Subject: [ofa-general] Re: [PATCH] opensm/osm_helper.c: fixing PortInfo CapMask printing In-Reply-To: <20070614113757.GA5908@sashak.voltaire.com> References: <20070614113757.GA5908@sashak.voltaire.com> Message-ID: <1181925385.5681.364065.camel@hal.voltaire.com> On Thu, 2007-06-14 at 07:37, Sasha Khapyorsky wrote: > When PortInfo:CapMask is zero, non-initialized local buffer (garbage) > is printed. There is the fix. > > Signed-off-by: Sasha Khapyorsky Good find. Thanks. Applied. -- Hal From pradeeps at linux.vnet.ibm.com Fri Jun 15 09:39:56 2007 From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana) Date: Fri, 15 Jun 2007 09:39:56 -0700 Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) In-Reply-To: <20070615051846.GG2207@mellanox.co.il> References: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com> <466F36C8.5010507@linux.vnet.ibm.com> <20070613163821.GB12277@mellanox.co.il> <20070613174930.GE12277@mellanox.co.il> <46716F3D.7050206@ichips.intel.com> <20070614175030.GB29561@mellanox.co.il> <4671C541.4040503@linux.vnet.ibm.com> <20070615051846.GG2207@mellanox.co.il> Message-ID: <4672C0DC.8060308@linux.vnet.ibm.com> Michael S. Tsirkin wrote: >> Pushing the changes into the driver is a potential maintenance >> nightmare. How does one keep changes across layers in sync? > > We have different definitions of "across layers": in my code everything is kept > inside ehca. I call it a maintenance nightmare when there's code in IPoIB that > only ehca owners can test. > >> That was the reason I strived to use common code in the NOSRQ case; at >> least as much as possible and all of it in IPoIB. > > And you ended up with a bigger patch. 
> >> In the emulation approach by apportioning off WRs across QPs, we will be >> sacrificing performance by dropping packets or returning an RNR on a >> really busy QP. As I see it, the alternative is to allocate a really big >> SRQ, even when there are very few QPs and wasting a lot of the unused WRs. > > As I said, there are obvious performance optimisations to implement. > We can later add code in IPoIB that, for very large SRQ size, > will post WRs on demand. But at least that will be common code > that everyone can test. > Michael, That is exactly the point. I made some decisions that you may not agree with entirely. Each solution has its benefits and drawbacks. I feel that for a "performance related patch", performance should be one of the most important attributes. Some of the other issues are secondary. Here is a patch that is working and tested on multiple HCAs. If you feel it needs to be embellished in certain ways, sure, go ahead and incorporate changes on top of my patch. After all, this is open source development. At the same time I would have reservations about a patch that takes a performance hit even though it may have other desirable attributes. I have already incorporated several of your valuable suggestions into this patch, even though I did not agree with all of them. I see no need for us to take opposite sides on every issue; rather, we should work more constructively. This issue has dragged on for weeks without much forward progress. We need to make some decisions and close out this issue at the earliest opportunity. Pradeep From robert.j.woodruff at intel.com Fri Jun 15 09:43:51 2007 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Fri, 15 Jun 2007 09:43:51 -0700 Subject: [ofa-general] crash in ipoib In-Reply-To: <4672BE23.3050809@ichips.intel.com> Message-ID: Sean wrote, >> And here's a version with error handling fixed. >> Sean, does this solve your crash? >We've been running this patch since yesterday and haven't seen any >crashes.
We'll continue testing this over the week-end. >- Sean This looks like it fixed the panic. Should we try to put out a new RC with this latest IPoIB fix? I really think we need it in the release. If we could get another RC out today, that would only delay the release by a couple more days: we could release next Friday rather than Wednesday and still give people a week to test the final RC. woody From halr at voltaire.com Fri Jun 15 09:54:55 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 15 Jun 2007 12:54:55 -0400 Subject: [ofa-general] OFED SRP Frame/MTU tunning In-Reply-To: <200706151433.l5FEXT0X032144@cmf.nrl.navy.mil> References: <200706151433.l5FEXT0X032144@cmf.nrl.navy.mil> Message-ID: <1181926495.5681.365291.camel@hal.voltaire.com> On Fri, 2007-06-15 at 10:33, chas williams - CONTRACTOR wrote: > In message <46729984.3070101 at asc.hpc.mil>,MAHMOUD HANAFI writes: > >I would like to configure SRP to use frame size of 1k,it defaults to 2K. Is th > >is an options that can > >be set/configured? > > apply this patch. i should have made this a per login item though. If you are running OpenSM, you don't need this if you set enable_quirks in opensm.opts.
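The `tavor_quirk` patch in this thread works on the encoded path MTU, not a byte count: path records carry 1 for 256 bytes up through 5 for 4096 bytes (the same numbering as the kernel's `enum ib_mtu`), so limiting the path to 1K is a simple numeric clamp. A standalone sketch of that logic, with hypothetical names:

```c
#include <assert.h>

/* Encoded path MTU values; same numbering as the kernel's enum ib_mtu
 * (IB_MTU_256 = 1 ... IB_MTU_4096 = 5). Names here are hypothetical. */
enum path_mtu_code {
    MTU_256  = 1,
    MTU_512  = 2,
    MTU_1024 = 3,
    MTU_2048 = 4,
    MTU_4096 = 5,
};

/* Convert an encoded MTU to bytes; returns 0 for an invalid code. */
static int mtu_code_to_bytes(int code)
{
    if (code < MTU_256 || code > MTU_4096)
        return 0;
    return 128 << code;     /* 1 -> 256, 3 -> 1024, 5 -> 4096 */
}

/* Same rule as the quirk: only ever lower the MTU, never raise it. */
static int clamp_path_mtu(int code, int limit)
{
    return code > limit ? limit : code;
}
```

A 2048-byte path (code 4) comes back as code 3 (1024 bytes), while a path that is already 512 bytes is left alone.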
-- Hal > > --- a/drivers/infiniband/ulp/srp/ib_srp.c.orig 2006-12-21 14:15:33.728164124 -0500 > +++ b/drivers/infiniband/ulp/srp/ib_srp.c 2006-12-21 15:26:44.234250010 -0500 > @@ -83,6 +83,10 @@ > MODULE_PARM_DESC(mellanox_workarounds, > "Enable workarounds for Mellanox SRP target bugs if != 0"); > > +static int tavor_quirk = 0; > +module_param_named(tavor_quirk, tavor_quirk, int, 0644); > +MODULE_PARM_DESC(tavor_quirk, "Tavor performance quirk: limit MTU to 1K if > 0"); > + > static const u8 mellanox_oui[3] = { 0x00, 0x02, 0xc9 }; > > static void srp_add_one(struct ib_device *device); > @@ -256,8 +260,14 @@ > target->status = status; > if (status) > printk(KERN_ERR PFX "Got failed path rec status %d\n", status); > - else > + else { > target->path = *pathrec; > + if (tavor_quirk) { > + if (target->path.mtu > IB_MTU_1024) > + target->path.mtu = IB_MTU_1024; > + } > + } > + > complete(&target->done); > } > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From sean.hefty at intel.com Fri Jun 15 09:59:04 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 15 Jun 2007 09:59:04 -0700 Subject: [ofa-general] [PATCH 1/2] libibumad: fix partition support In-Reply-To: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com> Message-ID: <000801c7af6e$7ae0ba80$ff0da8c0@amr.corp.intel.com> Allow sending MADs on different partitions. This requires kernel support, so requires an ABI bump. This patch maintains support for the previous ABI. Clarify that umad_set_pkey() takes a pkey index, and not the pkey itself. (Unfortunately, the call is used both ways in the management tree.) Signed-off-by: Sean Hefty --- Additional changes are needed to retrieve the PKey and GID tables, so that the PKeys and GIDs can be converted to the correct index. These will come in future patches. 
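Once the P_Key table is available, converting a P_Key such as opensm's `addr_type.gsi.pkey` into an index could look roughly like the sketch below. It assumes the table has already been read into host byte order (how it is obtained is not shown), and `pkey_to_index()` is a hypothetical helper. Preferring an exact match and then falling back to matching only the low 15 partition bits (ignoring the 0x8000 full-membership bit) is a design assumption, not something the patch specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PKEY_BASE_MASK 0x7fff   /* low 15 bits select the partition */

/* Hypothetical helper: map a P_Key to its index in a port's P_Key
 * table (host byte order). Returns -1 if no usable entry exists. */
static int pkey_to_index(const uint16_t *table, size_t len, uint16_t pkey)
{
    int partial = -1;
    size_t i;

    /* A partition number of 0 (P_Key 0x0000 or 0x8000) is invalid. */
    if ((pkey & PKEY_BASE_MASK) == 0)
        return -1;

    for (i = 0; i < len; i++) {
        if (table[i] == pkey)
            return (int)i;      /* exact match wins */
        if (partial < 0 &&
            (table[i] & PKEY_BASE_MASK) == (pkey & PKEY_BASE_MASK))
            partial = (int)i;   /* same partition, other membership bit */
    }
    return partial;
}
```

With a table of { 0xffff, 0x8001, 0x0002, 0x0000 }, 0xffff maps to index 0, a limited-member 0x0001 falls back to the full-member entry at index 1, and 0x0003 is not found.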
doc/libibumad.txt | 2 libibumad/include/infiniband/umad.h | 7 + libibumad/src/umad.c | 192 +++++++++++++++++++++++++++-------- 3 files changed, 156 insertions(+), 45 deletions(-) diff --git a/doc/libibumad.txt b/doc/libibumad.txt index 7b2b4f4..4e37e60 100644 --- a/doc/libibumad.txt +++ b/doc/libibumad.txt @@ -336,7 +336,7 @@ the given host ordered fields. Return 0 on success, -1 on errors. umad_set_pkey: Synopsis: - int umad_set_pkey(void *umad, int pkey); + int umad_set_pkey(void *umad, int pkey_index); Description: Set the pkey within the 'umad' buffer. Return 0 on success, -1 on errors. diff --git a/libibumad/include/infiniband/umad.h b/libibumad/include/infiniband/umad.h old mode 100644 new mode 100755 index 9020649..9369d95 --- a/libibumad/include/infiniband/umad.h +++ b/libibumad/include/infiniband/umad.h @@ -60,6 +60,8 @@ typedef struct ib_mad_addr { uint8_t traffic_class; uint8_t gid[16]; uint32_t flow_label; + uint16_t pkey_index; + uint8_t reserved[6]; } ib_mad_addr_t; typedef struct ib_user_mad { @@ -72,7 +74,8 @@ typedef struct ib_user_mad { uint8_t data[0]; } ib_user_mad_t; -#define IB_UMAD_ABI_VERSION 5 +#define IB_UMAD_MIN_ABI_VERSION 5 +#define IB_UMAD_MAX_ABI_VERSION 6 #define IB_UMAD_ABI_DIR "/sys/class/infiniband_mad" #define IB_UMAD_ABI_FILE "abi_version" @@ -167,7 +170,7 @@ int umad_set_grh_net(void *umad, void *mad_addr); int umad_set_grh(void *umad, void *mad_addr); int umad_set_addr_net(void *umad, int dlid, int dqp, int sl, int qkey); int umad_set_addr(void *umad, int dlid, int dqp, int sl, int qkey); -int umad_set_pkey(void *umad, int pkey); +int umad_set_pkey(void *umad, int pkey_index); int umad_send(int portid, int agentid, void *umad, int length, int timeout_ms, int retries); diff --git a/libibumad/src/umad.c b/libibumad/src/umad.c old mode 100644 new mode 100755 index 5f9b36b..c750fe0 --- a/libibumad/src/umad.c +++ b/libibumad/src/umad.c @@ -69,6 +69,7 @@ int umaddebug = 0; #define UMAD_DEV_NAME_SZ 32 #define UMAD_DEV_FILE_SZ 256 
+static uint abi_version; static char *def_ca_name = "mthca0"; static int def_ca_port = 1; @@ -82,6 +83,31 @@ typedef struct Port { static Port ports[UMAD_MAX_PORTS]; +typedef struct ib_mad_addr_abi_5 { + uint32_t qpn; + uint32_t qkey; + uint16_t lid; + uint8_t sl; + uint8_t path_bits; + uint8_t grh_present; + uint8_t gid_index; + uint8_t hop_limit; + uint8_t traffic_class; + uint8_t gid[16]; + uint32_t flow_label; +} ib_mad_addr_abi_5_t; + +typedef struct ib_user_mad_abi_5 { + uint32_t agent_id; + uint32_t status; + uint32_t timeout_ms; + uint32_t retries; + uint32_t length; + ib_mad_addr_abi_5_t addr; + uint8_t data[0]; +} ib_user_mad_abi_5_t; + + /************************************* * Port */ @@ -463,6 +489,101 @@ dev_to_umad_id(char *dev, uint port) return -1; /* not found */ } +static int +write_data(int fd, void *data, int size) +{ + int n; + + n = write(fd, data, size); + if (n != size) { + DEBUG("write returned %d != sizeof mad data %d (%m)", n, size); + if (!errno) + errno = EIO; + return -EIO; + } + + return 0; +} + +static int +write_abi_5(int fd, struct ib_user_mad *mad, int length) +{ + struct ib_user_mad_abi_5 *umad_5; + int n; + + n = sizeof *umad_5 + length; + umad_5 = malloc(n); + if (!umad_5) { + errno = ENOMEM; + return -ENOMEM; + } + + memcpy(umad_5, mad, sizeof *umad_5); + memcpy(umad_5->data, mad->data, length); + + n = write_data(fd, umad_5, n); + free(umad_5); + return n; +} + +static int +read_data(int fd, void *data, int size, int *length) +{ + struct ib_user_mad *mad = data; + int n, umad_size; + + umad_size = size - *length; + + n = read(fd, data, size); + if ((n >= 0) && (n <= size)) { + DEBUG("mad received by agent %d length %d", mad->agent_id, n); + if (n > umad_size) + *length = n - umad_size; + else + *length = 0; + return mad->agent_id; + } + + if (n == -EWOULDBLOCK) { + if (!errno) + errno = EWOULDBLOCK; + return n; + } + + DEBUG("read returned %zu > sizeof mad %zu (%m)", + mad->length - umad_size, *length); + + *length = 
mad->length - umad_size; + if (!errno) + errno = EIO; + return -errno; +} + +static int +read_abi_5(int fd, void *umad, int *length) +{ + struct ib_user_mad *mad = umad; + struct ib_user_mad_abi_5 *umad_5; + int n; + + n = sizeof *umad_5 + *length; + umad_5 = malloc(n); + if (!umad_5) { + errno = EINVAL; + return -EINVAL; + } + + n = read_data(fd, umad_5, n, length); + if (n >= 0) { + memcpy(mad, umad_5, sizeof *umad_5); + mad->addr.pkey_index = 0; + memcpy(mad->data, umad_5->data, *length); + } + + free(umad_5); + return n; +} + /******************************* * Public interface */ @@ -470,17 +591,19 @@ dev_to_umad_id(char *dev, uint port) int umad_init(void) { - uint abi_version; - TRACE("umad_init"); if (sys_read_uint(IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE, &abi_version) < 0) { IBWARN("can't read ABI version from %s/%s (%m): is ib_umad module loaded?", IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE); return -1; } - if (abi_version != IB_UMAD_ABI_VERSION) { - IBWARN("wrong ABI version: %s/%s is %d but library ABI is %d", - IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE, abi_version, IB_UMAD_ABI_VERSION); + + if (abi_version < IB_UMAD_MIN_ABI_VERSION || + abi_version > IB_UMAD_MAX_ABI_VERSION) { + IBWARN("wrong ABI version: %s/%s is %d but library ABI " + "supports %d through %d", + IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE, abi_version, + IB_UMAD_MIN_ABI_VERSION, IB_UMAD_MAX_ABI_VERSION); return -1; } return 0; @@ -699,11 +822,16 @@ umad_set_grh(void *umad, void *mad_addr) } int -umad_set_pkey(void *umad, int pkey) +umad_set_pkey(void *umad, int pkey_index) { -#if 0 - mad->addr.pkey = 0; /* FIXME - PKEY support */ -#endif + struct ib_user_mad *mad = umad; + + if (abi_version == 5 && pkey_index != 0) { + IBWARN("umad_set_pkey: ABI 5 only supports pkey_index 0\n"); + return -EINVAL; + } + + mad->addr.pkey_index = pkey_index; return 0; } @@ -761,15 +889,12 @@ umad_send(int portid, int agentid, void *umad, int length, if (umaddebug > 1) umad_dump(mad); - n = write(port->dev_fd, mad, length + sizeof 
*mad); - if (n == length + sizeof *mad) - return 0; + if (abi_version == 5) + n = write_abi_5(port->dev_fd, mad, length); + else + n = write_data(port->dev_fd, mad, sizeof *mad + length); - DEBUG("write returned %d != sizeof umad %zu + length %d (%m)", - n, sizeof *mad, length); - if (!errno) - errno = EIO; - return -EIO; + return n; } static int @@ -793,7 +918,6 @@ dev_poll(int fd, int timeout_ms) int umad_recv(int portid, void *umad, int *length, int timeout_ms) { - struct ib_user_mad *mad = umad; Port *port; int n; @@ -817,29 +941,13 @@ umad_recv(int portid, void *umad, int *length, int timeout_ms) return n; } - n = read(port->dev_fd, umad, sizeof *mad + *length); - if ((n >= 0) && (n <= sizeof *mad + *length)) { - DEBUG("mad received by agent %d length %d", mad->agent_id, n); - if (n > sizeof *mad) - *length = n - sizeof *mad; - else - *length = 0; - return mad->agent_id; - } - - if (n == -EWOULDBLOCK) { - if (!errno) - errno = EWOULDBLOCK; - return n; - } - - DEBUG("read returned %zu > sizeof umad %zu + length %d (%m)", - mad->length - sizeof *mad, sizeof *mad, *length); + if (abi_version == 5) + n = read_abi_5(port->dev_fd, umad, length); + else + n = read_data(port->dev_fd, umad, + sizeof(struct ib_user_mad) + *length, length); - *length = mad->length - sizeof *mad; - if (!errno) - errno = EIO; - return -errno; + return n; } int @@ -996,10 +1104,10 @@ umad_addr_dump(ib_mad_addr_t *addr) gid_str[i*2] = 0; IBWARN("qpn %d qkey 0x%x lid 0x%x sl %d\n" "grh_present %d gid_index %d hop_limit %d traffic_class %d flow_label 0x%x\n" - "Gid 0x%s", + "Gid 0x%s pkey_index %d", ntohl(addr->qpn), ntohl(addr->qkey), ntohs(addr->lid), addr->sl, addr->grh_present, (int)addr->gid_index, (int)addr->hop_limit, - (int)addr->traffic_class, addr->flow_label, gid_str); + (int)addr->traffic_class, addr->flow_label, gid_str, addr->pkey_index); } void From sean.hefty at intel.com Fri Jun 15 10:01:05 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 15 Jun 2007 10:01:05 -0700 
Subject: [ofa-general] [PATCH 2/2] opensm: use pkey index, rather than pkey with libibumad In-Reply-To: <000801c7af6e$7ae0ba80$ff0da8c0@amr.corp.intel.com> Message-ID: <000901c7af6e$c2d79480$ff0da8c0@amr.corp.intel.com> The call to umad_set_pkey expects an index, not a pkey. Use index 0 for now. Signed-off-by: Sean Hefty --- This was the one place I found where the pkey was being passed into umad_set_pkey(). opensm/libvendor/osm_vendor_ibumad.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/opensm/libvendor/osm_vendor_ibumad.c b/opensm/libvendor/osm_vendor_ibumad.c index ee94203..a10388c 100644 --- a/opensm/libvendor/osm_vendor_ibumad.c +++ b/opensm/libvendor/osm_vendor_ibumad.c @@ -1086,7 +1086,8 @@ osm_vendor_send( p_mad_addr->addr_type.gsi.service_level, IB_QP1_WELL_KNOWN_Q_KEY); umad_set_grh(p_vw->umad, 0); /* FIXME: GRH support */ - umad_set_pkey(p_vw->umad, p_mad_addr->addr_type.gsi.pkey); + umad_set_pkey(p_vw->umad, 0); + /* FIXME: p_mad_addr->addr_type.gsi.pkey to index */ if (ib_class_is_rmpp(p_mad->mgmt_class)) { /* RMPP GSI classes FIXME: no GRH */ if (!ib_rmpp_is_flag_set((ib_rmpp_mad_t *)p_sa, IB_RMPP_FLAG_ACTIVE)) { From halr at voltaire.com Fri Jun 15 10:29:27 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 15 Jun 2007 13:29:27 -0400 Subject: [ofa-general] Re: [PATCH] osm: bugfix - if fat-tree failed, osm should fall back to default routing In-Reply-To: <4670FB71.5090406@dev.mellanox.co.il> References: <4670FB71.5090406@dev.mellanox.co.il> Message-ID: <1181928567.5681.367677.camel@hal.voltaire.com> Hi Yevgeny, On Thu, 2007-06-14 at 04:25, Yevgeny Kliteynik wrote: > Hi Hal, > > When fat-tree fails to populate all the data structures, > it should return error and let osm fall back to default routing. > > Signed-off-by: Yevgeny Kliteynik Thanks. Applied. 
-- Hal From mshefty at ichips.intel.com Fri Jun 15 11:24:19 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 15 Jun 2007 11:24:19 -0700 Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) In-Reply-To: <20070615160709.GK2207@mellanox.co.il> References: <466F36C8.5010507@linux.vnet.ibm.com> <20070613163821.GB12277@mellanox.co.il> <20070613174930.GE12277@mellanox.co.il> <46716F3D.7050206@ichips.intel.com> <20070614175030.GB29561@mellanox.co.il> <4671C541.4040503@linux.vnet.ibm.com> <20070615051846.GG2207@mellanox.co.il> <4672B523.50502@ichips.intel.com> <20070615160709.GK2207@mellanox.co.il> Message-ID: <4672D953.3050506@ichips.intel.com> > Basically, I think that because of lack of SW level flow control, > generally IPoIB CM without SRQ does not make sense because of > the scalabilty problems. Most clusters are only 16-32 nodes. If IPoIB CM without SRQ can support this number of systems and outperforms IPoIB UD mode, then I do believe that it makes sense. IPoIB CM support, with or without SRQ, is less scalable than IPoIB UD mode, but it was still added because it provided a benefit under most conditions. - Sean From friedman at ucla.edu Fri Jun 15 12:01:04 2007 From: friedman at ucla.edu (Scott A. Friedman) Date: Fri, 15 Jun 2007 12:01:04 -0700 Subject: [ofa-general] Re: iWarp cxgb3 firmware In-Reply-To: <20070615162126.5D7D4E60886@openfabrics.org> References: <20070615162126.5D7D4E60886@openfabrics.org> Message-ID: <4672E1F0.9010406@ucla.edu> > Scott A. Friedman wrote: >> > Hi >> > >> > Is anyone using the cxgb3 module in rc4 or rc5? If so, where are you >> > getting the correct firmware that it seems to want (4.2)? Chelsio is >> > only distributing v4.1 on their web site. I would like to know since my >> > iWarp nodes are currently stuck at rc3, whose cxgb3 needs version 4.0 >> > >> > Do these firmware versions make significant changes? >> > > > Unfortunately, yes, they do. 
-rc4 and beyond requires firwmare version > 4.2 to fix some streaming mode->rdma mode connection transition fixes. > And the interface between the driver and firmare changed which is why > the requirement is there. -rc4 and beyond _will not_ work with anything > less than 4.2 I pushed the changes into -rc4 to get them in before > ofed-1.2 ships as this was a critical bug. > > The 4.2 firmware will be available this week from Chelsio. Contact your > chelsio rep to get it. Perhaps you can get a pre-release version today... > Thanks Steve, this explains a lot - all of my trouble have been connection related. The connection stage would either work, or not, or hang. I will contact them and try again to get the firmware - or wait... Scott From panda at cse.ohio-state.edu Fri Jun 15 12:01:54 2007 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Fri, 15 Jun 2007 15:01:54 -0400 (EDT) Subject: [ofa-general] Announcing the availability of MVAPICH support for QLogic InfiniPath adapters Message-ID: <200706151901.l5FJ1sMk008862@xi.cse.ohio-state.edu> The MVAPICH team is pleased to announce the availability of MVAPICH native support for QLogic InfiniPath adapters. Sample performance numbers include: - Opteron single-core with HT and InfiniPath-SDR: - 1.26 microsec one-way latency (4 bytes) - 953 MB/sec unidirectional bandwidth - 1889 MB/sec bidirectional bandwidth - EM64T quad-core with PCIe and InfiniPath-SDR: - 1.91 microsec one-way latency (4 bytes) - 957 MB/sec unidirectional bandwidth - 1565 MB/sec bidirectional bandwidth More detailed performance numbers can be viewed by visiting `Performance' section of the project's web page. For downloading this new support and accessing the anonymous SVN, please visit the following URL: http://mvapich.cse.ohio-state.edu/ Please post your feedback to mvapich-discuss mailing list. 
Thanks, MVAPICH Team ====================================================================== MVAPICH/MVAPICH2 project is currently supported with funding from U.S. National Science Foundation, U.S. DOE Office of Science, Mellanox, Intel, Cisco Systems, QLogic, Sun Microsystems and Linux Networx; and with equipment support from Advanced Clustering, AMD, Apple, Appro, Chelsio, Dell, Fujitsu, Fulcrum, IBM, Intel, Mellanox, Microway, NetEffect, QLogic and Sun Microsystems. Other technology partner includes Etnus. ====================================================================== From halr at voltaire.com Fri Jun 15 13:01:37 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 15 Jun 2007 16:01:37 -0400 Subject: [ofa-general] Re: [PATCH 1/2] libibumad: fix partition support In-Reply-To: <000801c7af6e$7ae0ba80$ff0da8c0@amr.corp.intel.com> References: <000801c7af6e$7ae0ba80$ff0da8c0@amr.corp.intel.com> Message-ID: <1181937695.5681.377979.camel@hal.voltaire.com> On Fri, 2007-06-15 at 12:59, Sean Hefty wrote: > Allow sending MADs on different partitions. This requires kernel support, > so requires an ABI bump. This patch maintains support for the previous > ABI. Looks good. A few minor questions/comments embedded below. > Clarify that umad_set_pkey() takes a pkey index, and not the pkey itself. > (Unfortunately, the call is used both ways in the management tree.) Sigh... and opensm (actually libvendor) is the one which uses this incorrectly. I'm worried about existing OpenSM compatibility with the new libibumad when ABI 6 is in effect. I think the long standing ABI 5 should be fine, right ? > Signed-off-by: Sean Hefty > --- > Additional changes are needed to retrieve the PKey and GID tables, so that > the PKeys and GIDs can be converted to the correct index. These will come > in future patches. 
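The compatibility question here comes down to two rules in the patch: `umad_init()` now accepts any kernel ABI from 5 through 6, and under ABI 5 `umad_set_pkey()` accepts only index 0, because that ABI has no `pkey_index` field. A minimal sketch of those two checks (function names hypothetical; the constants mirror `IB_UMAD_MIN_ABI_VERSION` and `IB_UMAD_MAX_ABI_VERSION`):

```c
#include <assert.h>
#include <errno.h>

#define UMAD_MIN_ABI 5  /* mirrors IB_UMAD_MIN_ABI_VERSION in the patch */
#define UMAD_MAX_ABI 6  /* mirrors IB_UMAD_MAX_ABI_VERSION in the patch */

/* umad_init() rule: any kernel ABI in [MIN, MAX] is usable. */
static int abi_supported(unsigned int abi)
{
    return abi >= UMAD_MIN_ABI && abi <= UMAD_MAX_ABI;
}

/* umad_set_pkey() rule: ABI 5 records have no pkey_index field,
 * so only index 0 can be honored there. */
static int check_pkey_index(unsigned int abi, int pkey_index)
{
    if (abi == 5 && pkey_index != 0)
        return -EINVAL;
    return 0;
}
```

One consequence: a caller that still passes a P_Key value such as 0xffff as if it were an index, as opensm's libvendor did before the companion patch, is rejected with -EINVAL on an ABI 5 kernel. Passing index 0, as that opensm patch does, works under both ABIs.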
> > > doc/libibumad.txt | 2 > libibumad/include/infiniband/umad.h | 7 + > libibumad/src/umad.c | 192 +++++++++++++++++++++++++++-------- > 3 files changed, 156 insertions(+), 45 deletions(-) > > diff --git a/doc/libibumad.txt b/doc/libibumad.txt > index 7b2b4f4..4e37e60 100644 > --- a/doc/libibumad.txt > +++ b/doc/libibumad.txt > @@ -336,7 +336,7 @@ the given host ordered fields. Return 0 on success, -1 on errors. > umad_set_pkey: > > Synopsis: > - int umad_set_pkey(void *umad, int pkey); > + int umad_set_pkey(void *umad, int pkey_index); > > Description: Set the pkey within the 'umad' buffer. Return 0 on success, > -1 on errors. > diff --git a/libibumad/include/infiniband/umad.h b/libibumad/include/infiniband/umad.h > old mode 100644 > new mode 100755 > index 9020649..9369d95 > --- a/libibumad/include/infiniband/umad.h > +++ b/libibumad/include/infiniband/umad.h > @@ -60,6 +60,8 @@ typedef struct ib_mad_addr { > uint8_t traffic_class; > uint8_t gid[16]; > uint32_t flow_label; > + uint16_t pkey_index; > + uint8_t reserved[6]; > } ib_mad_addr_t; > > typedef struct ib_user_mad { > @@ -72,7 +74,8 @@ typedef struct ib_user_mad { > uint8_t data[0]; > } ib_user_mad_t; > > -#define IB_UMAD_ABI_VERSION 5 > +#define IB_UMAD_MIN_ABI_VERSION 5 > +#define IB_UMAD_MAX_ABI_VERSION 6 > #define IB_UMAD_ABI_DIR "/sys/class/infiniband_mad" > #define IB_UMAD_ABI_FILE "abi_version" > > @@ -167,7 +170,7 @@ int umad_set_grh_net(void *umad, void *mad_addr); > int umad_set_grh(void *umad, void *mad_addr); > int umad_set_addr_net(void *umad, int dlid, int dqp, int sl, int qkey); > int umad_set_addr(void *umad, int dlid, int dqp, int sl, int qkey); > -int umad_set_pkey(void *umad, int pkey); > +int umad_set_pkey(void *umad, int pkey_index); > > int umad_send(int portid, int agentid, void *umad, int length, > int timeout_ms, int retries); > diff --git a/libibumad/src/umad.c b/libibumad/src/umad.c > old mode 100644 > new mode 100755 Why the mode change ? 
> index 5f9b36b..c750fe0 > --- a/libibumad/src/umad.c > +++ b/libibumad/src/umad.c > @@ -69,6 +69,7 @@ int umaddebug = 0; > #define UMAD_DEV_NAME_SZ 32 > #define UMAD_DEV_FILE_SZ 256 > > +static uint abi_version; > static char *def_ca_name = "mthca0"; > static int def_ca_port = 1; > > @@ -82,6 +83,31 @@ typedef struct Port { > > static Port ports[UMAD_MAX_PORTS]; > > +typedef struct ib_mad_addr_abi_5 { > + uint32_t qpn; > + uint32_t qkey; > + uint16_t lid; > + uint8_t sl; > + uint8_t path_bits; > + uint8_t grh_present; > + uint8_t gid_index; > + uint8_t hop_limit; > + uint8_t traffic_class; > + uint8_t gid[16]; > + uint32_t flow_label; > +} ib_mad_addr_abi_5_t; > + > +typedef struct ib_user_mad_abi_5 { > + uint32_t agent_id; > + uint32_t status; > + uint32_t timeout_ms; > + uint32_t retries; > + uint32_t length; > + ib_mad_addr_abi_5_t addr; > + uint8_t data[0]; > +} ib_user_mad_abi_5_t; > + > + > /************************************* > * Port > */ > @@ -463,6 +489,101 @@ dev_to_umad_id(char *dev, uint port) > return -1; /* not found */ > } > > +static int > +write_data(int fd, void *data, int size) > +{ > + int n; > + > + n = write(fd, data, size); > + if (n != size) { > + DEBUG("write returned %d != sizeof mad data %d (%m)", n, size); Is this really the sizeof the mad data ? 
> + if (!errno) > + errno = EIO; > + return -EIO; > + } > + > + return 0; > +} > + > +static int > +write_abi_5(int fd, struct ib_user_mad *mad, int length) > +{ > + struct ib_user_mad_abi_5 *umad_5; > + int n; > + > + n = sizeof *umad_5 + length; > + umad_5 = malloc(n); > + if (!umad_5) { > + errno = ENOMEM; > + return -ENOMEM; > + } > + > + memcpy(umad_5, mad, sizeof *umad_5); > + memcpy(umad_5->data, mad->data, length); > + > + n = write_data(fd, umad_5, n); > + free(umad_5); > + return n; > +} > + > +static int > +read_data(int fd, void *data, int size, int *length) > +{ > + struct ib_user_mad *mad = data; > + int n, umad_size; > + > + umad_size = size - *length; > + > + n = read(fd, data, size); > + if ((n >= 0) && (n <= size)) { > + DEBUG("mad received by agent %d length %d", mad->agent_id, n); > + if (n > umad_size) > + *length = n - umad_size; > + else > + *length = 0; > + return mad->agent_id; > + } > + > + if (n == -EWOULDBLOCK) { > + if (!errno) > + errno = EWOULDBLOCK; > + return n; > + } > + > + DEBUG("read returned %zu > sizeof mad %zu (%m)", > + mad->length - umad_size, *length); > + > + *length = mad->length - umad_size; > + if (!errno) > + errno = EIO; > + return -errno; > +} > + > +static int > +read_abi_5(int fd, void *umad, int *length) > +{ > + struct ib_user_mad *mad = umad; > + struct ib_user_mad_abi_5 *umad_5; > + int n; > + > + n = sizeof *umad_5 + *length; > + umad_5 = malloc(n); > + if (!umad_5) { > + errno = EINVAL; > + return -EINVAL; > + } > + > + n = read_data(fd, umad_5, n, length); > + if (n >= 0) { > + memcpy(mad, umad_5, sizeof *umad_5); > + mad->addr.pkey_index = 0; > + memcpy(mad->data, umad_5->data, *length); > + } > + > + free(umad_5); > + return n; > +} > + > /******************************* > * Public interface > */ > @@ -470,17 +591,19 @@ dev_to_umad_id(char *dev, uint port) > int > umad_init(void) > { > - uint abi_version; > - > TRACE("umad_init"); > if (sys_read_uint(IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE, &abi_version) < 0) { 
> IBWARN("can't read ABI version from %s/%s (%m): is ib_umad module loaded?", > IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE); > return -1; > } > - if (abi_version != IB_UMAD_ABI_VERSION) { > - IBWARN("wrong ABI version: %s/%s is %d but library ABI is %d", > - IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE, abi_version, IB_UMAD_ABI_VERSION); > + > + if (abi_version < IB_UMAD_MIN_ABI_VERSION || > + abi_version > IB_UMAD_MAX_ABI_VERSION) { > + IBWARN("wrong ABI version: %s/%s is %d but library ABI " > + "supports %d through %d", > + IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE, abi_version, > + IB_UMAD_MIN_ABI_VERSION, IB_UMAD_MAX_ABI_VERSION); > return -1; > } > return 0; > @@ -699,11 +822,16 @@ umad_set_grh(void *umad, void *mad_addr) > } > > int > -umad_set_pkey(void *umad, int pkey) > +umad_set_pkey(void *umad, int pkey_index) > { > -#if 0 > - mad->addr.pkey = 0; /* FIXME - PKEY support */ > -#endif > + struct ib_user_mad *mad = umad; > + > + if (abi_version == 5 && pkey_index != 0) { > + IBWARN("umad_set_pkey: ABI 5 only supports pkey_index 0\n"); > + return -EINVAL; > + } > + > + mad->addr.pkey_index = pkey_index; > return 0; > } > > @@ -761,15 +889,12 @@ umad_send(int portid, int agentid, void *umad, int length, > if (umaddebug > 1) > umad_dump(mad); > > - n = write(port->dev_fd, mad, length + sizeof *mad); > - if (n == length + sizeof *mad) > - return 0; > + if (abi_version == 5) > + n = write_abi_5(port->dev_fd, mad, length); > + else > + n = write_data(port->dev_fd, mad, sizeof *mad + length); > > - DEBUG("write returned %d != sizeof umad %zu + length %d (%m)", > - n, sizeof *mad, length); > - if (!errno) > - errno = EIO; > - return -EIO; > + return n; > } > > static int > @@ -793,7 +918,6 @@ dev_poll(int fd, int timeout_ms) > int > umad_recv(int portid, void *umad, int *length, int timeout_ms) > { > - struct ib_user_mad *mad = umad; > Port *port; > int n; > > @@ -817,29 +941,13 @@ umad_recv(int portid, void *umad, int *length, int timeout_ms) > return n; > } > > - n = read(port->dev_fd, 
umad, sizeof *mad + *length); > - if ((n >= 0) && (n <= sizeof *mad + *length)) { > - DEBUG("mad received by agent %d length %d", mad->agent_id, n); > - if (n > sizeof *mad) > - *length = n - sizeof *mad; > - else > - *length = 0; > - return mad->agent_id; > - } > - > - if (n == -EWOULDBLOCK) { > - if (!errno) > - errno = EWOULDBLOCK; > - return n; > - } > - > - DEBUG("read returned %zu > sizeof umad %zu + length %d (%m)", > - mad->length - sizeof *mad, sizeof *mad, *length); > + if (abi_version == 5) > + n = read_abi_5(port->dev_fd, umad, length); > + else > + n = read_data(port->dev_fd, umad, > + sizeof(struct ib_user_mad) + *length, length); > > - *length = mad->length - sizeof *mad; > - if (!errno) > - errno = EIO; > - return -errno; > + return n; > } > > int > @@ -996,10 +1104,10 @@ umad_addr_dump(ib_mad_addr_t *addr) > gid_str[i*2] = 0; > IBWARN("qpn %d qkey 0x%x lid 0x%x sl %d\n" > "grh_present %d gid_index %d hop_limit %d traffic_class %d flow_label 0x%x\n" > - "Gid 0x%s", > + "Gid 0x%s pkey_index %d", > ntohl(addr->qpn), ntohl(addr->qkey), ntohs(addr->lid), addr->sl, > addr->grh_present, (int)addr->gid_index, (int)addr->hop_limit, > - (int)addr->traffic_class, addr->flow_label, gid_str); > + (int)addr->traffic_class, addr->flow_label, gid_str, addr->pkey_index); > } > > void > From halr at voltaire.com Fri Jun 15 13:39:20 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 15 Jun 2007 16:39:20 -0400 Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node guid files options for fat-tree In-Reply-To: <20070614134519.GD5908@sashak.voltaire.com> References: <4670FA2D.7070708@dev.mellanox.co.il> <20070614121501.GC5908@sashak.voltaire.com> <4671363F.6060600@dev.mellanox.co.il> <20070614134519.GD5908@sashak.voltaire.com> Message-ID: <1181939959.5681.380508.camel@hal.voltaire.com> On Thu, 2007-06-14 at 09:45, Sasha Khapyorsky wrote: > On 15:36 Thu 14 Jun , Yevgeny Kliteynik wrote: > > Sasha Khapyorsky wrote: > > > Hi Yevgeny, > > > On 11:19 Thu 14 
Jun , Yevgeny Kliteynik wrote: > > >> The following three patches are adding root and compute node guid files > > >> options for fat-tree routing, > > > Is there any reason to not share root guids file option with up/down? > > > > There are two new options for fat-tree: roots and compute nodes (CN). > > These two will be very "tightly coupled" and would have more implication > > on the routing than in case of up/dn roots. For instance, having root > > file but not CN file means that the topology doesn't have to be pure > > fat-tree, > > but all the CAs are considered CNs and have to be on the same level of the > > tree. > > And there is similar implication of all the combinations of these two > > options. > > > > Because of this coupling I wanted to differentiate these two options from > > the up/dn roots. > > > > Thoughts? > > I still not have strong option about two options against common one. Me neither. > Hypothetically if in some days we will implement routing engine chains > (so failed algo will fallback to next in chain and not just to default) > separate options could be useful. So is this a(nother) reason to keep the roots separate or would that be dealt with when the routing fallback strategy changes ? -- Hal > > > Also the way how root guids are handled (in both up/down and ftree) > > > doesn't look very optimal - guids are loaded to dynamic list, the list > > > is converted to map, this map is matched and root nodes are marked as > > > roots. Isn't it would be easy just to mark root nodes during file parsing? > > > > The only thing you can save here is converting list to map: > > I don't think the root guids map is needed - you can just set is_root > field for sw nodes by guid(s) specified in the file, since you already > have sw by guid map. > > > You have to parse the guids file anyway, and you have to build all the > > fat-tree data structures anyway. 
So if you parse the file and fill the > > map right away instead of filling the list first, you will save the list2map > > conversion. > > But then up/dn and fat-tree can't use the same function to parse the guid > > file, > > and since the list2map conversion is not a big deal (we're talking about > > list > > of roots, which is couple of hundreds of guids at max), I prefer to leave it > > and not to use separate parsing functions for up/dn and fat-tree. > > You can pass custom callback to common parser. > > > BTW, since we're on this subject, how about removing the list2array > > conversion > > in the same place in up/dn routing? > > Sure, similar junk should be cleaned up in up/down too (and my original > complain was about both root guids users). > > Sasha From sashak at voltaire.com Fri Jun 15 13:59:58 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 15 Jun 2007 23:59:58 +0300 Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node guid files options for fat-tree In-Reply-To: <1181939959.5681.380508.camel@hal.voltaire.com> References: <4670FA2D.7070708@dev.mellanox.co.il> <20070614121501.GC5908@sashak.voltaire.com> <4671363F.6060600@dev.mellanox.co.il> <20070614134519.GD5908@sashak.voltaire.com> <1181939959.5681.380508.camel@hal.voltaire.com> Message-ID: <20070615205958.GB10766@sashak.voltaire.com> On 16:39 Fri 15 Jun , Hal Rosenstock wrote: > On Thu, 2007-06-14 at 09:45, Sasha Khapyorsky wrote: > > On 15:36 Thu 14 Jun , Yevgeny Kliteynik wrote: > > > Sasha Khapyorsky wrote: > > > > Hi Yevgeny, > > > > On 11:19 Thu 14 Jun , Yevgeny Kliteynik wrote: > > > >> The following three patches are adding root and compute node guid files > > > >> options for fat-tree routing, > > > > Is there any reason to not share root guids file option with up/down? > > > > > > There are two new options for fat-tree: roots and compute nodes (CN). 
> > > These two will be very "tightly coupled" and would have more implication > > > on the routing than in case of up/dn roots. For instance, having root > > > file but not CN file means that the topology doesn't have to be pure > > > fat-tree, > > > but all the CAs are considered CNs and have to be on the same level of the > > > tree. > > > And there is similar implication of all the combinations of these two > > > options. > > > > > > Because of this coupling I wanted to differentiate these two options from > > > the up/dn roots. > > > > > > Thoughts? > > > > I still not have strong option about two options against common one. > > Me neither. > > > Hypothetically if in some days we will implement routing engine chains > > (so failed algo will fallback to next in chain and not just to default) > > separate options could be useful. > > So is this a(nother) reason to keep the roots separate or would that be > dealt with when the routing fallback strategy changes ? It is yet hypothetical. Currently I don't see a strong practical reasons to have two separate root guids file options for up/down and fat-tree, but guess this is minor and not showstopper. 
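The alternative suggested earlier in this thread (mark switches as roots while the GUID file is parsed, and let up/down and fat-tree share one parser by passing it a callback) might look roughly like the C sketch below. Everything in it is hypothetical: `parse_guid_file()`, `guid_cb_t`, `struct toy_sw` and `mark_root()` are not OpenSM code, a linear scan stands in for OpenSM's switch-by-GUID map, and the file format (one GUID per line, `#` comments and blank lines ignored) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical callback type: called once for every GUID in the file. */
typedef int (*guid_cb_t)(uint64_t guid, void *ctx);

/* Parse one GUID per line (hex with 0x prefix, or decimal). Blank lines
 * and lines starting with '#' are skipped. Returns the number of GUIDs
 * handed to cb, or -1 on a malformed line or a failing callback. */
static int parse_guid_file(FILE *f, guid_cb_t cb, void *ctx)
{
	char line[256];
	int count = 0;

	assert(f != NULL && cb != NULL);
	while (fgets(line, sizeof line, f)) {
		char *p = line, *end;
		uint64_t guid;

		while (*p == ' ' || *p == '\t')
			p++;
		if (*p == '\0' || *p == '\n' || *p == '#')
			continue;
		guid = strtoull(p, &end, 0);   /* base 0 accepts "0x..." */
		if (end == p)
			return -1;             /* not a number */
		if (cb(guid, ctx) < 0)
			return -1;
		count++;
	}
	return count;
}

/* Toy stand-in for the switch table (hypothetical). */
struct toy_sw {
	uint64_t guid;
	int is_root;
};

struct toy_fabric {
	struct toy_sw *sw;
	size_t n;
};

/* Callback for the roots file: set is_root on the matching switch right
 * away, with no intermediate list or map. Unknown GUIDs are ignored. */
static int mark_root(uint64_t guid, void *ctx)
{
	struct toy_fabric *fab = ctx;

	for (size_t i = 0; i < fab->n; i++) {
		if (fab->sw[i].guid == guid) {
			fab->sw[i].is_root = 1;
			break;
		}
	}
	return 0;
}
```

Fat-tree could pass a second callback for the compute-node file, so both options would still share the parsing code while keeping their own semantics.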
Sasha From halr at voltaire.com Fri Jun 15 13:57:20 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 15 Jun 2007 16:57:20 -0400 Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node guid files options for fat-tree In-Reply-To: <20070615205958.GB10766@sashak.voltaire.com> References: <4670FA2D.7070708@dev.mellanox.co.il> <20070614121501.GC5908@sashak.voltaire.com> <4671363F.6060600@dev.mellanox.co.il> <20070614134519.GD5908@sashak.voltaire.com> <1181939959.5681.380508.camel@hal.voltaire.com> <20070615205958.GB10766@sashak.voltaire.com> Message-ID: <1181941040.5681.381698.camel@hal.voltaire.com> On Fri, 2007-06-15 at 16:59, Sasha Khapyorsky wrote: > On 16:39 Fri 15 Jun , Hal Rosenstock wrote: > > On Thu, 2007-06-14 at 09:45, Sasha Khapyorsky wrote: > > > On 15:36 Thu 14 Jun , Yevgeny Kliteynik wrote: > > > > Sasha Khapyorsky wrote: > > > > > Hi Yevgeny, > > > > > On 11:19 Thu 14 Jun , Yevgeny Kliteynik wrote: > > > > >> The following three patches are adding root and compute node guid files > > > > >> options for fat-tree routing, > > > > > Is there any reason to not share root guids file option with up/down? > > > > > > > > There are two new options for fat-tree: roots and compute nodes (CN). > > > > These two will be very "tightly coupled" and would have more implication > > > > on the routing than in case of up/dn roots. For instance, having root > > > > file but not CN file means that the topology doesn't have to be pure > > > > fat-tree, > > > > but all the CAs are considered CNs and have to be on the same level of the > > > > tree. > > > > And there is similar implication of all the combinations of these two > > > > options. > > > > > > > > Because of this coupling I wanted to differentiate these two options from > > > > the up/dn roots. > > > > > > > > Thoughts? > > > > > > I still not have strong option about two options against common one. > > > > Me neither. 
> > > > > Hypothetically if in some days we will implement routing engine chains > > > (so failed algo will fallback to next in chain and not just to default) > > > separate options could be useful. > > > > So is this a(nother) reason to keep the roots separate or would that be > > dealt with when the routing fallback strategy changes ? > > It is yet hypothetical. Currently I don't see a strong practical reasons > to have two separate root guids file options for up/down and fat-tree, > but guess this is minor and not showstopper. Wouldn't a current practical reason be switching between up/down and fat tree and they each have different roots ? Is that a real scenario ? -- Hal > Sasha From wombat2 at us.ibm.com Fri Jun 15 14:04:16 2007 From: wombat2 at us.ibm.com (Bernard King-Smith) Date: Fri, 15 Jun 2007 17:04:16 -0400 Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) In-Reply-To: <20070615190004.4BCB8E6086F@openfabrics.org> Message-ID: "Sean Hefty" wrote on 06/15/2007 03:00:04 PM: > > > Basically, I think that because of lack of SW level flow control, > > generally IPoIB CM without SRQ does not make sense because of > > the scalability problems. > > Most clusters are only 16-32 nodes. If IPoIB CM without SRQ can support > this number of systems and outperforms IPoIB UD mode, then I do believe > that it makes sense. IPoIB CM support, with or without SRQ, is less > scalable than IPoIB UD mode, but it was still added because it provided > a benefit under most conditions. I think Pradeep has been making this very clear all along and that scaling is a restriction we can make. Since SRQ is not a required part of the spec, then having support for non-SRQ in the IPoIB-CM driver supports the minimal requirements. I think it is typical that any driver that supports enhancements from a basic spec has exception handling for both cases ( base and enhanced ) in the layer in question (ipoib). 
Putting it in the device driver splits the non-SRQ IPoIB support to two layers which is not a good idea. We are already running with the non-SRQ patch here and the results are very good. Changing to a different approach is not the right thing to do at this time. Emulating in the device driver will only increase the amount of work everyone will have to do to get this out, and runs the risk of uncovering more complex problems. Can we close on the last few issues and get this lined up for OFED 1.3? > > - Sean Regards. Bernie King-Smith IBM Corporation Server Group Cluster System Performance wombat2 at us.ibm.com (845)433-8483 Tie. 293-8483 or wombat2 on NOTES "We are not responsible for the world we are born into, only for the world we leave when we die. So we have to accept what has gone before us and work to change the only thing we can, -- The Future." William Shatner -------------- next part -------------- An HTML attachment was scrubbed... URL: From mshefty at ichips.intel.com Fri Jun 15 14:36:21 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 15 Jun 2007 14:36:21 -0700 Subject: [ofa-general] Re: [PATCH 1/2] libibumad: fix partition support In-Reply-To: <1181937695.5681.377979.camel@hal.voltaire.com> References: <000801c7af6e$7ae0ba80$ff0da8c0@amr.corp.intel.com> <1181937695.5681.377979.camel@hal.voltaire.com> Message-ID: <46730655.7020808@ichips.intel.com> > Sigh... and opensm (actually libvendor) is the one which uses this > incorrectly. I'm worried about existing OpenSM compatibility with the > new libibumad when ABI 6 is in effect. I think the long standing ABI 5 > should be fine, right ? ABI 5 should be fine, since the pkey isn't actually passed to the kernel. ABI 6 would pass down the wrong index. I do print a warning if umad_set_pkey() is called with an index != 0, but we can remove that. >> old mode 100644 >> new mode 100755 > > Why the mode change ? This is just my editor being dumb, and me forgetting to tell git to ignore mode changes. 
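The ABI 5 / ABI 6 handling discussed in this thread can be illustrated with a small, self-contained C sketch. It assumes the header layouts shown in the patch above, renamed here to the hypothetical `struct mad_addr_v6` and `struct user_mad_v6` (with a C99 flexible array member in place of `data[0]`); `abi_supported()`, `set_pkey_index()` and `to_abi5_buffer()` are illustrative helpers, not libibumad functions. It shows that the ABI 5 header is a byte-for-byte prefix of the ABI 6 one, so downgrading a MAD only drops `pkey_index` and `reserved[6]`, and that the size passed to write() is the header size plus the MAD data length.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Address part of the user MAD header, ABI 6 layout (from the patch). */
struct mad_addr_v6 {
	uint32_t qpn;
	uint32_t qkey;
	uint16_t lid;
	uint8_t  sl;
	uint8_t  path_bits;
	uint8_t  grh_present;
	uint8_t  gid_index;
	uint8_t  hop_limit;
	uint8_t  traffic_class;
	uint8_t  gid[16];
	uint32_t flow_label;
	uint16_t pkey_index;   /* added in ABI 6 */
	uint8_t  reserved[6];  /* added in ABI 6 */
};

struct user_mad_v6 {
	uint32_t agent_id;
	uint32_t status;
	uint32_t timeout_ms;
	uint32_t retries;
	uint32_t length;
	struct mad_addr_v6 addr;
	uint8_t  data[];       /* C99 flexible array member instead of data[0] */
};

/* ABI 5 header = ABI 6 header without pkey_index/reserved[], i.e. a prefix. */
#define USER_MAD_V5_HDR_SIZE \
	(offsetof(struct user_mad_v6, addr) + offsetof(struct mad_addr_v6, pkey_index))

static_assert(sizeof(struct user_mad_v6) == 64, "ABI 6 header is 64 bytes");
static_assert(USER_MAD_V5_HDR_SIZE == 56, "ABI 5 header is 56 bytes");

/* Same range test as the patched umad_init(). */
static int abi_supported(unsigned abi)
{
	return abi >= 5 && abi <= 6;
}

/* Same rule as the patched umad_set_pkey(): ABI 5 has no pkey_index
 * field, so only index 0 can be honoured there. */
static int set_pkey_index(unsigned abi, struct user_mad_v6 *mad, int pkey_index)
{
	if (abi == 5 && pkey_index != 0)
		return -EINVAL;
	mad->addr.pkey_index = (uint16_t)pkey_index;
	return 0;
}

/* Build what would be written to an ABI 5 device: the 56-byte header
 * prefix followed by 'len' payload bytes. The write size is header size
 * + MAD data length. Returns a malloc'd buffer (caller frees) or NULL. */
static uint8_t *to_abi5_buffer(const struct user_mad_v6 *mad, size_t len,
			       size_t *out_size)
{
	size_t size = USER_MAD_V5_HDR_SIZE + len;
	uint8_t *buf = malloc(size);

	if (!buf)
		return NULL;
	memcpy(buf, mad, USER_MAD_V5_HDR_SIZE);
	memcpy(buf + USER_MAD_V5_HDR_SIZE, mad->data, len);
	*out_size = size;
	return buf;
}
```

With a 4-byte payload, for example, the ABI 6 write would be 64 + 4 = 68 bytes and the ABI 5 write 56 + 4 = 60 bytes.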
> >> + n = write(fd, data, size); >> + if (n != size) { >> + DEBUG("write returned %d != sizeof mad data %d (%m)", n, size); > > Is this really the sizeof the mad data ? This is not the size of the data field in a MAD. It's the size of the write = sizeof ib_user_mad + MAD data length. I can change the comment to clarify. - Sean From jwong at datallegro.com Fri Jun 15 14:49:38 2007 From: jwong at datallegro.com (Jeffrey Wong) Date: Fri, 15 Jun 2007 17:49:38 -0400 Subject: [ofa-general] Trouble installing OFED 1.2-rc5, kernel SUSE 10.2.6.21-5 default x86_64 Message-ID: Hello, I'm getting the following error when trying to install OFED 1.2-rc5: configure: error: libpci not found. Failed to execute: cd /var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/tvflash && env ac_cv_lib_ibverbs_ibv_get_device_list=yes ac_cv_header_infiniband_driver_h=yes ac_cv_func_ibv_read_sysfs_file=yes ac_cv_func_ibv_dontfork_range=yes ac_cv_func_ibv_dofork_range=yes ac_cv_func_ibv_register_driver=yes HAVE_IBV_DEVICE_LIBRARY_EXTENSION_TRUE=yes ./configure --cache-file=/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/configure.cache --disable-libcheck --prefix /usr --libdir /usr/lib64 --mandir=/usr/share/man --sysconfdir=/etc CPPFLAGS="-I../libibverbs/include" error: Bad exit status from /var/tmp/rpm-tmp.30970 (%install) RPM build errors: user vlad does not exist - using root group vlad does not exist - using root user vlad does not exist - using root group vlad does not exist - using root Bad exit status from /var/tmp/rpm-tmp.30970 (%install) ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-dapl --with-ipoibtools --with-libcxgb3 --with-libibcm --with-libibcommon --with-libibmad --with-libibumad --with-libibverbs --with-libmthca --with-opensm --with-librdmacm --with-libsdp --with-openib-diags --with-sdpnetstat --with-srptools --with-mstflint --with-perftest --with-tvflash
--sysconfdir=/etc --mandir=/usr/share/man' --define 'configure_options32 %{nil} --sysconfdir=/etc --mandir=/usr/share/man' --define 'build_32bit 0' --define '_mandir /usr/share/man' /root/OFED-1.2-rc5/SRPMS/ofa_user-1.2-rc5.src.rpm" Thanks in advance, Jeff From sweitzen at cisco.com Fri Jun 15 14:50:54 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Fri, 15 Jun 2007 14:50:54 -0700 Subject: [ofa-general] Trouble installing OFED 1.2-rc5, kernel SUSE 10.2.6.21-5 default x86_64 In-Reply-To: References: Message-ID: https://bugs.openfabrics.org/show_bug.cgi?id=558 , has been fixed since rc5. ________________________________ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Jeffrey Wong Sent: Friday, June 15, 2007 2:50 PM To: general at lists.openfabrics.org Subject: [ofa-general] Trouble installing OFED 1.2-rc5,kernel SUSE 10.2.6.21-5 default x86_64 Hello, I'm getting the following error when trying to install OFED 1.2-rc5: configure: error: libpci not found.
Failed to execute: cd /var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/tvflash && env ac_cv_lib_ibverbs_ibv_get_device_list=yes ac_cv_header_infiniband_driver_h=yes ac_cv_func_ibv_read_sysfs_file=yes ac_cv_func_ibv_dontfork_range=yes ac_cv_func_ibv_dofork_range=yes ac_cv_func_ibv_register_driver=yes HAVE_IBV_DEVICE_LIBRARY_EXTENSION_TRUE=yes ./configure --cache-file=/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/configure.cache --disable-libcheck --prefix /usr --libdir /usr/lib64 --mandir=/usr/share/man --sysconfdir=/etc CPPFLAGS="-I../libibverbs/include" error: Bad exit status from /var/tmp/rpm-tmp.30970 (%install) RPM build errors: user vlad does not exist - using root group vlad does not exist - using root user vlad does not exist - using root group vlad does not exist - using root Bad exit status from /var/tmp/rpm-tmp.30970 (%install) ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-dapl --with-ipoibtools --with-libcxgb3 --with-libibcm --with-libibcommon --with-libibmad --with-libibumad --with-libibverbs --with-libmthca --with-opensm --with-librdmacm --with-libsdp --with-openib-diags --with-sdpnetstat --with-srptools --with-mstflint --with-perftest --with-tvflash --sysconfdir=/etc --mandir=/usr/share/man' --define 'configure_options32 %{nil} --sysconfdir=/etc --mandir=/usr/share/man' --define 'build_32bit 0' --define '_mandir /usr/share/man' /root/OFED-1.2-rc5/SRPMS/ofa_user-1.2-rc5.src.rpm" Thanks in advance, Jeff -------------- next part -------------- An HTML attachment was scrubbed...
URL: From halr at voltaire.com Fri Jun 15 15:00:15 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 15 Jun 2007 18:00:15 -0400 Subject: [ofa-general] Re: [PATCH 1/2] libibumad: fix partition support In-Reply-To: <46730655.7020808@ichips.intel.com> References: <000801c7af6e$7ae0ba80$ff0da8c0@amr.corp.intel.com> <1181937695.5681.377979.camel@hal.voltaire.com> <46730655.7020808@ichips.intel.com> Message-ID: <1181944814.5681.385918.camel@hal.voltaire.com> On Fri, 2007-06-15 at 17:36, Sean Hefty wrote: > > Sigh... and opensm (actually libvendor) is the one which uses this > > incorrectly. I'm worried about existing OpenSM compatibility with the > > new libibumad when ABI 6 is in effect. I think the long standing ABI 5 > > should be fine, right ? > > ABI 5 should be fine, since the pkey isn't actually passed to the > kernel. ABI 6 would pass down the wrong index. Right. > I do print a warning if > umad_set_pkey() is called with an index != 0, but we can remove that. How about if abi_version == 5, setting pkey_index to 0 regardless of what is set ? Isn't that all that ABI v5 really supports ? > >> old mode 100644 > >> new mode 100755 > > > > Why the mode change ? > > This is just my editor being dumb, and me forgetting to tell git to > ignore mode changes. OK. > >> + n = write(fd, data, size); > >> + if (n != size) { > >> + DEBUG("write returned %d != sizeof mad data %d (%m)", n, size); > > > > Is this really the sizeof the mad data ? > > This is not the size of the data field in a MAD. It's the size of the > write = sizeof ib_user_mad + MAD data length. I can change the comment > to clarify. Thanks. -- Hal > - Sean From jwong at datallegro.com Fri Jun 15 14:59:23 2007 From: jwong at datallegro.com (Jeffrey Wong) Date: Fri, 15 Jun 2007 17:59:23 -0400 Subject: [ofa-general] Trouble installing OFED 1.2-rc5, kernel SUSE 10.2.6.21-5 default x86_64 References: Message-ID: Thanks I've downloaded the latest build. 
Now I get a different error: Make ipoibtools started make -C src/userspace/ipoibtools make[1]: Entering directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/ipoibtools' gcc -D_GNU_SOURCE -O2 -Wstrict-prototypes -Wall -g -include include-glibc/glibc-bugs.h -I/lib/modules/2.6.21.5-default/build/include arping.c -lresolv -o arping gcc -D_GNU_SOURCE -O2 -Wstrict-prototypes -Wall -g -include include-glibc/glibc-bugs.h -I/lib/modules/2.6.21.5-default/build/include mcasthandle.c -lresolv -o mcasthandle make[1]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/ipoibtools' make -C src/userspace/ipoibtools/iproute2 ip make[1]: Entering directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/ipoibtools/iproute2' make -w -C lib make[2]: Entering directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/ipoibtools/iproute2/lib' gcc -D_GNU_SOURCE -O2 -Wstrict-prototypes -Wall -I../include -DRESOLVE_HOSTNAMES -c -o ll_map.o ll_map.c gcc -D_GNU_SOURCE -O2 -Wstrict-prototypes -Wall -I../include -DRESOLVE_HOSTNAMES -c -o libnetlink.o libnetlink.c ar rcs libnetlink.a ll_map.o libnetlink.o gcc -D_GNU_SOURCE -O2 -Wstrict-prototypes -Wall -I../include -DRESOLVE_HOSTNAMES -c -o utils.o utils.c utils.c: In function ‘inet_addr_match’: utils.c:333: warning: initialization discards qualifiers from pointer target type utils.c:334: warning: initialization discards qualifiers from pointer target type utils.c: In function ‘__get_hz’: utils.c:368: error: ‘HZ’ undeclared (first use in this function) utils.c:368: error: (Each undeclared identifier is reported only once utils.c:368: error: for each function it appears in.)
make[2]: *** [utils.o] Error 1 make[2]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/ipoibtools/iproute2/lib' make[1]: *** [lib] Error 2 make[1]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/ipoibtools/iproute2' make: *** [ipoibtools] Error 2 error: Bad exit status from /var/tmp/rpm-tmp.18693 (%install) RPM build errors: user vlad does not exist - using root group vlad does not exist - using root user vlad does not exist - using root group vlad does not exist - using root Bad exit status from /var/tmp/rpm-tmp.18693 (%install) ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-dapl --with-ipoibtools --with-libcxgb3 --with-libibcm --with-libibcommon --with-libibmad --with-libibumad --with-libibverbs --with-libmthca --with-opensm --with-librdmacm --with-libsdp --with-openib-diags --with-sdpnetstat --with-srptools --with-mstflint --with-perftest --with-tvflash --sysconfdir=/etc --mandir=/usr/share/man' --define 'configure_options32 %{nil} --sysconfdir=/etc --mandir=/usr/share/man' --define 'build_32bit 0' --define '_mandir /usr/share/man' /root/OFED-1.2-20070615-0600/SRPMS/ofa_user-1.2-rc5.src.rpm" -----Original Message----- From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com] Sent: Fri 6/15/2007 5:50 PM To: Jeffrey Wong; general at lists.openfabrics.org Subject: RE: [ofa-general] Trouble installing OFED 1.2-rc5,kernel SUSE 10.2.6.21-5 default x86_64 https://bugs.openfabrics.org/show_bug.cgi?id=558 , has been fixed since rc5. 
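The 'HZ' undeclared errors in these logs (in `__get_hz()` under ipoibtools/iproute2 and, in the sdpnetstat report, in `INET_setroute()`) mean that the headers picked up by this build do not define the kernel tick macro `HZ` for user space. One possible workaround, sketched below purely as an assumption and not as the fix that went into OFED, is to stop relying on the macro: honour an explicit `HZ` environment variable, otherwise ask `sysconf(_SC_CLK_TCK)`, and only then fall back to 100. Note that `_SC_CLK_TCK` reports the user-visible tick rate (USER_HZ), which is not necessarily the kernel's internal HZ, so whether it is the right value depends on how each caller uses it. `get_hz_fallback()` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical replacement for a bare use of the HZ macro in user-space
 * tools. Lookup order: HZ environment variable, then
 * sysconf(_SC_CLK_TCK) (USER_HZ), then 100 as a last resort. */
static long get_hz_fallback(void)
{
	const char *env = getenv("HZ");
	long hz;

	if (env) {
		hz = strtol(env, NULL, 10);
		if (hz > 0)
			return hz;          /* explicit override */
	}
	hz = sysconf(_SC_CLK_TCK);      /* user-visible ticks per second */
	return hz > 0 ? hz : 100;
}
```

Where the surrounding code only needs a compile-time constant, a guarded `#ifndef HZ` / `#define HZ 100` / `#endif` is the simpler, and equally approximate, alternative.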
________________________________ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Jeffrey Wong Sent: Friday, June 15, 2007 2:50 PM To: general at lists.openfabrics.org Subject: [ofa-general] Trouble installing OFED 1.2-rc5,kernel SUSE 10.2.6.21-5 default x86_64 Hello, I'm getting the following error when trying to install OFED 1.2-rc5: configure: error: libpci not found. Failed to execute: cd /var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/tvflash && env ac_cv_lib_ibverbs_ibv_get_device_list=yes ac_cv_header_infiniband_driver_h=yes ac_cv_func_ibv_read_sysfs_file=yes ac_cv_func_ibv_dontfork_range=yes ac_cv_func_ibv_dofork_range=yes ac_cv_func_ibv_register_driver=yes HAVE_IBV_DEVICE_LIBRARY_EXTENSION_TRUE=yes ./configure --cache-file=/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/configure.cache --disable-libcheck --prefix /usr --libdir /usr/lib64 --mandir=/usr/share/man --sysconfdir=/etc CPPFLAGS="-I../libibverbs/include" error: Bad exit status from /var/tmp/rpm-tmp.30970 (%install) RPM build errors: user vlad does not exist - using root group vlad does not exist - using root user vlad does not exist - using root group vlad does not exist - using root Bad exit status from /var/tmp/rpm-tmp.30970 (%install) ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-dapl --with-ipoibtools --with-libcxgb3 --with-libibcm --with-libibcommon --with-libibmad --with-libibumad --with-libibverbs --with-libmthca --with-opensm --with-librdmacm --with-libsdp --with-openib-diags --with-sdpnetstat --with-srptools --with-mstflint --with-perftest --with-tvflash --sysconfdir=/etc --mandir=/usr/share/man' --define 'configure_options32 %{nil} --sysconfdir=/etc --mandir=/usr/share/man' --define 'build_32bit 0' --define '_mandir /usr/share/man' /root/OFED-1.2-rc5/SRPMS/ofa_user-1.2-rc5.src.rpm"
Thanks in advance, Jeff From jwong at datallegro.com Fri Jun 15 15:07:07 2007 From: jwong at datallegro.com (Jeffrey Wong) Date: Fri, 15 Jun 2007 18:07:07 -0400 Subject: [ofa-general] Trouble installing OFED 1.2-rc5, kernel SUSE 10.2.6.21-5 default x86_64 - sdpnetstat Message-ID: Hello, I'm getting the following errors when trying to build sdp: /var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/sdpnetstat/include -c -o inet_gr.o inet_gr.c cc -D_GNU_SOURCE -O2 -Wall -g -I. -idirafter ./include/ -Ilib -I/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/sdpnetstat -idirafter /var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/sdpnetstat/include -c -o inet_sr.o inet_sr.c inet_sr.c: In function ‘INET_setroute’: inet_sr.c:201: error: ‘HZ’ undeclared (first use in this function) inet_sr.c:201: error: (Each undeclared identifier is reported only once inet_sr.c:201: error: for each function it appears in.) make[2]: *** [inet_sr.o] Error 1 make[2]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/sdpnetstat/lib' make[1]: *** [subdirs] Error 2 make[1]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/sdpnetstat' make: *** [sdpnetstat] Error 2 error: Bad exit status from /var/tmp/rpm-tmp.34022 (%install) RPM build errors: user vlad does not exist - using root group vlad does not exist - using root user vlad does not exist - using root group vlad does not exist - using root Bad exit status from /var/tmp/rpm-tmp.34022 (%install) ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-dapl --with-libcxgb3 --with-libibcm --with-libibcommon --with-libibmad --with-libibumad --with-libibverbs --with-libmthca --with-opensm --with-librdmacm --with-libsdp --with-openib-diags --with-sdpnetstat --with-srptools --with-mstflint --with-perftest
--with-tvflash --sysconfdir=/etc --mandir=/usr/share/man' --define 'configure_options32 %{nil} --sysconfdir=/etc --mandir=/usr/share/man' --define 'build_32bit 0' --define '_mandir /usr/share/man' /root/OFED-1.2-20070615-0600/SRPMS/ofa_user-1.2-rc5.src.rpm" Thanks in advance, Jeff -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.hefty at intel.com Fri Jun 15 15:11:32 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 15 Jun 2007 15:11:32 -0700 Subject: [ofa-general] Re: [PATCH 1/2] libibumad: fix partition support In-Reply-To: <1181944814.5681.385918.camel@hal.voltaire.com> Message-ID: <000c01c7af9a$2267a270$3ccc180a@amr.corp.intel.com> >How about if abi_version == 5, setting pkey_index to 0 regardless of >what is set ? Isn't that all that ABI v5 really supports ? This is what ends up happening. The problem is that the pkey_index is set to 0 by the kernel code. Nothing is passed down from userspace. I added the warning to umad_set_pkey() to notify the user that the value that they're trying to set is ignored. - Sean From jwong at datallegro.com Fri Jun 15 15:18:26 2007 From: jwong at datallegro.com (Jeffrey Wong) Date: Fri, 15 Jun 2007 18:18:26 -0400 Subject: [ofa-general] Latest OFED builds everything except ipoibtools and sdpnetstat modules for SLES 10.2.6-21-5 Message-ID: -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sashak at voltaire.com Fri Jun 15 15:29:34 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 16 Jun 2007 01:29:34 +0300 Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node guid files options for fat-tree In-Reply-To: <1181941040.5681.381698.camel@hal.voltaire.com> References: <4670FA2D.7070708@dev.mellanox.co.il> <20070614121501.GC5908@sashak.voltaire.com> <4671363F.6060600@dev.mellanox.co.il> <20070614134519.GD5908@sashak.voltaire.com> <1181939959.5681.380508.camel@hal.voltaire.com> <20070615205958.GB10766@sashak.voltaire.com> <1181941040.5681.381698.camel@hal.voltaire.com> Message-ID: <20070615222934.GC10766@sashak.voltaire.com> On 16:57 Fri 15 Jun , Hal Rosenstock wrote: > On Fri, 2007-06-15 at 16:59, Sasha Khapyorsky wrote: > > On 16:39 Fri 15 Jun , Hal Rosenstock wrote: > > > On Thu, 2007-06-14 at 09:45, Sasha Khapyorsky wrote: > > > > On 15:36 Thu 14 Jun , Yevgeny Kliteynik wrote: > > > > > Sasha Khapyorsky wrote: > > > > > > Hi Yevgeny, > > > > > > On 11:19 Thu 14 Jun , Yevgeny Kliteynik wrote: > > > > > >> The following three patches are adding root and compute node guid files > > > > > >> options for fat-tree routing, > > > > > > Is there any reason to not share root guids file option with up/down? > > > > > > > > > > There are two new options for fat-tree: roots and compute nodes (CN). > > > > > These two will be very "tightly coupled" and would have more implication > > > > > on the routing than in case of up/dn roots. For instance, having root > > > > > file but not CN file means that the topology doesn't have to be pure > > > > > fat-tree, > > > > > but all the CAs are considered CNs and have to be on the same level of the > > > > > tree. > > > > > And there is similar implication of all the combinations of these two > > > > > options. > > > > > > > > > > Because of this coupling I wanted to differentiate these two options from > > > > > the up/dn roots. > > > > > > > > > > Thoughts? 
> > > > > > > > I still not have strong option about two options against common one. > > > > > > Me neither. > > > > > > > Hypothetically if in some days we will implement routing engine chains > > > > (so failed algo will fallback to next in chain and not just to default) > > > > separate options could be useful. > > > > > > So is this a(nother) reason to keep the roots separate or would that be > > > dealt with when the routing fallback strategy changes ? > > > > It is yet hypothetical. Currently I don't see a strong practical reasons > > to have two separate root guids file options for up/down and fat-tree, > > but guess this is minor and not showstopper. > > Wouldn't a current practical reason be switching between up/down and fat > tree and they each have different roots ? Is that a real scenario ? Sure (but guess in many cases selected roots will be same for both algos). I think this scenario will be handled well with single shared option, like: opensm -R ftree --roots-file ftree-roots-file , and opensm -R updn --roots-file updn-roots-file Sasha From halr at voltaire.com Fri Jun 15 15:23:00 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 15 Jun 2007 18:23:00 -0400 Subject: [ofa-general] Re: [PATCH 1/2] libibumad: fix partition support In-Reply-To: <000c01c7af9a$2267a270$3ccc180a@amr.corp.intel.com> References: <000c01c7af9a$2267a270$3ccc180a@amr.corp.intel.com> Message-ID: <1181946179.5681.387443.camel@hal.voltaire.com> On Fri, 2007-06-15 at 18:11, Sean Hefty wrote: > >How about if abi_version == 5, setting pkey_index to 0 regardless of > >what is set ? Isn't that all that ABI v5 really supports ? > > This is what ends up happening. The problem is that the pkey_index is set to 0 > by the kernel code. Nothing it passed down from userspace. I added the warning > to umad_set_pkey() to notify the user that the value that they're trying to set > is ignored. Oh, right. So the question is whether we want that error message over and over again. 
Sorry for being slow... Also, should the index requested be validated (within range) or is that handled with some error coming back from the lower levels when this is misspecified ? -- Hal > - Sean From sean.hefty at intel.com Fri Jun 15 15:33:40 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 15 Jun 2007 15:33:40 -0700 Subject: [ofa-general] Re: [PATCH 1/2] libibumad: fix partition support In-Reply-To: <1181946179.5681.387443.camel@hal.voltaire.com> Message-ID: <000d01c7af9d$395e8310$3ccc180a@amr.corp.intel.com> >Also, should the index requested be validated (within range) or is that >handled with some error coming back from the lower levels when this is >misspecified ? umad_set_pkey() can't fully validate the index, since it doesn't know what port the mad will be used on. The send will eventually fail if an invalid pkey index is used, but checking could be added earlier in the send path. - Sean From sweitzen at cisco.com Fri Jun 15 15:36:36 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Fri, 15 Jun 2007 15:36:36 -0700 Subject: [ofa-general] Latest OFED builds everything except ipoibtools andsdpnetstat modules for SLES 10.2.6-21-5 In-Reply-To: References: Message-ID: Are you on SUSE 10 or SLES 10? SLES 10 has 2.6.16 kernels AFAIK. Scott ________________________________ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Jeffrey Wong Sent: Friday, June 15, 2007 3:18 PM To: general at lists.openfabrics.org Subject: [ofa-general] Latest OFED builds everything except ipoibtools andsdpnetstat modules for SLES 10.2.6-21-5 -------------- next part -------------- An HTML attachment was scrubbed... 
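Sean explains that under ABI v5 the kernel always sets pkey_index to 0, so umad_set_pkey() can only warn that the requested value is ignored. Hal then asks whether that warning should repeat on every call. The sketch below shows one way to warn only once per process. It is not libibumad's umad_set_pkey(): struct mad_hdr_sketch, set_pkey_index_sketch() and the pkey_warnings_emitted counter are hypothetical names used only for illustration, and the warn-once policy is an assumption rather than something the thread decided.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the part of a MAD header that carries a P_Key index. */
struct mad_hdr_sketch {
    int pkey_index;
};

/* Number of warnings actually printed; exposed only so the behavior is easy to check. */
int pkey_warnings_emitted = 0;

/*
 * Hypothetical sketch, not libibumad's umad_set_pkey(): with ABI version 5
 * the kernel uses P_Key index 0 regardless, so a nonzero request is dropped
 * and the user is warned once per process instead of on every call.
 * Returns 1 if the requested index was ignored, 0 if it was applied.
 */
int set_pkey_index_sketch(struct mad_hdr_sketch *hdr, int abi_version, int pkey_index)
{
    assert(hdr != NULL);

    if (abi_version == 5) {
        if (pkey_index != 0 && pkey_warnings_emitted == 0) {
            fprintf(stderr, "warning: pkey_index %d ignored with ABI v5\n", pkey_index);
            pkey_warnings_emitted++;
        }
        hdr->pkey_index = 0;   /* mirrors what the kernel does anyway */
        return pkey_index != 0;
    }

    hdr->pkey_index = pkey_index;
    return 0;
}
```

As Sean notes in the follow-up, a range check on the index cannot be done at this point because the port the MAD will be sent on is not yet known, so the sketch does not attempt one.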
URL: From jwong at datallegro.com Fri Jun 15 15:39:29 2007 From: jwong at datallegro.com (Jeffrey Wong) Date: Fri, 15 Jun 2007 18:39:29 -0400 Subject: [ofa-general] Latest OFED builds everything except ipoibtools andsdpnetstat modules for SLES 10.2.6-21-5 In-Reply-To: Message-ID: Sorry I meant SUSE 10.2.6-21-5. Jeff -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Fri Jun 15 21:06:17 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 15 Jun 2007 21:06:17 -0700 Subject: [ofa-general] [ANNOUNCE] libibverbs 1.1.1 released Message-ID: I just tagged the 1.1.1 release of libibverbs and pushed it out to my git tree on kernel.org: git://git.kernel.org/pub/scm/libs/infiniband/libibverbs.git (the name of the tag is libibverbs-1.1.1). I've also copied a tarball to openfabrics.org, and it should appear eventually in . The sha1sum of the release is: eac666bf1080deef6e0d52810c83aa5611683828 libibverbs-1.1.1.tar.gz The most significant change since libibverbs 1.1 is fixing the initialization of new QPs' state to RESET. Without this fix, there will be problems using libmlx4 and ConnectX HCAs. I also fixed an annoying bug in the pingpong example programs that caused a crash at the end of a run in ibv_free_device_list() if a device name other than the first device present is specified. The git shortlog since libibverbs 1.1 is: Dotan Barak (1): ibv_devinfo: Decode max_vl_num to actual number Jack Morgenstein (1): Initialize QP state to RESET Michael S. 
Tsirkin (1): Don't warn root if RLIMIT_MEMLOCK is low Roland Dreier (5): Update Debian build Trivial whitespace fixes in examples/ Fix call to ibv_free_device_list() in pingpong examples Add wc_wmb() Roll libibverbs 1.1.1 release From rdreier at cisco.com Fri Jun 15 21:06:58 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 15 Jun 2007 21:06:58 -0700 Subject: [ofa-general] [ANNOUNCE] libibverbs 1.0.5 released Message-ID: I just tagged the 1.0.5 release of libibverbs and pushed it out to my git tree on kernel.org: git://git.kernel.org/pub/scm/libs/infiniband/libibverbs.git (the name of the tag is libibverbs-1.0.5). I've also copied a tarball to openfabrics.org, and it should appear eventually in . The sha1sum of the release is: 1c3537729774df8c7b7e31128fb28075681694ee libibverbs-1.0.5.tar.gz This is a maintenance release to flush out pending fixes for users of the old 1.0 stable branch. However, the 1.1 branch of libibverbs is considered stable and suitable for all users. The git shortlog since libibverbs 1.0.4 is: Dotan Barak (2): Handle asprintf memory allocation failures Check asprintf() return in pingpong examples Jack Morgenstein (1): Initialize QP state to RESET Roland Dreier (9): Add final Debian changelog for libibverbs 1.0.4 Bump version number Remove svn keywords Check return of calloc() in ibv_get_device_list() Fix checks of asprintf() return value The ibv_cmd_* create functions need to set context Revert "The ibv_cmd_* create functions need to set context" Fix ibv_srq_pingpong option handling Roll libibverbs 1.0.5 release
From vlad at lists.openfabrics.org Sat Jun 16 02:42:08 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Sat, 16 Jun 2007 02:42:08 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070616-0200 daily build status Message-ID: <20070616094208.32983E60836@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.12 Passed on ia64 with linux-2.6.13 Passed on powerpc with linux-2.6.19 Passed on ia64 with linux-2.6.16 Passed on x86_64 with linux-2.6.12 Passed on ia64 with linux-2.6.18 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.16 Passed on powerpc with linux-2.6.18 Passed on powerpc with
linux-2.6.17 Passed on x86_64 with linux-2.6.13 Passed on ia64 with linux-2.6.12 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.19 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.14 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on x86_64 with linux-2.6.20 Passed on ppc64 with linux-2.6.19 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.14 Passed on powerpc with linux-2.6.16 Passed on x86_64 with linux-2.6.17 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.12 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.14 Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.13 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-34.ELsmp Failed: From mst at dev.mellanox.co.il Sat Jun 16 12:27:02 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Sat, 16 Jun 2007 22:27:02 +0300 Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) In-Reply-To: <4672C0DC.8060308@linux.vnet.ibm.com> References: <466F36C8.5010507@linux.vnet.ibm.com> <20070613163821.GB12277@mellanox.co.il> <20070613174930.GE12277@mellanox.co.il> <46716F3D.7050206@ichips.intel.com> <20070614175030.GB29561@mellanox.co.il> <4671C541.4040503@linux.vnet.ibm.com> <20070615051846.GG2207@mellanox.co.il> <4672C0DC.8060308@linux.vnet.ibm.com> Message-ID: <20070616192702.GM2207@mellanox.co.il> > We need to make some decisions Earlier, Roland suggested: > However it may be a good approach to put an abstraction layer in IPoIB > so that the CM code can use an SRQ-like interface to both HCAs that > support SRQ and HCAs that don't. And I think this might be a good approach, too - and maybe this layer could be general enough to be reusable in other ULPs later. -- MST From swise at opengridcomputing.com Sat Jun 16 13:52:08 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Sat, 16 Jun 2007 15:52:08 -0500 Subject: [ofa-general] critical fixes for chelsio iwarp driver Message-ID: <46744D78.9040602@opengridcomputing.com> Tziporet, I'll be posting 2 fixes soon that I'd like included in ofed-1.2. Bugs 663 and 664. These bugs cause crashes that force a reboot of the system and should be considered stop-ship for ofed-1.2. Thanks, Steve. From mst at dev.mellanox.co.il Sat Jun 16 22:57:23 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Sun, 17 Jun 2007 08:57:23 +0300 Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) In-Reply-To: <4672D953.3050506@ichips.intel.com> References: <20070613174930.GE12277@mellanox.co.il> <46716F3D.7050206@ichips.intel.com> <20070614175030.GB29561@mellanox.co.il> <4671C541.4040503@linux.vnet.ibm.com> <20070615051846.GG2207@mellanox.co.il> <4672B523.50502@ichips.intel.com> <20070615160709.GK2207@mellanox.co.il> <4672D953.3050506@ichips.intel.com> Message-ID: <20070617055649.GN2207@mellanox.co.il> > Quoting Sean Hefty : > Subject: Re: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) > > >Basically, I think that because of lack of SW level flow control, > >generally IPoIB CM without SRQ does not make sense because of > >the scalability problems. > > Most clusters are only 16-32 nodes. > > If IPoIB CM without SRQ can support > this number of systems and outperforms IPoIB UD mode, then I do believe > that it makes sense. Note that e.g. with mthca regular QP has lower overhead than SRQ (less locking, etc). So if your assumption on the number of nodes in IB clusters is generally correct, we need a generic layer that will start with regular QPs for a small number of connections, then switch to SRQ as the number of connections grows (and to datagram mode if SRQ is not available). > IPoIB CM support, with or without SRQ, is less > scalable than IPoIB UD mode, I believe this is incorrect: datagram mode has AH per destination, connected mode has a QP per destination, so with SRQ, I see no inherent lack of scalability with connected as compared to datagram mode. > but it was still added because it provided > a benefit under most conditions.
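Michael's proposed generic layer comes down to a small decision: use regular QPs while the number of connections is small, switch to an SRQ as it grows, and fall back to datagram mode when SRQ is not available. The sketch below shows only that decision function. The enum, choose_rx_mode_sketch(), and the fixed per_qp_limit threshold are hypothetical and do not come from the IPoIB code.

```c
#include <assert.h>
#include <stdbool.h>

/* Receive strategies named in the thread (hypothetical enum, not IPoIB's). */
enum rx_mode_sketch {
    RX_MODE_PER_QP,    /* connected mode, each QP with its own receive queue */
    RX_MODE_SRQ,       /* connected mode, QPs share one SRQ */
    RX_MODE_DATAGRAM   /* UD (datagram) mode */
};

/*
 * Hypothetical policy sketch: regular QPs up to per_qp_limit connections,
 * then SRQ if the HCA supports it, otherwise datagram mode.
 * The threshold is an assumed tuning knob, not a value from the thread.
 */
enum rx_mode_sketch choose_rx_mode_sketch(unsigned int n_connections,
                                          bool srq_supported,
                                          unsigned int per_qp_limit)
{
    assert(per_qp_limit > 0);

    if (n_connections <= per_qp_limit)
        return RX_MODE_PER_QP;
    return srq_supported ? RX_MODE_SRQ : RX_MODE_DATAGRAM;
}
```

A real layer would also have to move existing connections over when it changes modes. This sketch does not attempt that.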
-- MST From ogerlitz at voltaire.com Sun Jun 17 02:17:24 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 17 Jun 2007 12:17:24 +0300 (IDT) Subject: [ofa-general] disconnect implementation for rdma cm unconnected datagram service Message-ID: Hi Sean, Looking at cm_sidr_rep_handler we see that the cm id state is reset to IB_CM_IDLE, and on the other hand ib_send_cm_dreq returns -EINVAL if the id state is not IB_CM_ESTABLISHED. I guess this means that rdma_disconnect on RDMA_PS_UDP would never work? Now, even if that is fixed, the disconnect packets can get lost or the remote side can reboot/etc before the CM manages to send the DREQ packet/s. Thinking about a remote qp/lid change, the equivalent I see for UDP based apps is that a remote qp/lid change would have been caught by the local stack's neighbouring system, since it sends a few unicast ARP probes and then re-issues a broadcast ARP from which the new HW address (qpn / gid --> lid) would be learned. What do you think would be the correct way to solve that for rdmacm based apps? Is there a way for the RDMA/IB stack level to provide the solution? We were considering a few alternatives but they are all at the app level (eg send probes to the remote qp/lid, add another RC connection just for the sake of knowing the remote process is still there, etc). I guess that a remote lid change can be emulated as a disconnect if the rdmacm would listen on IN/OUT traps, but the question is what we can do about the remote process qp, eg in the case the process dies and then comes back again etc. thanks, Or.
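Or's reasoning is that once the SIDR reply handler returns the id to the idle state, the DREQ path refuses to run because it requires an established connection. The mock below illustrates that interaction. It is not the kernel's ib_cm code: mock_cm_state, mock_sidr_rep_handler() and mock_send_dreq() are hypothetical stand-ins that copy only the behavior described in the message (idle after SIDR REP, -EINVAL unless established). The SIDR_REQ_SENT state is loosely modeled on the CM's SIDR states and is an assumption.

```c
#include <assert.h>
#include <errno.h>

/* Mock CM states; loosely modeled on the states named in the report. */
enum mock_cm_state {
    MOCK_CM_IDLE,
    MOCK_CM_SIDR_REQ_SENT,
    MOCK_CM_ESTABLISHED
};

struct mock_cm_id {
    enum mock_cm_state state;
};

/* Mirrors the reported effect of cm_sidr_rep_handler: the id goes back to idle. */
void mock_sidr_rep_handler(struct mock_cm_id *id)
{
    assert(id->state == MOCK_CM_SIDR_REQ_SENT);
    id->state = MOCK_CM_IDLE;
}

/* Mirrors the reported check in ib_send_cm_dreq. */
int mock_send_dreq(struct mock_cm_id *id)
{
    if (id->state != MOCK_CM_ESTABLISHED)
        return -EINVAL;
    /* A real implementation would format and post the DREQ MAD here. */
    return 0;
}
```

With this behavior, a datagram (SIDR) id can never reach the DREQ path. This is why Or asks whether disconnect on RDMA_PS_UDP can work at all, separately from the problem of lost DREQs.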
From kliteyn at dev.mellanox.co.il Sun Jun 17 02:28:20 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Sun, 17 Jun 2007 12:28:20 +0300 Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node guid files options for fat-tree In-Reply-To: <20070615222934.GC10766@sashak.voltaire.com> References: <4670FA2D.7070708@dev.mellanox.co.il> <20070614121501.GC5908@sashak.voltaire.com> <4671363F.6060600@dev.mellanox.co.il> <20070614134519.GD5908@sashak.voltaire.com> <1181939959.5681.380508.camel@hal.voltaire.com> <20070615205958.GB10766@sashak.voltaire.com> <1181941040.5681.381698.camel@hal.voltaire.com> <20070615222934.GC10766@sashak.voltaire.com> Message-ID: <4674FEB4.4000108@dev.mellanox.co.il> Sasha Khapyorsky wrote: > On 16:57 Fri 15 Jun , Hal Rosenstock wrote: >> On Fri, 2007-06-15 at 16:59, Sasha Khapyorsky wrote: >>> On 16:39 Fri 15 Jun , Hal Rosenstock wrote: >>>> On Thu, 2007-06-14 at 09:45, Sasha Khapyorsky wrote: >>>>> On 15:36 Thu 14 Jun , Yevgeny Kliteynik wrote: >>>>>> Sasha Khapyorsky wrote: >>>>>>> Hi Yevgeny, >>>>>>> On 11:19 Thu 14 Jun , Yevgeny Kliteynik wrote: >>>>>>>> The following three patches are adding root and compute node guid files >>>>>>>> options for fat-tree routing, >>>>>>> Is there any reason to not share root guids file option with up/down? >>>>>> There are two new options for fat-tree: roots and compute nodes (CN). >>>>>> These two will be very "tightly coupled" and would have more implication >>>>>> on the routing than in case of up/dn roots. For instance, having root >>>>>> file but not CN file means that the topology doesn't have to be pure >>>>>> fat-tree, >>>>>> but all the CAs are considered CNs and have to be on the same level of the >>>>>> tree. >>>>>> And there is similar implication of all the combinations of these two >>>>>> options. >>>>>> >>>>>> Because of this coupling I wanted to differentiate these two options from >>>>>> the up/dn roots. >>>>>> >>>>>> Thoughts? 
>>>>> I still not have strong option about two options against common one. >>>> Me neither. >>>> >>>>> Hypothetically if in some days we will implement routing engine chains >>>>> (so failed algo will fallback to next in chain and not just to default) >>>>> separate options could be useful. >>>> So is this a(nother) reason to keep the roots separate or would that be >>>> dealt with when the routing fallback strategy changes ? >>> It is yet hypothetical. Currently I don't see a strong practical reasons >>> to have two separate root guids file options for up/down and fat-tree, >>> but guess this is minor and not showstopper. >> Wouldn't a current practical reason be switching between up/down and fat >> tree and they each have different roots ? Is that a real scenario ? > > Sure (but guess in many cases selected roots will be same for both > algos). I think that selected roots will always be same for both algos. I can't think of any topology that will require different set of roots for two algorithms that see the fabric as tree with routes going up and then down. > I think this scenario will be handled well with single shared > option, like: > > opensm -R ftree --roots-file ftree-roots-file > > , and > > opensm -R updn --roots-file updn-roots-file I agree with this. I will rework the patch and replace the updn_guid_file with root_guid_file, and add cn_guid_file. This also means that the OSM command line options -a or --add_guid_file will be replaced with -O or --root_guid_file, and we will have additional options for CN file: -C or --cn_guid_file Sounds OK? 
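Yevgeny proposes separate root and compute-node GUID files. According to the OpenSM usage text in his patch, both files list GUIDs "one to a line". The sketch below shows how such a file might be parsed. It is not OpenSM's osm_ucast_mgr_read_guid_file(): read_guid_file_sketch() is a hypothetical name, and skipping blank lines and accepting an optional 0x prefix are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: read one GUID per line from f into out[0..max-1].
 * Returns the number of GUIDs read, or -1 on a malformed line or if the
 * file holds more than max GUIDs. Blank lines are skipped (an assumption).
 */
int read_guid_file_sketch(FILE *f, uint64_t *out, int max)
{
    char line[128];
    int n = 0;

    assert(f != NULL && out != NULL);
    while (fgets(line, sizeof(line), f) != NULL) {
        char *p = line;
        char *end;
        unsigned long long v;

        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            continue;                   /* blank line */

        errno = 0;
        v = strtoull(p, &end, 16);      /* base 16 also accepts an optional 0x prefix */
        if (errno != 0 || end == p)
            return -1;                  /* no hex digits, or value out of range */
        while (isspace((unsigned char)*end))
            end++;
        if (*end != '\0')
            return -1;                  /* trailing characters after the GUID */

        if (n == max)
            return -1;                  /* more GUIDs than the caller can hold */
        out[n++] = (uint64_t)v;
    }
    return n;
}
```

With a single shared root file, as Sasha suggests, the same parser would serve both up/down and fat-tree. Only fat-tree would also read the compute-node file.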
-- Yevgeny > > Sasha > From vlad at lists.openfabrics.org Sun Jun 17 02:43:19 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Sun, 17 Jun 2007 02:43:19 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070617-0200 daily build status Message-ID: <20070617094319.4F88DE60839@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.14 Passed on powerpc with linux-2.6.18 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.12 Passed on ppc64 with linux-2.6.15 Passed on ia64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18 Passed on ia64 with linux-2.6.14 Passed on powerpc with linux-2.6.19 Passed on ia64 with linux-2.6.13 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.16 Passed on powerpc with linux-2.6.17 Passed on ppc64 with linux-2.6.12 Passed on ppc64 with linux-2.6.14 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.21.1 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on x86_64 with linux-2.6.13 Passed on ia64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on x86_64 with linux-2.6.19 Passed on ia64 with linux-2.6.16 Passed on powerpc with linux-2.6.16 Passed on x86_64 with linux-2.6.14 
Passed on x86_64 with linux-2.6.15 Passed on ppc64 with linux-2.6.13 Passed on powerpc with linux-2.6.12 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on powerpc with linux-2.6.15 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on ia64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.9-34.ELsmp Failed: From kliteyn at dev.mellanox.co.il Sun Jun 17 04:11:54 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Sun, 17 Jun 2007 14:11:54 +0300 Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node guid files options for fat-tree In-Reply-To: <4674FEB4.4000108@dev.mellanox.co.il> References: <4670FA2D.7070708@dev.mellanox.co.il> <20070614121501.GC5908@sashak.voltaire.com> <4671363F.6060600@dev.mellanox.co.il> <20070614134519.GD5908@sashak.voltaire.com> <1181939959.5681.380508.camel@hal.voltaire.com> <20070615205958.GB10766@sashak.voltaire.com> <1181941040.5681.381698.camel@hal.voltaire.com> <20070615222934.GC10766@sashak.voltaire.com> <4674FEB4.4000108@dev.mellanox.co.il> Message-ID: <467516FA.9000605@dev.mellanox.co.il> Yevgeny Kliteynik wrote: > Sasha Khapyorsky wrote: >> On 16:57 Fri 15 Jun , Hal Rosenstock wrote: >>> On Fri, 2007-06-15 at 16:59, Sasha Khapyorsky wrote: >>>> On 16:39 Fri 15 Jun , Hal Rosenstock wrote: >>>>> On Thu, 2007-06-14 at 09:45, Sasha Khapyorsky wrote: >>>>>> On 15:36 Thu 14 Jun , Yevgeny Kliteynik wrote: >>>>>>> Sasha Khapyorsky wrote: >>>>>>>> Hi Yevgeny, >>>>>>>> On 11:19 Thu 14 Jun , Yevgeny Kliteynik wrote: >>>>>>>>> The following three patches are adding root and compute node >>>>>>>>> guid files >>>>>>>>> options for fat-tree routing, >>>>>>>> Is there any reason to not share root guids file option with >>>>>>>> up/down? 
>>>>>>> There are two new options for fat-tree: roots and compute nodes >>>>>>> (CN). >>>>>>> These two will be very "tightly coupled" and would have more >>>>>>> implication >>>>>>> on the routing than in case of up/dn roots. For instance, having >>>>>>> root >>>>>>> file but not CN file means that the topology doesn't have to be >>>>>>> pure fat-tree, >>>>>>> but all the CAs are considered CNs and have to be on the same >>>>>>> level of the tree. >>>>>>> And there is similar implication of all the combinations of >>>>>>> these two options. >>>>>>> >>>>>>> Because of this coupling I wanted to differentiate these two >>>>>>> options from >>>>>>> the up/dn roots. >>>>>>> >>>>>>> Thoughts? >>>>>> I still not have strong option about two options against common one. >>>>> Me neither. >>>>> >>>>>> Hypothetically if in some days we will implement routing engine >>>>>> chains >>>>>> (so failed algo will fallback to next in chain and not just to >>>>>> default) >>>>>> separate options could be useful. >>>>> So is this a(nother) reason to keep the roots separate or would >>>>> that be >>>>> dealt with when the routing fallback strategy changes ? >>>> It is yet hypothetical. Currently I don't see a strong practical >>>> reasons >>>> to have two separate root guids file options for up/down and fat-tree, >>>> but guess this is minor and not showstopper. >>> Wouldn't a current practical reason be switching between up/down and fat >>> tree and they each have different roots ? Is that a real scenario ? >> >> Sure (but guess in many cases selected roots will be same for both >> algos). > > I think that selected roots will always be same for both algos. > I can't think of any topology that will require different set of roots > for two algorithms that see the fabric as tree with routes going up and > then down. 
> >> I think this scenario will be handled well with single shared >> option, like: >> >> opensm -R ftree --roots-file ftree-roots-file >> >> , and >> >> opensm -R updn --roots-file updn-roots-file > > I agree with this. > I will rework the patch and replace the updn_guid_file with root_guid_file, > and add cn_guid_file. > > This also means that the OSM command line options -a or --add_guid_file > will be replaced with -O or --root_guid_file, and we will have additional > options for CN file: -C or --cn_guid_file Sorry, -C is already taken. I'm running out of letters here... :) Suggesting leaving 'a' for roots, and using 'u' for CNs: -a or --root_guid_file -u or --cn_guid_file -- Yevgeny > Sounds OK? > > -- Yevgeny >> >> Sasha >> > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From kliteyn at dev.mellanox.co.il Sun Jun 17 05:26:02 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Sun, 17 Jun 2007 15:26:02 +0300 Subject: [ofa-general] [PATCH] osm: adding root_guid_file and cn_guid_file OpenSM options Message-ID: <4675285A.6060309@dev.mellanox.co.il> Hi Hal, This patch replaces updn_guid_file in the Up/Down routing with root_guid_file for Up/Down and Fat-Tree routing, and adds a new option - cn_guid_file for Fat-Tree routing. 
OpenSM command line options for these two files are: '-a' or '--root_guid_file' for roots '-u' or '--cn_guid_file' for compute nodes Signed-off-by: Yevgeny Kliteynik --- opensm/include/opensm/osm_subnet.h | 12 +++++++++--- opensm/opensm/main.c | 29 ++++++++++++++++++++++------- opensm/opensm/osm_subnet.c | 25 ++++++++++++++++++------- opensm/opensm/osm_ucast_updn.c | 6 +++--- 4 files changed, 52 insertions(+), 20 deletions(-) diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h index c62128b..a38fc49 100644 --- a/opensm/include/opensm/osm_subnet.h +++ b/opensm/include/opensm/osm_subnet.h @@ -278,7 +278,8 @@ typedef struct _osm_subn_opt char * routing_engine_name; char * lid_matrix_dump_file; char * ucast_dump_file; - char * updn_guid_file; + char * root_guid_file; + char * cn_guid_file; char * sa_db_file; boolean_t exit_on_fatal; boolean_t honor_guid2lid_file; @@ -452,8 +453,13 @@ typedef struct _osm_subn_opt * Name of the unicast routing dump file from where switch * forwarding tables will be loaded * -* updn_guid_file -* Pointer to name of the UPDN guid file given by User +* root_guid_file +* Name of the file that contains list of root guids that +* will be used by fat-tree or up/dn routing (provided by User) +* +* cn_guid_file +* Name of the file that contains list of compute node guids that +* will be used by fat-tree routing (provided by User) * * sa_db_file * Name of the SA database file. 
diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c index 6b4cb4f..d17a994 100644 --- a/opensm/opensm/main.c +++ b/opensm/opensm/main.c @@ -189,8 +189,14 @@ show_usage(void) " This option specifies the name of the SA DB dump file\n" " from where SA database will be loaded.\n\n"); printf ("-a\n" - "--add_guid_file \n" - " Set the root nodes for the Up/Down routing algorithm\n" + "--root_guid_file \n" + " Set the root nodes for the Up/Down or Fat-Tree routing\n" + " algorithm to the guids provided in the given file (one\n" + " to a line)\n" + "\n"); + printf ("-u\n" + "--cn_guid_file \n" + " Set the compute nodes for the Fat-Tree routing algorithm\n" " to the guids provided in the given file (one to a line)\n" "\n"); printf( "-o\n" @@ -585,7 +591,7 @@ main( char *ignore_guids_file_name = NULL; uint32_t val; const char * const short_option = - "i:f:ed:g:l:L:s:t:a:R:M:U:S:P:NBIQvVhorcyxp:n:q:k:C:"; + "i:f:ed:g:l:L:s:t:a:u:R:M:U:S:P:NBIQvVhorcyxp:n:q:k:C:"; /* In the array below, the 2nd parameter specifies the number @@ -622,7 +628,8 @@ main( { "lid_matrix_file",1, NULL, 'M'}, { "ucast_file", 1, NULL, 'U'}, { "sadb_file", 1, NULL, 'S'}, - { "add_guid_file", 1, NULL, 'a'}, + { "root_guid_file",1, NULL, 'a'}, + { "cn_guid_file", 1, NULL, 'u'}, { "cache-options", 0, NULL, 'c'}, { "stay_on_fatal", 0, NULL, 'y'}, { "honor_guid2lid",0, NULL, 'x'}, @@ -886,10 +893,18 @@ main( case 'a': /* - Specifies port guids file + Specifies root guids file + */ + opt.root_guid_file = optarg; + printf (" Root Guid File: %s\n", opt.root_guid_file ); + break; + + case 'u': + /* + Specifies compute node guids file */ - opt.updn_guid_file = optarg; - printf (" UPDN Guid File: %s\n", opt.updn_guid_file ); + opt.cn_guid_file = optarg; + printf (" Compute Node Guid File: %s\n", opt.cn_guid_file ); break; case 'c': diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index 736f49a..4e080ba 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -500,7 
+500,8 @@ osm_subn_set_default_opt( p_opt->routing_engine_name = NULL; p_opt->lid_matrix_dump_file = NULL; p_opt->ucast_dump_file = NULL; - p_opt->updn_guid_file = NULL; + p_opt->root_guid_file = NULL; + p_opt->cn_guid_file = NULL; p_opt->sa_db_file = NULL; p_opt->exit_on_fatal = TRUE; p_opt->enable_quirks = FALSE; @@ -1323,8 +1324,12 @@ osm_subn_parse_conf_file( p_key, p_val, &p_opts->ucast_dump_file); __osm_subn_opts_unpack_charp( - "updn_guid_file", - p_key, p_val, &p_opts->updn_guid_file); + "root_guid_file", + p_key, p_val, &p_opts->root_guid_file); + + __osm_subn_opts_unpack_charp( + "cn_guid_file", + p_key, p_val, &p_opts->cn_guid_file); __osm_subn_opts_unpack_charp( "sa_db_file", @@ -1548,12 +1553,18 @@ osm_subn_write_conf_file( "# Ucast dump file name\n" "ucast_dump_file %s\n\n", p_opts->ucast_dump_file); - if (p_opts->updn_guid_file) + if (p_opts->root_guid_file) + fprintf( opts_file, + "# The file holding the root node guids (for fat-tree or Up/Down)\n" + "# One guid in each line\n" + "root_guid_file %s\n\n", + p_opts->root_guid_file); + if (p_opts->cn_guid_file) fprintf( opts_file, - "# The file holding the Up/Down root node guids\n" + "# The file holding the fat-tree compute node guids\n" "# One guid in each line\n" - "updn_guid_file %s\n\n", - p_opts->updn_guid_file); + "cn_guid_file %s\n\n", + p_opts->cn_guid_file); if (p_opts->sa_db_file) fprintf( opts_file, "# SA database file name\n" diff --git a/opensm/opensm/osm_ucast_updn.c b/opensm/opensm/osm_ucast_updn.c index 2448246..af5ee4e 100644 --- a/opensm/opensm/osm_ucast_updn.c +++ b/opensm/opensm/osm_ucast_updn.c @@ -311,10 +311,10 @@ updn_init( Check the source for root node list, if file parse it, otherwise wait for a callback to activate auto detection */ - if (p_osm->subn.opt.updn_guid_file) + if (p_osm->subn.opt.root_guid_file) { status = osm_ucast_mgr_read_guid_file( &p_osm->sm.ucast_mgr, - p_osm->subn.opt.updn_guid_file, + p_osm->subn.opt.root_guid_file, p_updn->p_root_nodes ); if (status != 
IB_SUCCESS) goto Exit; @@ -323,7 +323,7 @@ updn_init( osm_log( &p_osm->log, OSM_LOG_DEBUG, "updn_init: " "UPDN - Fetching root nodes from file %s\n", - p_osm->subn.opt.updn_guid_file ); + p_osm->subn.opt.root_guid_file ); guid_iterator = cl_list_head(p_updn->p_root_nodes); while( guid_iterator != cl_list_end(p_updn->p_root_nodes) ) { -- 1.5.1.4 From sashak at voltaire.com Sun Jun 17 05:22:29 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 17 Jun 2007 15:22:29 +0300 Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node guid files options for fat-tree In-Reply-To: <467516FA.9000605@dev.mellanox.co.il> References: <4670FA2D.7070708@dev.mellanox.co.il> <20070614121501.GC5908@sashak.voltaire.com> <4671363F.6060600@dev.mellanox.co.il> <20070614134519.GD5908@sashak.voltaire.com> <1181939959.5681.380508.camel@hal.voltaire.com> <20070615205958.GB10766@sashak.voltaire.com> <1181941040.5681.381698.camel@hal.voltaire.com> <20070615222934.GC10766@sashak.voltaire.com> <4674FEB4.4000108@dev.mellanox.co.il> <467516FA.9000605@dev.mellanox.co.il> Message-ID: <1182082950.4517.9.camel@localhost> On Sun, 2007-06-17 at 14:11 +0300, Yevgeny Kliteynik wrote: > Yevgeny Kliteynik wrote: > > Sasha Khapyorsky wrote: > >> On 16:57 Fri 15 Jun , Hal Rosenstock wrote: > >>> On Fri, 2007-06-15 at 16:59, Sasha Khapyorsky wrote: > >>>> On 16:39 Fri 15 Jun , Hal Rosenstock wrote: > >>>>> On Thu, 2007-06-14 at 09:45, Sasha Khapyorsky wrote: > >>>>>> On 15:36 Thu 14 Jun , Yevgeny Kliteynik wrote: > >>>>>>> Sasha Khapyorsky wrote: > >>>>>>>> Hi Yevgeny, > >>>>>>>> On 11:19 Thu 14 Jun , Yevgeny Kliteynik wrote: > >>>>>>>>> The following three patches are adding root and compute node > >>>>>>>>> guid files > >>>>>>>>> options for fat-tree routing, > >>>>>>>> Is there any reason to not share root guids file option with > >>>>>>>> up/down? > >>>>>>> There are two new options for fat-tree: roots and compute nodes > >>>>>>> (CN). 
> >>>>>>> These two will be very "tightly coupled" and would have more > >>>>>>> implication > >>>>>>> on the routing than in case of up/dn roots. For instance, having > >>>>>>> root > >>>>>>> file but not CN file means that the topology doesn't have to be > >>>>>>> pure fat-tree, > >>>>>>> but all the CAs are considered CNs and have to be on the same > >>>>>>> level of the tree. > >>>>>>> And there is similar implication of all the combinations of > >>>>>>> these two options. > >>>>>>> > >>>>>>> Because of this coupling I wanted to differentiate these two > >>>>>>> options from > >>>>>>> the up/dn roots. > >>>>>>> > >>>>>>> Thoughts? > >>>>>> I still not have strong option about two options against common one. > >>>>> Me neither. > >>>>> > >>>>>> Hypothetically if in some days we will implement routing engine > >>>>>> chains > >>>>>> (so failed algo will fallback to next in chain and not just to > >>>>>> default) > >>>>>> separate options could be useful. > >>>>> So is this a(nother) reason to keep the roots separate or would > >>>>> that be > >>>>> dealt with when the routing fallback strategy changes ? > >>>> It is yet hypothetical. Currently I don't see a strong practical > >>>> reasons > >>>> to have two separate root guids file options for up/down and fat-tree, > >>>> but guess this is minor and not showstopper. > >>> Wouldn't a current practical reason be switching between up/down and fat > >>> tree and they each have different roots ? Is that a real scenario ? > >> > >> Sure (but guess in many cases selected roots will be same for both > >> algos). > > > > I think that selected roots will always be same for both algos. > > I can't think of any topology that will require different set of roots > > for two algorithms that see the fabric as tree with routes going up and > > then down. 
> > > >> I think this scenario will be handled well with single shared > >> option, like: > >> > >> opensm -R ftree --roots-file ftree-roots-file > >> > >> , and > >> > >> opensm -R updn --roots-file updn-roots-file > > > > I agree with this. > > I will rework the patch and replace the updn_guid_file with root_guid_file, > > and add cn_guid_file. > > > > This also means that the OSM command line options -a or --add_guid_file > > will be replaced with -O or --root_guid_file, and we will have additional > > options for CN file: -C or --cn_guid_file > > Sorry, -C is already taken. I'm running out of letters here... :) > Suggesting leaving 'a' for roots, and using 'u' for CNs: > > -a or --root_guid_file > -u or --cn_guid_file Looks perfect for me. Sasha From tziporet at mellanox.co.il Sun Jun 17 06:51:28 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Sun, 17 Jun 2007 16:51:28 +0300 Subject: [ofa-general] Re: [ewg] critical fixes for chelsio iwarp driver In-Reply-To: <46744D78.9040602@opengridcomputing.com> References: <46744D78.9040602@opengridcomputing.com> Message-ID: <46753C60.90008@mellanox.co.il> Steve Wise wrote: > Tziporet, > > I'll be posting 2 fixes soon that I'd like included in ofed-1.2. > > Bugs 663 and 664. These bugs cause crashes that force a reboot of the > system and should be considered stop-ship for ofed-1.2. > > Thanks, > > Steve. > OK - but make sure the patches are ready on Monday since we wish to do the GA release this week Tziporet From tziporet at mellanox.co.il Sun Jun 17 07:50:07 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Sun, 17 Jun 2007 17:50:07 +0300 Subject: [ofa-general] crash in ipoib In-Reply-To: References: Message-ID: <46754A1F.9060106@mellanox.co.il> Woodruff, Robert J wrote: > This looks like it fixed the panic. > > Should we try to put out a new RC with this latest ipoib fix ? > I really think we need it in the release. 
If we could get another RC out > today, > that would only delay the release by a couple of more days and we could > release on next Friday rather than wed. and still give people a week to > test the final RC. > > woody > > I agree we need this fix. I suggest we create RC6 once this fix and Steve's fixes for bugs 663 and 664 are in. Let's close all the details in the meeting tomorrow. Tziporet From vlad at dev.mellanox.co.il Sun Jun 17 07:58:24 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Sun, 17 Jun 2007 17:58:24 +0300 Subject: [ofa-general] quick IPoIB config question In-Reply-To: <467009FC.3070402@scalableinformatics.com> References: <467009FC.3070402@scalableinformatics.com> Message-ID: <46754C10.4010801@dev.mellanox.co.il> Joe Landman wrote: > Hi folks: > > Built OFED-1.2-rc4 on OpenSuSE 10.2, works fine as long as I turn off > 32-bit build, and update to a 2.6.20 kernel. Installed the RPMs after > build, and the system appears to be fine/well behaved. Is there a > OFED-specific technique to have the ib0 interface configure at boot > time, after drivers load? This might be distribution specific. > > I created a file named /etc/sysconfig/network/ifcfg-ib0 which contained > > BOOTPROTO='static' > MTU='' > REMOTE_IPADDR='' > STARTMODE='onboot' > USERCONTROL='no' > NETMASK='255.255.0.0' > IPADDR='10.1.32.2' > DEVICE='ib0' > > Bringing the interface up with an 'ifconfig ib0 up' doesn't seem to > assign the IP address and netmask to it. > > Hence my question. Is there an OFED specific method of configuring this > (e.g. a config file I need to edit/create), or is it distribution > dependent? > > If I force the issue with an ifconfig, it looks like it works fine. This > is ok as a work around, and I can create an /etc/init.d/ib or similar to > force the issue. I would prefer to do this "the right way", and if > there is someone with guidance/pointers as to what that is, I would > prefer to follow that. > > Thanks.
> > Joe > Hi Joe, You can do one of the following to set ib0 configuration from ifcfg-ib0: * ifup ib0 * /etc/init.d/openibd restart Regards, Vladimir From tziporet at mellanox.co.il Sun Jun 17 08:05:58 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Sun, 17 Jun 2007 18:05:58 +0300 Subject: [ofa-general] Re: ipoib / bonding and OFED In-Reply-To: <15ddcffd0706081420r79984701u4e385e28857cb68b@mail.gmail.com> References: <3857BB049D83424D9DB82753D37CEA55459C41@taurus.voltaire.com> <4657373E.2030903@hp.com> <465BDC90.5080305@voltaire.com> <466702A8.5080302@hp.com> <4667B5FD.4070600@voltaire.com> <15ddcffd0706081420r79984701u4e385e28857cb68b@mail.gmail.com> Message-ID: <46754DD6.2080807@mellanox.co.il> > On 6/7/07, *Scott Weitzenkamp (sweitzen)* > wrote: > > I don't know if I've said this in public, but I've stopped testing > ipoibtools HA as of OFED 1.2 rc2 and Cisco is only going to support > ib-bonding HA for our OFED 1.2 customers, as our testing has revealed > ib-bonding is more robust than ipoibtools. I know I said this to > Tziporet at Sonoma, and she seemed to agree we could eventually > remove > ipoibtools from OFED. > > > Scott, > > the ipoibtools will be removed from OFED 1.3 Tziporet From jackm at dev.mellanox.co.il Sun Jun 17 08:18:34 2007 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Sun, 17 Jun 2007 18:18:34 +0300 Subject: [ofa-general] Re: [PATCH/RFC] IB/mlx4: Handle new FW requirement for send request prefetching In-Reply-To: References: <200706051602.14182.jackm@dev.mellanox.co.il> Message-ID: <200706171818.34690.jackm@dev.mellanox.co.il> On Wednesday 13 June 2007 20:29, Roland Dreier wrote: > I just queued this patch to handle new FW up. Please let me know if > it looks OK to you, and I will ask Linus to pull it. > > Thanks. > Looks good! 
- Jack > commit f22332295cb218ad12db2b521a34553ff5790c34 > Author: Roland Dreier > Date: Wed Jun 13 10:26:43 2007 -0700 > > IB/mlx4: Handle new FW requirement for send request prefetching > > New ConnectX firmware introduces FW command interface revision 2, > which requires that for each QP, a chunk of send queue entries (the > "headroom") is kept marked as invalid, so that the HCA doesn't get > confused if it prefetches entries that haven't been posted yet. Add > code to the driver to do this, and also update the user ABI so that > userspace can request that the prefetcher be turned off for userspace > QPs (we just leave the prefetcher on for all kernel QPs). > > Marking send queue entries this way is OK for older firmware too, so > we change the driver to allow FW command interface revisions 1 and 2. > > Based on a patch from Jack Morgenstein . > > Signed-off-by: Roland Dreier > > From jackm at dev.mellanox.co.il Sun Jun 17 08:24:47 2007 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Sun, 17 Jun 2007 18:24:47 +0300 Subject: [ofa-general] Re: [PATCH/RFC] libmlx4: Handle new FW requirement for send request prefetching In-Reply-To: References: <200706051602.14182.jackm@dev.mellanox.co.il> Message-ID: <200706171824.47371.jackm@dev.mellanox.co.il> On Wednesday 13 June 2007 20:34, Roland Dreier wrote: > Similarly I just added this to libmlx4. The change to handle alignment > for inline send segments will be a separate patch, and I'm still > cleaning it up. Anyway, let me know if you see any problems with > this. > Looks good! (I like how you handled the 2K + 1-wqe headroom.)
- Jack From landman at scalableinformatics.com Sun Jun 17 08:54:25 2007 From: landman at scalableinformatics.com (Joe Landman) Date: Sun, 17 Jun 2007 11:54:25 -0400 Subject: [ofa-general] quick IPoIB config question In-Reply-To: <46754C10.4010801@dev.mellanox.co.il> References: <467009FC.3070402@scalableinformatics.com> <46754C10.4010801@dev.mellanox.co.il> Message-ID: <46755931.10802@scalableinformatics.com> Hi Vladimir: Vladimir Sokolovsky wrote: > Hi Joe, > You can do one of the following to set ib0 configuration from ifcfg-ib0: > > * ifup ib0 > > * /etc/init.d/openibd restart I had tried those, to no effect. When I rebooted, after chkconfig'ing openibd on, it didn't come up properly either. I had to force the issue in a special /etc/init.d/ipoib I created to come up after the openibd, where I sourced the /etc/sysconfig/network/ifcfg-ib0 file, and then did a simple "ifconfig" of ib0 after it. This was/is strange. The work-around is fine for the moment for this customer. I will see if I can dig in and file a better report on what happened. > > Regards, > Vladimir -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From swise at opengridcomputing.com Sun Jun 17 08:58:53 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Sun, 17 Jun 2007 10:58:53 -0500 Subject: [ofa-general] [GIT PULL ofed_1_2] iw_cxgb3 fixes for bugs 663/664 Message-ID: <46755A3D.1030300@opengridcomputing.com> Vlad, Please pull in these fixes for bugs 663/664 from git://git.openfabrics.org/~swise/ofed_1_2 ofed_1_2 Thanks, Steve. git-log commit bd3a007a1432ded7d5d538d2125249d111c2644f Author: Steve Wise Date: Sat Jun 16 15:48:28 2007 -0500 Don't count neg_adv abort_req_rss messages as real aborts.
negative advice messages should _not_ count toward the 2 abort requests needed to indicate an abort request. Signed-off-by: Steve Wise diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c index ed56d55..a654bd5 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cm.c +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c @@ -1464,6 +1464,13 @@ static int peer_abort(struct t3cdev *tde int ret; int state; + if (is_neg_adv_abort(req->status)) { + PDBG("%s neg_adv_abort ep %p tid %d\n", __FUNCTION__, ep, + ep->hwtid); + t3_l2t_send_event(ep->com.tdev, ep->l2t); + return CPL_RET_BUF_DONE; + } + /* * We get 2 peer aborts from the HW. The first one must * be ignored except for scribbling that we need one more. @@ -1473,13 +1480,6 @@ static int peer_abort(struct t3cdev *tde return CPL_RET_BUF_DONE; } - if (is_neg_adv_abort(req->status)) { - PDBG("%s neg_adv_abort ep %p tid %d\n", __FUNCTION__, ep, - ep->hwtid); - t3_l2t_send_event(ep->com.tdev, ep->l2t); - return CPL_RET_BUF_DONE; - } - state = state_read(&ep->com); PDBG("%s ep %p state %u\n", __FUNCTION__, ep, state); switch (state) { commit 70549f1d9b8d0420bd97111d3c73973723a88a9c Author: Steve Wise Date: Fri Jun 15 11:44:40 2007 -0500 TERMINATE WRs can hang the tx ofld queue. Don't set the gen bits nor length bits in the terminate wr. This is done by the LLD driver. Signed-off-by: Steve Wise diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c index 162d1fa..431a7e8 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c @@ -630,9 +630,9 @@ int iwch_post_terminate(struct iwch_qp * /* immediate data starts here. 
*/ term = (struct terminate_message *)wqe->send.sgl; build_term_codes(rsp_msg, &term->layer_etype, &term->ecode); - build_fw_riwrh((void *)wqe, T3_WR_SEND, - T3_COMPLETION_FLAG | T3_NOTIFY_FLAG, 1, - qhp->ep->hwtid, 5); + wqe->send.wrh.op_seop_flags = cpu_to_be32(V_FW_RIWR_OP(T3_WR_SEND) | + V_FW_RIWR_FLAGS(T3_COMPLETION_FLAG | T3_NOTIFY_FLAG)); + wqe->send.wrh.gen_tid_len = cpu_to_be32(V_FW_RIWR_TID(qhp->ep->hwtid)); skb->priority = CPL_PRIORITY_DATA; return (cxgb3_ofld_send(qhp->rhp->rdev.t3cdev_p, skb)); } From mst at dev.mellanox.co.il Sun Jun 17 12:10:56 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Sun, 17 Jun 2007 22:10:56 +0300 Subject: [ofa-general] [PATCH] IB/iser: make all fixes patches apply on full kernel source Message-ID: <20070617191056.GC27233@mellanox.co.il> Since we'll have RC6, I'd like to add the following cosmetic change: move iscsi_scsi_makefile from kernel_patches/fixes to ofed_scripts, and use a softlink to put it in place. This solves the following problem: if I do "git clone" on the ofed tree without -n, and try to apply the fixes patchset, I get a conflict on iscsi_scsi_makefile.patch simply because the makefile that this patch attempts to create is already part of the upstream kernel. I think this is 0-risk and carries real benefit for developers who'll need to support OFED 1.2. Erez, do you agree? If yes, I'll ask Tziporet to approve, too. Signed-off-by: Michael S. Tsirkin diff --git a/kernel_patches/fixes/iscsi_scsi_makefile.patch b/kernel_patches/fixes/iscsi_scsi_makefile.patch deleted file mode 100644 index 9c4fd01..0000000 --- a/kernel_patches/fixes/iscsi_scsi_makefile.patch +++ /dev/null @@ -1,10 +0,0 @@ -Add a Makefile based on the kernel's drivers/scsi/Makefile in order to build open-iscsi.
- -Signed-off-by: Erez Zilber - -diff -ruN ofa_1_2_kernel-20061228-0200/drivers/scsi/Makefile ofa_1_2_kernel-20061228-0200-open-iscsi/drivers/scsi/Makefile ---- ofa_1_2_kernel-20061228-0200/drivers/scsi/Makefile 1970-01-01 02:00:00.000000000 +0200 -+++ ofa_1_2_kernel-20061228-0200-open-iscsi/drivers/scsi/Makefile 2006-12-28 17:01:22.000000000 +0200 -@@ -0,0 +1,2 @@ -+obj-$(CONFIG_SCSI_ISCSI_ATTRS) += scsi_transport_iscsi.o -+obj-$(CONFIG_ISCSI_TCP) += libiscsi.o iscsi_tcp.o diff --git a/ofed_scripts/iscsi_scsi_makefile b/ofed_scripts/iscsi_scsi_makefile new file mode 100644 index 0000000..cfdf3e0 --- /dev/null +++ b/ofed_scripts/iscsi_scsi_makefile @@ -0,0 +1,4 @@ +# Makefile based on the kernel's drivers/scsi/Makefile +# to build open-iscsi. +obj-$(CONFIG_SCSI_ISCSI_ATTRS) += scsi_transport_iscsi.o +obj-$(CONFIG_ISCSI_TCP) += libiscsi.o iscsi_tcp.o diff --git a/ofed_scripts/ofed_checkout.sh b/ofed_scripts/ofed_checkout.sh index 037b391..86fc8b8 100755 --- a/ofed_scripts/ofed_checkout.sh +++ b/ofed_scripts/ofed_checkout.sh @@ -43,3 +43,4 @@ ex git update-ref HEAD $1 ln -snf ofed_scripts/configure ln -snf ofed_scripts/Makefile ln -snf ofed_scripts/makefile +(cd drivers/scsi/; ln -snf ../../ofed_scripts/iscsi_scsi_makefile Makefile) -- MST From mst at dev.mellanox.co.il Sun Jun 17 14:02:14 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Mon, 18 Jun 2007 00:02:14 +0300 Subject: [ofa-general] ~mst/ofed_kernel.git updated to 2.6.22-rc5 Message-ID: <20070617210154.GD27233@mellanox.co.il> FYI, git://git.openfabrics.org/~mst/ofed_kernel.git I've merged in 2.6.22-rc5 which will pull in multiple bug fixes. I also added local sa patch back in (not sure how but it went missing). -- Michael S. Tsirkin - Staff Engineer, Mellanox Technologies Ltd. Eternity is a very long time, especially towards the end. 
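The ofed_checkout.sh hunk in the iSCSI Makefile patch above installs drivers/scsi/Makefile with "ln -snf", replacing whatever entry already sits at that path rather than patching it. As a rough illustration only, the C sketch below mimics that replace-then-link behaviour with POSIX unlink() and symlink(); force_symlink() and link_points_to() are hypothetical helpers, not part of the OFED scripts, and the sketch does not handle a link path that is a real directory (in that case ln would create the link inside the directory instead).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: roughly what "ln -snf target linkpath" does when
 * linkpath is a regular file, a symlink, or missing. Any existing entry is
 * removed first (a symlink is removed itself, not followed, as with -n),
 * then the new symlink is created. Like ln -f, this is not atomic. */
static int force_symlink(const char *target, const char *linkpath)
{
    assert(target != NULL && linkpath != NULL);
    if (unlink(linkpath) != 0 && errno != ENOENT)
        return -1;              /* old entry exists but could not be removed */
    return symlink(target, linkpath);
}

/* Hypothetical helper: returns 1 if linkpath is a symlink to target. */
static int link_points_to(const char *linkpath, const char *target)
{
    char buf[4096];
    ssize_t n = readlink(linkpath, buf, sizeof buf - 1);
    if (n < 0)
        return 0;               /* missing, or not a symlink */
    buf[n] = '\0';              /* readlink() does not NUL-terminate */
    return strcmp(buf, target) == 0;
}
```

Because a relative symlink target is resolved against the directory that contains the link, calling force_symlink("../../ofed_scripts/iscsi_scsi_makefile", "drivers/scsi/Makefile") from the top of the tree would produce the same link as the script's cd-then-ln sequence. A plain symlink() call without the unlink() step would instead fail with EEXIST when the Makefile is already present, which is the situation -f covers.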
From pradeeps at linux.vnet.ibm.com Sun Jun 17 19:36:20 2007 From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana) Date: Sun, 17 Jun 2007 19:36:20 -0700 Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) In-Reply-To: <20070616192702.GM2207@mellanox.co.il> References: <466F36C8.5010507@linux.vnet.ibm.com> <20070613163821.GB12277@mellanox.co.il> <20070613174930.GE12277@mellanox.co.il> <46716F3D.7050206@ichips.intel.com> <20070614175030.GB29561@mellanox.co.il> <4671C541.4040503@linux.vnet.ibm.com> <20070615051846.GG2207@mellanox.co.il> <4672C0DC.8060308@linux.vnet.ibm.com> <20070616192702.GM2207@mellanox.co.il> Message-ID: <4675EFA4.5050209@linux.vnet.ibm.com> Michael S. Tsirkin wrote: >> We need to make some decisions > > Earlier, Roland suggested: >> However it may be a good approach to put an abstraction layer in IPoIB >> so that the CM code can use an SRQ-like interface to both HCAs that >> support SRQ and HCAs that don't. This approach would be a regression, and there is no guarantee that anything else would be better. As Bernard King-Smith said, changing to a different approach mid-stream is not the right thing to do. > > And I think this might be a good approach, too - and maybe > this layer could be general enough to be reusable in other > ULPs later.
> Pradeep From shani.moideen at wipro.com Sun Jun 17 20:16:41 2007 From: shani.moideen at wipro.com (Shani Moideen) Date: Mon, 18 Jun 2007 08:46:41 +0530 Subject: [ofa-general] [KJ PATCH] Replacing memset(,0,PAGE_SIZE) with clear_page() in drivers/infiniband/hw/mthca/mthca_allocator.c Message-ID: <1182136601.9020.7.camel@shani-win> Replacing memset(,0,PAGE_SIZE) with clear_page() in drivers/infiniband/hw/mthca/mthca_allocator.c Signed-off-by: Shani Moideen ---- diff --git a/drivers/infiniband/hw/mthca/mthca_allocator.c b/drivers/infiniband/hw/mthca/mthca_allocator.c index f930e55..a763067 100644 --- a/drivers/infiniband/hw/mthca/mthca_allocator.c +++ b/drivers/infiniband/hw/mthca/mthca_allocator.c @@ -255,7 +255,7 @@ int mthca_buf_alloc(struct mthca_dev *dev, int size, int max_direct, dma_list[i] = t; pci_unmap_addr_set(&buf->page_list[i], mapping, t); - memset(buf->page_list[i].buf, 0, PAGE_SIZE); + clear_page(buf->page_list[i].buf); } } -- Shani The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments. WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. 
www.wipro.com From shani.moideen at wipro.com Sun Jun 17 20:23:00 2007 From: shani.moideen at wipro.com (Shani Moideen) Date: Mon, 18 Jun 2007 08:53:00 +0530 Subject: [ofa-general] [KJ PATCH] Replacing memset(,0,PAGE_SIZE) with clear_page() in drivers/infiniband/hw/mthca/mthca_eq.c Message-ID: <1182136980.9020.13.camel@shani-win> Replacing memset(,0,PAGE_SIZE) with clear_page() in drivers/infiniband/hw/mthca/mthca_eq.c Signed-off-by: Shani Moideen ---- diff --git a/drivers/infiniband/hw/mthca/mthca_eq.c b/drivers/infiniband/hw/mthca/mthca_eq.c index 8ec9fa1..8592b26 100644 --- a/drivers/infiniband/hw/mthca/mthca_eq.c +++ b/drivers/infiniband/hw/mthca/mthca_eq.c @@ -522,7 +522,7 @@ static int mthca_create_eq(struct mthca_dev *dev, dma_list[i] = t; pci_unmap_addr_set(&eq->page_list[i], mapping, t); - memset(eq->page_list[i].buf, 0, PAGE_SIZE); + clear_page(eq->page_list[i].buf); } for (i = 0; i < eq->nent; ++i) -- Shani
From shani.moideen at wipro.com Sun Jun 17 20:33:56 2007 From: shani.moideen at wipro.com (Shani Moideen) Date: Mon, 18 Jun 2007 09:03:56 +0530 Subject: [ofa-general] [KJ PATCH] Replacing memset(,0,PAGE_SIZE) with clear_page() in drivers/infiniband/hw/ipath/ipath_driver.c Message-ID: <1182137636.9020.17.camel@shani-win> Replacing memset(,0,PAGE_SIZE) with clear_page() in drivers/infiniband/hw/ipath/ipath_driver.c Signed-off-by: Shani Moideen ---- diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index e3a2232..417e3ca 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -1509,7 +1509,7 @@ int ipath_create_rcvhdrq(struct ipath_devdata *dd, /* clear for security and sanity on each use */ memset(pd->port_rcvhdrq, 0, pd->port_rcvhdrq_size); - memset(pd->port_rcvhdrtail_kvaddr, 0, PAGE_SIZE); + clear_page(pd->port_rcvhdrtail_kvaddr); /* * tell chip each time we init it, even if we are re-using previous -- Shani
From vlad at dev.mellanox.co.il Sun Jun 17 23:07:29 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Mon, 18 Jun 2007 09:07:29 +0300 Subject: [ofa-general] Re: [ewg] [GIT PULL ofed_1_2] iw_cxgb3 fixes for bugs 663/664 In-Reply-To: <46755A3D.1030300@opengridcomputing.com> References: <46755A3D.1030300@opengridcomputing.com> Message-ID: <46762121.70404@dev.mellanox.co.il> Steve Wise wrote: > Vlad, > > Please pull in these fixes for bugs 663/664 from > > git://git.openfabrics.org/~swise/ofed_1_2 ofed_1_2 > > Thanks, > > Steve. > Done, Regards, Vladimir From erezz at voltaire.com Sun Jun 17 23:50:40 2007 From: erezz at voltaire.com (Erez Zilber) Date: Mon, 18 Jun 2007 09:50:40 +0300 Subject: [ofa-general] Re: [PATCH] IB/iser: make all fixes patches apply on full kernel source In-Reply-To: <20070617191056.GC27233@mellanox.co.il> References: <20070617191056.GC27233@mellanox.co.il> Message-ID: <46762B40.1010500@voltaire.com> Michael S. Tsirkin wrote: > Since we'll have RC6, I'd like to add the following cosmetic change: > move iscsi_scsi_makefile from kernel_patches/fixes to ofed_scripts, > and use a softlink to put in in place. > > The solves the following problem: if I do "git clone" on ofed > tree without -n, and try to apply the fixes patchset, > I get a conflict on iscsi_scsi_makefile.patch simply because > the makefile that this patch attempts to create is part > of upstream kernel already. > > I think this is 0-risk and carries real benefit for developers > who'll need to support OFED 1.2. > > Erez, do you agree? If yes, I'll ask Tziporet to approve, too. > I'm ok with it. Thanks, Erez From mst at dev.mellanox.co.il Mon Jun 18 01:32:40 2007 From: mst at dev.mellanox.co.il (Michael S.
Tsirkin) Date: Mon, 18 Jun 2007 11:32:40 +0300 Subject: [ofa-general] [PATCH for-2.6.22] ipoib/cm: initialize RX before moving QP to RTR In-Reply-To: References: <4672BE23.3050809@ichips.intel.com> Message-ID: <20070618083240.GK14335@mellanox.co.il> Fix a crasher bug in IPoIB CM: once QP is in RTR, an RX completion (and even an asynchronous error) might be observed on this QP, so we have to initialize all RX fields beforehand. This fixes bug Signed-off-by: Michael S. Tsirkin --- > Quoting Woodruff, Robert J : > Subject: RE: [ofa-general] crash in ipoib > > Sean wrote, > >> And here's a version with error handling fixed. > >> Sean, does this solve your crash? > > >We've been running this patch since yesterday and haven't seen any > >crashes. We'll continue testing this over the week-end. > > >- Sean > > This looks like it fixed the panic. > > Should we try to put out a new RC with this latest ipoib fix ? > I really think we need it in the release. If we could get another RC out > today, > that would only delay the release by a couple of more days and we could > release on next Friday rather than wed. and still give people a week to > test the final RC. > > woody OK, the following patch has been added to OFED 1.2. Roland, please consider this bugfix for 2.6.22. 
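The patch description above boils down to an ordering rule: every field that the completion or asynchronous-event path may touch must be set before the QP reaches RTR, because from that point on the HCA can deliver events for it. The C11 sketch below is a loose userspace analogy of that rule, not the IPoIB code: publishing a pointer with a release store stands in for the RTR transition, and struct rx_conn, struct fake_qp, rx_conn_setup() and on_rx_event() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum rx_state { RX_NONE = 0, RX_LIVE };

/* Hypothetical stand-in for the per-connection RX context ("p" above). */
struct rx_conn {
    enum rx_state state;
    unsigned long created;     /* plays the role of p->jiffies */
};

/* Hypothetical stand-in for the QP: once conn is non-NULL, the event
 * handler may run at any time, possibly on another CPU. */
struct fake_qp {
    _Atomic(struct rx_conn *) conn;
};

/* Event path: only reads the context, and only after it was published.
 * The acquire load pairs with the release store in rx_conn_setup(). */
static enum rx_state on_rx_event(struct fake_qp *qp)
{
    struct rx_conn *c = atomic_load_explicit(&qp->conn, memory_order_acquire);
    return c ? c->state : RX_NONE;   /* nothing published yet */
}

/* Setup path: fill in every field first, publish last. Publishing first
 * and initializing afterwards is the ordering bug the patch removes. */
static void rx_conn_setup(struct fake_qp *qp, struct rx_conn *c,
                          unsigned long now)
{
    assert(c != NULL);
    c->state = RX_LIVE;
    c->created = now;
    atomic_store_explicit(&qp->conn, c, memory_order_release);
}
```

In the driver the publish step is the QP state change rather than an atomic store, and the passive_ids list update is done under priv->lock; the sketch only models the initialize-then-publish order that the patch enforces.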
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index 076a0bb..c64249f 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -309,6 +309,11 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even return -ENOMEM; p->dev = dev; p->id = cm_id; + cm_id->context = p; + p->state = IPOIB_CM_RX_LIVE; + p->jiffies = jiffies; + INIT_LIST_HEAD(&p->list); + p->qp = ipoib_cm_create_rx_qp(dev, p); if (IS_ERR(p->qp)) { ret = PTR_ERR(p->qp); @@ -320,24 +325,24 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even if (ret) goto err_modify; + spin_lock_irq(&priv->lock); + queue_delayed_work(ipoib_workqueue, + &priv->cm.stale_task, IPOIB_CM_RX_DELAY); + /* Add this entry to passive ids list head, but do not re-add it + * if IB_EVENT_QP_LAST_WQE_REACHED has moved it to flush list. */ + p->jiffies = jiffies; + if (p->state == IPOIB_CM_RX_LIVE) + list_move(&p->list, &priv->cm.passive_ids); + spin_unlock_irq(&priv->lock); + ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn); if (ret) { ipoib_warn(priv, "failed to send REP: %d\n", ret); - goto err_rep; + if (ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE)) + ipoib_warn(priv, "unable to move qp to error state\n"); } - - cm_id->context = p; - p->jiffies = jiffies; - p->state = IPOIB_CM_RX_LIVE; - spin_lock_irq(&priv->lock); - if (list_empty(&priv->cm.passive_ids)) - queue_delayed_work(ipoib_workqueue, - &priv->cm.stale_task, IPOIB_CM_RX_DELAY); - list_add(&p->list, &priv->cm.passive_ids); - spin_unlock_irq(&priv->lock); return 0; -err_rep: err_modify: ib_destroy_qp(p->qp); err_qp: -- MST From vlad at lists.openfabrics.org Mon Jun 18 02:45:10 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Mon, 18 Jun 2007 02:45:10 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070618-0200 daily build status Message-ID: 
<20070618094510.B9DFAE6082B@openfabrics.org>

This email was generated automatically, please do not reply

Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.12
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.12
Passed on powerpc with linux-2.6.19
Passed on ia64 with linux-2.6.18
Passed on x86_64 with linux-2.6.20
Passed on ppc64 with linux-2.6.12
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.15
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.15
Passed on ia64 with linux-2.6.19
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.14
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.14
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on x86_64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.16
Passed on powerpc with linux-2.6.13
Passed on powerpc with linux-2.6.15
Passed on x86_64 with linux-2.6.17
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.16
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.14
Passed on ia64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.9-34.ELsmp
Failed:

From mst at dev.mellanox.co.il Mon Jun 18 04:48:43 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 18 Jun 2007 14:48:43 +0300
Subject: [ofa-general] hang at module removal with local sa patches applied
Message-ID: <20070618114843.GA25428@mellanox.co.il>

Hi!
I tried applying the latest local sa patches to 2.6.22-rc5, and applied the patch at the bottom to disable the sa cache by default.
After this, "openib stop" hangs forever.
You can see the exact patches I applied here:
http://git.openfabrics.org/git/?p=~mst/ofed_kernel.git;a=tree;f=kernel_patches/attic;hb=ofed_kernel

Here's a sysrq trace of the threads that look IB-related.

[14897.168101] mthca_catas S 0000000000000001 0 8330 2 (L-TLB)
[14897.168104] ffff8100764bded0 0000000000000046 0000000000000000 0000000000000000
[14897.168107] ffff81007ebea950 0000000000000006 ffff81007ebea920 ffff81007ff1f4a0
[14897.168111] 00000d83a434d314 00000000000004b6 ffff81007ebeaad0 0000000000000046
[14897.168113] Call Trace:
[14897.168116] [] worker_thread+0x0/0xe7
[14897.168119] [] worker_thread+0xa2/0xe7
[14897.168122] [] autoremove_wake_function+0x0/0x38
[14897.168125] [] kthread+0x49/0x76
[14897.168127] [] child_rip+0xa/0x12
[14897.168130] [] kthread+0x0/0x76
[14897.168133] [] child_rip+0x0/0x12
[14897.168134]
[14897.168136] ib_mad1 S 0000000000000003 0 8333 2 (L-TLB)
[14897.168139] ffff81007ce53ed0 0000000000000046 0000000000000000 ffff81007fcdc400
[14897.168142] 000000007ebf4990 000000000000000a ffff81007ebf4960 ffff81007fe0b520
[14897.168146] 00000d853dc5974d 00000000000012c8 ffff81007ebf4b10 ffff81007fe0b520
[14897.168149] Call Trace:
[14897.168152] [] worker_thread+0x0/0xe7
[14897.168154] [] worker_thread+0xa2/0xe7
[14897.168157] [] autoremove_wake_function+0x0/0x38
[14897.168160] [] kthread+0x49/0x76
[14897.168162] [] child_rip+0xa/0x12
[14897.168165] [] kthread+0x0/0x76
[14897.168168] [] child_rip+0x0/0x12
[14897.168169]
[14897.168171] ib_mad2 S 0000000000000000 0 8334 2 (L-TLB)
[14897.168174] ffff81007ce51ed0 0000000000000046 0000000000000000 ffff81007edcdc00
[14897.168177] 000000007e86f710 000000000000000a ffff81007e86f6e0 ffffffff8070d4c0
[14897.168181] 00000d853dc7ba8f 00000000000012aa ffff81007e86f890 ffffffff8070d4c0
[14897.168184] Call Trace:
[14897.168187] [] worker_thread+0x0/0xe7
[14897.168189] [] worker_thread+0xa2/0xe7
[14897.168192] [] autoremove_wake_function+0x0/0x38
[14897.168195] [] kthread+0x49/0x76
[14897.168198] [] child_rip+0xa/0x12
[14897.168201] [] kthread+0x0/0x76
[14897.168203] [] child_rip+0x0/0x12
[14897.168205]
[14897.168206] ib_mcast S 0000000000000000 0 8359 2 (L-TLB)
[14897.168210] ffff81007d3a3ed0 0000000000000046 0000000000000000 0000000000000000
[14897.168213] 0000ffff1b4012ff 000000000000000a ffff81007e8830c0 ffffffff8070d4c0
[14897.168216] 00000d84fe84fafa 0000000000001105 ffff81007e883270 0000000000010000
[14897.168219] Call Trace:
[14897.168222] [] worker_thread+0x0/0xe7
[14897.168225] [] worker_thread+0xa2/0xe7
[14897.168228] [] autoremove_wake_function+0x0/0x38
[14897.168230] [] kthread+0x49/0x76
[14897.168233] [] child_rip+0xa/0x12
[14897.168236] [] kthread+0x0/0x76
[14897.168239] [] child_rip+0x0/0x12
[14897.168240]
[14897.168242] ib_inform S ffff81007e4d1740 0 8360 2 (L-TLB)
[14897.168245] ffff81007d0d1ed0 0000000000000046 0000000024000000 0000000000000000
[14897.168248] ffff810076c60130 0000000000000006 ffff810076c60100 ffff81007d1c7560
[14897.168252] 00000d83ee2e6167 000000000000035a ffff810076c602b0 0000000000000046
[14897.168254] Call Trace:
[14897.168257] [] worker_thread+0x0/0xe7
[14897.168260] [] worker_thread+0xa2/0xe7
[14897.168263] [] autoremove_wake_function+0x0/0x38
[14897.168266] [] kthread+0x49/0x76
[14897.168268] [] child_rip+0xa/0x12
[14897.168271] [] kthread+0x0/0x76
[14897.168274] [] child_rip+0x0/0x12
[14897.168275]
[14897.168277] local_sa D 0000000000000001 0 8361 2 (L-TLB)
[14897.168280] ffff81007d0d3c10 0000000000000046 0000000000000000 800000ce00000000
[14897.168283] 84000b0000000000 000000000000000a ffff81007e8f3420 ffff81007ff1f4a0
[14897.168287] 00000d8431895ed4 0000000000000d33 ffff81007e8f35d0 800000ce00000000
[14897.168290] Call Trace:
[14897.168294] [] __mutex_lock_slowpath+0x69/0xaa
[14897.168303] [] :ib_sa:port_work_handler+0x0/0x34
[14897.168306] [] mutex_lock+0xe/0x10
[14897.168311] [] :ib_sa:port_work_handler+0x1c/0x34
[14897.168314] [] run_workqueue+0x85/0x10f
[14897.168317] [] flush_cpu_workqueue+0x28/0x7b
[14897.168320] [] flush_workqueue+0x43/0x5d
[14897.168326] [] :ib_sa:cleanup_port+0x25/0x7b
[14897.168331] [] :ib_sa:process_updates+0x61/0x336
[14897.168335] [] thread_return+0x0/0xea
[14897.168341] [] :ib_sa:add_update+0x7a/0x83
[14897.168347] [] :ib_sa:port_work_handler+0x0/0x34
[14897.168352] [] :ib_sa:refresh_port_db+0x36/0x3b
[14897.168358] [] :ib_sa:port_work_handler+0x24/0x34
[14897.168361] [] run_workqueue+0x85/0x10f
[14897.168363] [] worker_thread+0x0/0xe7
[14897.168366] [] worker_thread+0xdc/0xe7
[14897.168368] [] autoremove_wake_function+0x0/0x38
[14897.168371] [] kthread+0x49/0x76
[14897.168374] [] child_rip+0xa/0x12
[14897.168377] [] kthread+0x0/0x76
[14897.168379] [] child_rip+0x0/0x12
[14897.168381]
[14897.168382] openibd S 0000000000000002 0 8598 6178 (NOTLB)
[14897.168386] ffff81007fadbeb8 0000000000000082 0000000000000000 ffff81007d4b2678
[14897.168389] 00000000005a5640 0000000000000001 ffff81007f7e60c0 ffff81007ff574e0
[14897.168392] 00000d84e88f6e97 0000000000007060 ffff81007f7e6270 ffff81007c309600
[14897.168396] Call Trace:
[14897.168399] [] do_wait+0xa0a/0xb1f
[14897.168402] [] default_wake_function+0x0/0xf
[14897.168405] [] sys_wait4+0x28/0x2a
[14897.168408] [] system_call+0x7e/0x83
[14897.168410]
[14897.168411] modprobe D 0000000000000000 0 8640 8598 (NOTLB)
[14897.168415] ffff81007c90bd78 0000000000000086 0000000000000000 ffffffff807186a0
[14897.168418] ffff81007c90be68 0000000000000007 ffff81007730edc0 ffffffff8070d4c0
[14897.168422] 00000d852f6be2aa 0000000000000b50 ffff81007730ef70 0000000000000001
[14897.168424] Call Trace:
[14897.168428] [] wait_for_completion+0x82/0xc1
[14897.168431] [] default_wake_function+0x0/0xf
[14897.168434] [] flush_cpu_workqueue+0x6f/0x7b
[14897.168436] [] wq_barrier_func+0x0/0xf
[14897.168439] [] flush_workqueue+0x43/0x5d
[14897.168445] [] :ib_sa:sa_db_remove_dev+0x3d/0x9c
[14897.168448] [] default_wake_function+0x0/0xf
[14897.168458] [] :ib_core:ib_unregister_client+0x37/0xf0
[14897.168465] [] :ib_sa:sa_db_cleanup+0x10/0x2a
[14897.168470] [] :ib_sa:ib_sa_cleanup+0x9/0x2d
[14897.168474] [] sys_delete_module+0x1b5/0x1e6
[14897.168477] [] system_call+0x7e/0x83
[14897.168479]

---

Disable SA cache by default.

Signed-off-by: Michael S. Tsirkin

---

Index: connectx/drivers/infiniband/core/local_sa.c
===================================================================
--- connectx.orig/drivers/infiniband/core/local_sa.c	2007-05-31 09:32:50.000000000 +0300
+++ connectx/drivers/infiniband/core/local_sa.c	2007-05-31 09:33:55.000000000 +0300
@@ -55,7 +55,7 @@ enum {
 };
 
 static int set_paths_per_dest(const char *val, struct kernel_param *kp);
-static unsigned long paths_per_dest = SA_DB_MAX_PATHS_PER_DEST;
+static unsigned long paths_per_dest = 0;
 module_param_call(paths_per_dest, set_paths_per_dest, param_get_ulong,
 		  &paths_per_dest, 0644);
 MODULE_PARM_DESC(paths_per_dest, "Maximum number of paths to retrieve "

--
MST

From support at qlogic.com Mon Jun 18 05:51:05 2007
From: support at qlogic.com (QLogic Support)
Date: Mon, 18 Jun 2007 05:51:05 -0700 (PDT)
Subject: [ofa-general] Re: [KJ PATCH] Replacing memset(, 0, PAGE_SIZE) with clear_page() in drivers/infiniband/hw/ipath/ipath_driver.c [REF:7963312062]
In-Reply-To:
<1182137636.9020.17.camel@shani-win>
Message-ID: <4658214.1182171064319.JavaMail.support@qlogic.com>

Regards,
Steve Newberger
QLogic Corporation
Support at QLogic.com
Please visit our web @ http://support.qlogic.com/

---- Original Message ----
From: shani.moideen at wipro.com
Sent: 17-Jun-2007 22:33:56
To: support at pathscale.com
Cc: openib-general at openib.org; kernel-janitors at lists.osdl.org
Subject: [KJ PATCH] Replacing memset(<addr>,0,PAGE_SIZE) with clear_page(<addr>) in drivers/infiniband/hw/ipath/ipath_driver.c

Replacing memset(<addr>,0,PAGE_SIZE) with clear_page(<addr>) in drivers/infiniband/hw/ipath/ipath_driver.c

Signed-off-by: Shani Moideen
----
diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
index e3a2232..417e3ca 100644
--- a/drivers/infiniband/hw/ipath/ipath_driver.c
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -1509,7 +1509,7 @@ int ipath_create_rcvhdrq(struct ipath_devdata *dd,
 
 	/* clear for security and sanity on each use */
 	memset(pd->port_rcvhdrq, 0, pd->port_rcvhdrq_size);
-	memset(pd->port_rcvhdrtail_kvaddr, 0, PAGE_SIZE);
+	clear_page(pd->port_rcvhdrtail_kvaddr);
 
 	/*
 	 * tell chip each time we init it, even if we are re-using previous
--
Shani

From barr_robertpeters70 at yahoo.com Mon Jun 18 06:37:44 2007
From: barr_robertpeters70 at yahoo.com (peter robert)
Date: Mon, 18 Jun 2007 06:37:44 -0700 (PDT)
Subject: [ofa-general] You are listed in Late Mr. Mark Patrick's Inheritance
Message-ID: <261622.76583.qm@web63906.mail.re1.yahoo.com>

Attention: Bequest Beneficiary,

We act as solicitors and our services have been retained by Mark Patrick, now late here in after referred to as our client. On behalf of late Mark Patrick, I write to notify you that our late client made you a beneficiary to the bequest sum of One Million, Seven Hundred Thousand British pound sterling in the codicil to his will and last testament.

Mark Patrick died on 8th day February 2005 after a brief illness at the age of 85. Until his death he was consultant to several oil and gas industries. He had a sojourn in the United States and so many other countries before he came to Cairn Energy PLC oil and gas exploration and Production Company based in the United Kingdom. He was a knight in the Church and belonged to several non-governmental and scientific organizations. He was also a great philanthropist and a Paul Harris Fellow of the Rotary Club International. This bequest is to support your activities, humanitarian services and help to the less privileged.

In accordance with our inheritance law you are required to apply for claims through this law firm to a Finance House in United Kingdom, where this fund was deposited. We are perfecting arrangements to complete the transfer of this inheritance to you. You are required to forward the following details of yours; full names, address, occupation, age, phone and fax numbers to Robert Peters (Attorney At Law) through this email address; barrister211 at gmail.com, Tell: +44-701-112-9478, for verification and re-confirmation. Please acknowledge the receipt of this letter immediately by replying.

Yours in service,
Dynamic Law Firm, Solicitors & Advocates.
12 Campshill Road, London United Kingdom.

From xma at us.ibm.com Mon Jun 18 08:41:08 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Mon, 18 Jun 2007 08:41:08 -0700
Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM)
In-Reply-To: <20070617055649.GN2207@mellanox.co.il>
Message-ID:

Hello Michael,

>> IPoIB CM support, with or without SRQ, is less
>> scalable than IPoIB UD mode,

>I believe this is incorrect: datagram mode has AH per destination,
>connected mode has a QP per destination, so with SRQ, I see no
>inherent lack of scalability with connected as compared to datagram mode.

How many cluster nodes have you tested in IPoIB-CM mode? What kind of tests? Do you have any data to share?

Thanks
Shirley Ma

From rdreier at cisco.com Mon Jun 18 08:49:20 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 18 Jun 2007 08:49:20 -0700
Subject: [ofa-general] [GIT PULL] please pull infiniband.git
Message-ID:

Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This will get a bunch of fixes to the new mlx4 driver. This pull is bigger than I would have liked after -rc5, but Mellanox discovered a problem that required a firmware change and also some driver help to fix. Since this is a new driver for 2.6.22, which is for new hardware that no one has in production yet, I think it's better to merge this early even if it risks introducing a bug, rather than have a driver in 2.6.22 that doesn't work at all with current adapter firmware.
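Much of the firmware/driver interplay described here comes down to a little arithmetic in the qp.c and cq.c hunks of this pull: the send queue keeps 2 KB plus one WQE of headroom for the HCA prefetcher, unused WQEs are stamped so that a prefetched 64-byte chunk looks invalid, and wq->tail is advanced by a 16-bit distance so the hardware's wqe_index can wrap. The userspace sketch below reproduces that arithmetic under stated assumptions: struct sq_layout, sq_size(), stamp_wqe() and the advance_tail_*() helpers are hypothetical names, roundup_pow2() stands in for the kernel's roundup_pow_of_two(), and real WQE buffers, byte order and locking are ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round v (>= 1) up to a power of two; stand-in for roundup_pow_of_two(). */
static uint32_t roundup_pow2(uint32_t v)
{
	uint32_t r = 1;

	while (r < v)
		r <<= 1;
	return r;
}

struct sq_layout {
	uint32_t spare_wqes;	/* headroom reserved for HW prefetch */
	uint32_t wqe_cnt;	/* ring size (power of two) */
	uint32_t max_post;	/* value reported back as max_send_wr */
};

/* Same arithmetic as set_kernel_sq_size(): 2 KB + 1 WQE of headroom. */
static struct sq_layout sq_size(uint32_t max_send_wr, int wqe_shift)
{
	struct sq_layout l;

	l.spare_wqes = (2048u >> wqe_shift) + 1;
	l.wqe_cnt = roundup_pow2(max_send_wr + l.spare_wqes);
	l.max_post = l.wqe_cnt - l.spare_wqes;
	return l;
}

/* Same loop as stamp_send_wqe(): set the first 4 bytes of every 64-byte
 * chunk of one WQE to 0xffffffff, skipping the first chunk. */
static void stamp_wqe(uint32_t *wqe, int wqe_shift)
{
	for (int i = 16; i < 1 << (wqe_shift - 2); i += 16)
		wqe[i] = 0xffffffff;
}

/* Fixed tail update: advance by the 16-bit distance to the completed
 * WQE index, so a wrap of the 16-bit HW counter still moves tail forward. */
static uint32_t advance_tail_fixed(uint32_t tail, uint16_t wqe_ctr)
{
	return tail + (uint16_t)(wqe_ctr - (uint16_t)tail);
}

/* Old tail update: the difference is computed in int and can be
 * negative after a wrap, which moves tail backwards. */
static uint32_t advance_tail_old(uint32_t tail, uint16_t wqe_ctr)
{
	return tail + (wqe_ctr - (uint16_t)tail);
}
```

For example, with 64-byte WQEs (wqe_shift = 6) and 100 requested send WRs, 33 WQEs are held back as headroom, the ring is rounded up to 256 entries, and 223 are reported as max_send_wr. With tail at 0x1ffff and a completion for wqe_index 2, the old update moves tail back to 0x10002, while the cast version moves it forward to 0x20002.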
Jack Morgenstein (1):
      IB/mlx4: Handle buffer wraparound in __mlx4_ib_cq_clean()

Roland Dreier (6):
      IB/mlx4: Fix handling of wq->tail for send completions
      IB/mlx4: Fix warning in rounding up queue sizes
      IB/mlx4: Handle new FW requirement for send request prefetching
      IB/mlx4: Get rid of max_inline_data calculation
      IB/mlx4: Handle FW command interface rev 3
      IB/mlx4: Make sure inline data segments don't cross a 64 byte boundary

 drivers/infiniband/hw/mlx4/cq.c | 19 ++--
 drivers/infiniband/hw/mlx4/main.c | 16 ++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h | 5 +-
 drivers/infiniband/hw/mlx4/qp.c | 196 ++++++++++++++++++++++------------
 drivers/infiniband/hw/mlx4/user.h | 9 +-
 drivers/net/mlx4/fw.c | 110 +++++++++++++-------
 drivers/net/mlx4/fw.h | 10 +-
 drivers/net/mlx4/main.c | 14 ++-
 include/linux/mlx4/cmd.h | 1 +
 include/linux/mlx4/device.h | 13 ++-
 include/linux/mlx4/qp.h | 4 +
 11 files changed, 259 insertions(+), 138 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index b2a290c..660b27a 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -354,8 +354,8 @@ static int mlx4_ib_poll_one(struct mlx4_ib_cq *cq,
 	if (is_send) {
 		wq = &(*cur_qp)->sq;
 		wqe_ctr = be16_to_cpu(cqe->wqe_index);
-		wq->tail += wqe_ctr - (u16) wq->tail;
-		wc->wr_id = wq->wrid[wq->tail & (wq->max - 1)];
+		wq->tail += (u16) (wqe_ctr - (u16) wq->tail);
+		wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
 		++wq->tail;
 	} else if ((*cur_qp)->ibqp.srq) {
 		srq = to_msrq((*cur_qp)->ibqp.srq);
@@ -364,7 +364,7 @@ static int mlx4_ib_poll_one(struct mlx4_ib_cq *cq,
 		mlx4_ib_free_srq_wqe(srq, wqe_ctr);
 	} else {
 		wq = &(*cur_qp)->rq;
-		wc->wr_id = wq->wrid[wq->tail & (wq->max - 1)];
+		wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
 		++wq->tail;
 	}
 
@@ -478,7 +478,8 @@ void __mlx4_ib_cq_clean(struct mlx4_ib_cq *cq, u32 qpn, struct mlx4_ib_srq *srq)
 {
 	u32 prod_index;
 	int nfreed = 0;
-	struct mlx4_cqe *cqe;
+	struct mlx4_cqe *cqe, *dest;
+	u8
owner_bit; /* * First we need to find the current producer index, so we @@ -501,9 +502,13 @@ void __mlx4_ib_cq_clean(struct mlx4_ib_cq *cq, u32 qpn, struct mlx4_ib_srq *srq) if (srq && !(cqe->owner_sr_opcode & MLX4_CQE_IS_SEND_MASK)) mlx4_ib_free_srq_wqe(srq, be16_to_cpu(cqe->wqe_index)); ++nfreed; - } else if (nfreed) - memcpy(get_cqe(cq, (prod_index + nfreed) & cq->ibcq.cqe), - cqe, sizeof *cqe); + } else if (nfreed) { + dest = get_cqe(cq, (prod_index + nfreed) & cq->ibcq.cqe); + owner_bit = dest->owner_sr_opcode & MLX4_CQE_OWNER_MASK; + memcpy(dest, cqe, sizeof *cqe); + dest->owner_sr_opcode = owner_bit | + (dest->owner_sr_opcode & ~MLX4_CQE_OWNER_MASK); + } } if (nfreed) { diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 402f3a2..1095c82 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -125,7 +125,7 @@ static int mlx4_ib_query_device(struct ib_device *ibdev, props->local_ca_ack_delay = dev->dev->caps.local_ca_ack_delay; props->atomic_cap = dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_ATOMIC ? 
IB_ATOMIC_HCA : IB_ATOMIC_NONE; - props->max_pkeys = dev->dev->caps.pkey_table_len; + props->max_pkeys = dev->dev->caps.pkey_table_len[1]; props->max_mcast_grp = dev->dev->caps.num_mgms + dev->dev->caps.num_amgms; props->max_mcast_qp_attach = dev->dev->caps.num_qp_per_mgm; props->max_total_mcast_qp_attach = props->max_mcast_qp_attach * @@ -168,9 +168,9 @@ static int mlx4_ib_query_port(struct ib_device *ibdev, u8 port, props->state = out_mad->data[32] & 0xf; props->phys_state = out_mad->data[33] >> 4; props->port_cap_flags = be32_to_cpup((__be32 *) (out_mad->data + 20)); - props->gid_tbl_len = to_mdev(ibdev)->dev->caps.gid_table_len; + props->gid_tbl_len = to_mdev(ibdev)->dev->caps.gid_table_len[port]; props->max_msg_sz = 0x80000000; - props->pkey_tbl_len = to_mdev(ibdev)->dev->caps.pkey_table_len; + props->pkey_tbl_len = to_mdev(ibdev)->dev->caps.pkey_table_len[port]; props->bad_pkey_cntr = be16_to_cpup((__be16 *) (out_mad->data + 46)); props->qkey_viol_cntr = be16_to_cpup((__be16 *) (out_mad->data + 48)); props->active_width = out_mad->data[31] & 0xf; @@ -280,8 +280,14 @@ static int mlx4_SET_PORT(struct mlx4_ib_dev *dev, u8 port, int reset_qkey_viols, return PTR_ERR(mailbox); memset(mailbox->buf, 0, 256); - *(u8 *) mailbox->buf = !!reset_qkey_viols << 6; - ((__be32 *) mailbox->buf)[2] = cpu_to_be32(cap_mask); + + if (dev->dev->flags & MLX4_FLAG_OLD_PORT_CMDS) { + *(u8 *) mailbox->buf = !!reset_qkey_viols << 6; + ((__be32 *) mailbox->buf)[2] = cpu_to_be32(cap_mask); + } else { + ((u8 *) mailbox->buf)[3] = !!reset_qkey_viols; + ((__be32 *) mailbox->buf)[1] = cpu_to_be32(cap_mask); + } err = mlx4_cmd(dev->dev, mailbox->dma, port, 0, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B); diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 93dac71..24ccadd 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -95,7 +95,8 @@ struct mlx4_ib_mr { struct mlx4_ib_wq { u64 *wrid; spinlock_t lock; - int 
max; + int wqe_cnt; + int max_post; int max_gs; int offset; int wqe_shift; @@ -113,6 +114,7 @@ struct mlx4_ib_qp { u32 doorbell_qpn; __be32 sq_signal_bits; + int sq_spare_wqes; struct mlx4_ib_wq sq; struct ib_umem *umem; @@ -123,6 +125,7 @@ struct mlx4_ib_qp { u8 alt_port; u8 atomic_rd_en; u8 resp_depth; + u8 sq_no_prefetch; u8 state; }; diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index 5c6d054..f8a1a08 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -109,6 +109,20 @@ static void *get_send_wqe(struct mlx4_ib_qp *qp, int n) return get_wqe(qp, qp->sq.offset + (n << qp->sq.wqe_shift)); } +/* + * Stamp a SQ WQE so that it is invalid if prefetched by marking the + * first four bytes of every 64 byte chunk with 0xffffffff, except for + * the very first chunk of the WQE. + */ +static void stamp_send_wqe(struct mlx4_ib_qp *qp, int n) +{ + u32 *wqe = get_send_wqe(qp, n); + int i; + + for (i = 16; i < 1 << (qp->sq.wqe_shift - 2); i += 16) + wqe[i] = 0xffffffff; +} + static void mlx4_ib_qp_event(struct mlx4_qp *qp, enum mlx4_event type) { struct ib_event event; @@ -178,6 +192,8 @@ static int send_wqe_overhead(enum ib_qp_type type) case IB_QPT_GSI: return sizeof (struct mlx4_wqe_ctrl_seg) + ALIGN(MLX4_IB_UD_HEADER_SIZE + + DIV_ROUND_UP(MLX4_IB_UD_HEADER_SIZE, + MLX4_INLINE_ALIGN) * sizeof (struct mlx4_wqe_inline_seg), sizeof (struct mlx4_wqe_data_seg)) + ALIGN(4 + @@ -201,18 +217,18 @@ static int set_rq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, if (cap->max_recv_wr) return -EINVAL; - qp->rq.max = qp->rq.max_gs = 0; + qp->rq.wqe_cnt = qp->rq.max_gs = 0; } else { /* HW requires >= 1 RQ entry with >= 1 gather entry */ if (is_user && (!cap->max_recv_wr || !cap->max_recv_sge)) return -EINVAL; - qp->rq.max = roundup_pow_of_two(max(1, cap->max_recv_wr)); - qp->rq.max_gs = roundup_pow_of_two(max(1, cap->max_recv_sge)); + qp->rq.wqe_cnt = roundup_pow_of_two(max(1U, cap->max_recv_wr)); + qp->rq.max_gs = 
roundup_pow_of_two(max(1U, cap->max_recv_sge)); qp->rq.wqe_shift = ilog2(qp->rq.max_gs * sizeof (struct mlx4_wqe_data_seg)); } - cap->max_recv_wr = qp->rq.max; + cap->max_recv_wr = qp->rq.max_post = qp->rq.wqe_cnt; cap->max_recv_sge = qp->rq.max_gs; return 0; @@ -236,8 +252,6 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, cap->max_send_sge + 2 > dev->dev->caps.max_sq_sg) return -EINVAL; - qp->sq.max = cap->max_send_wr ? roundup_pow_of_two(cap->max_send_wr) : 1; - qp->sq.wqe_shift = ilog2(roundup_pow_of_two(max(cap->max_send_sge * sizeof (struct mlx4_wqe_data_seg), cap->max_inline_data + @@ -246,20 +260,27 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, qp->sq.max_gs = ((1 << qp->sq.wqe_shift) - send_wqe_overhead(type)) / sizeof (struct mlx4_wqe_data_seg); - qp->buf_size = (qp->rq.max << qp->rq.wqe_shift) + - (qp->sq.max << qp->sq.wqe_shift); + /* + * We need to leave 2 KB + 1 WQE of headroom in the SQ to + * allow HW to prefetch. 
+ */ + qp->sq_spare_wqes = (2048 >> qp->sq.wqe_shift) + 1; + qp->sq.wqe_cnt = roundup_pow_of_two(cap->max_send_wr + qp->sq_spare_wqes); + + qp->buf_size = (qp->rq.wqe_cnt << qp->rq.wqe_shift) + + (qp->sq.wqe_cnt << qp->sq.wqe_shift); if (qp->rq.wqe_shift > qp->sq.wqe_shift) { qp->rq.offset = 0; - qp->sq.offset = qp->rq.max << qp->rq.wqe_shift; + qp->sq.offset = qp->rq.wqe_cnt << qp->rq.wqe_shift; } else { - qp->rq.offset = qp->sq.max << qp->sq.wqe_shift; + qp->rq.offset = qp->sq.wqe_cnt << qp->sq.wqe_shift; qp->sq.offset = 0; } - cap->max_send_wr = qp->sq.max; - cap->max_send_sge = qp->sq.max_gs; - cap->max_inline_data = (1 << qp->sq.wqe_shift) - send_wqe_overhead(type) - - sizeof (struct mlx4_wqe_inline_seg); + cap->max_send_wr = qp->sq.max_post = qp->sq.wqe_cnt - qp->sq_spare_wqes; + cap->max_send_sge = qp->sq.max_gs; + /* We don't support inline sends for kernel QPs (yet) */ + cap->max_inline_data = 0; return 0; } @@ -267,11 +288,11 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, static int set_user_sq_size(struct mlx4_ib_qp *qp, struct mlx4_ib_create_qp *ucmd) { - qp->sq.max = 1 << ucmd->log_sq_bb_count; + qp->sq.wqe_cnt = 1 << ucmd->log_sq_bb_count; qp->sq.wqe_shift = ucmd->log_sq_stride; - qp->buf_size = (qp->rq.max << qp->rq.wqe_shift) + - (qp->sq.max << qp->sq.wqe_shift); + qp->buf_size = (qp->rq.wqe_cnt << qp->rq.wqe_shift) + + (qp->sq.wqe_cnt << qp->sq.wqe_shift); return 0; } @@ -307,6 +328,8 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, goto err; } + qp->sq_no_prefetch = ucmd.sq_no_prefetch; + err = set_user_sq_size(qp, &ucmd); if (err) goto err; @@ -334,6 +357,8 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, goto err_mtt; } } else { + qp->sq_no_prefetch = 0; + err = set_kernel_sq_size(dev, &init_attr->cap, init_attr->qp_type, qp); if (err) goto err; @@ -360,16 +385,13 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, if (err) goto err_mtt; - 
qp->sq.wrid = kmalloc(qp->sq.max * sizeof (u64), GFP_KERNEL); - qp->rq.wrid = kmalloc(qp->rq.max * sizeof (u64), GFP_KERNEL); + qp->sq.wrid = kmalloc(qp->sq.wqe_cnt * sizeof (u64), GFP_KERNEL); + qp->rq.wrid = kmalloc(qp->rq.wqe_cnt * sizeof (u64), GFP_KERNEL); if (!qp->sq.wrid || !qp->rq.wrid) { err = -ENOMEM; goto err_wrid; } - - /* We don't support inline sends for kernel QPs (yet) */ - init_attr->cap.max_inline_data = 0; } err = mlx4_qp_alloc(dev->dev, sqpn, &qp->mqp); @@ -583,24 +605,6 @@ int mlx4_ib_destroy_qp(struct ib_qp *qp) return 0; } -static void init_port(struct mlx4_ib_dev *dev, int port) -{ - struct mlx4_init_port_param param; - int err; - - memset(¶m, 0, sizeof param); - - param.port_width_cap = dev->dev->caps.port_width_cap; - param.vl_cap = dev->dev->caps.vl_cap; - param.mtu = ib_mtu_enum_to_int(dev->dev->caps.mtu_cap); - param.max_gid = dev->dev->caps.gid_table_len; - param.max_pkey = dev->dev->caps.pkey_table_len; - - err = mlx4_INIT_PORT(dev->dev, ¶m, port); - if (err) - printk(KERN_WARNING "INIT_PORT failed, return code %d.\n", err); -} - static int to_mlx4_st(enum ib_qp_type type) { switch (type) { @@ -674,9 +678,9 @@ static int mlx4_set_path(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah, path->counter_index = 0xff; if (ah->ah_flags & IB_AH_GRH) { - if (ah->grh.sgid_index >= dev->dev->caps.gid_table_len) { + if (ah->grh.sgid_index >= dev->dev->caps.gid_table_len[port]) { printk(KERN_ERR "sgid_index (%u) too large. 
max is %d\n", - ah->grh.sgid_index, dev->dev->caps.gid_table_len - 1); + ah->grh.sgid_index, dev->dev->caps.gid_table_len[port] - 1); return -1; } @@ -743,14 +747,17 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, context->mtu_msgmax = (attr->path_mtu << 5) | 31; } - if (qp->rq.max) - context->rq_size_stride = ilog2(qp->rq.max) << 3; + if (qp->rq.wqe_cnt) + context->rq_size_stride = ilog2(qp->rq.wqe_cnt) << 3; context->rq_size_stride |= qp->rq.wqe_shift - 4; - if (qp->sq.max) - context->sq_size_stride = ilog2(qp->sq.max) << 3; + if (qp->sq.wqe_cnt) + context->sq_size_stride = ilog2(qp->sq.wqe_cnt) << 3; context->sq_size_stride |= qp->sq.wqe_shift - 4; + if (cur_state == IB_QPS_RESET && new_state == IB_QPS_INIT) + context->sq_size_stride |= !!qp->sq_no_prefetch << 7; + if (qp->ibqp.uobject) context->usr_page = cpu_to_be32(to_mucontext(ibqp->uobject->context)->uar.index); else @@ -789,13 +796,14 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, } if (attr_mask & IB_QP_ALT_PATH) { - if (attr->alt_pkey_index >= dev->dev->caps.pkey_table_len) - return -EINVAL; - if (attr->alt_port_num == 0 || attr->alt_port_num > dev->dev->caps.num_ports) return -EINVAL; + if (attr->alt_pkey_index >= + dev->dev->caps.pkey_table_len[attr->alt_port_num]) + return -EINVAL; + if (mlx4_set_path(dev, &attr->alt_ah_attr, &context->alt_path, attr->alt_port_num)) return -EINVAL; @@ -884,16 +892,19 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, /* * Before passing a kernel QP to the HW, make sure that the - * ownership bits of the send queue are set so that the - * hardware doesn't start processing stale work requests. + * ownership bits of the send queue are set and the SQ + * headroom is stamped so that the hardware doesn't start + * processing stale work requests. 
*/ if (!ibqp->uobject && cur_state == IB_QPS_RESET && new_state == IB_QPS_INIT) { struct mlx4_wqe_ctrl_seg *ctrl; int i; - for (i = 0; i < qp->sq.max; ++i) { + for (i = 0; i < qp->sq.wqe_cnt; ++i) { ctrl = get_send_wqe(qp, i); ctrl->owner_opcode = cpu_to_be32(1 << 31); + + stamp_send_wqe(qp, i); } } @@ -923,7 +934,9 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, */ if (is_qp0(dev, qp)) { if (cur_state != IB_QPS_RTR && new_state == IB_QPS_RTR) - init_port(dev, qp->port); + if (mlx4_INIT_PORT(dev->dev, qp->port)) + printk(KERN_WARNING "INIT_PORT failed for port %d\n", + qp->port); if (cur_state != IB_QPS_RESET && cur_state != IB_QPS_ERR && (new_state == IB_QPS_RESET || new_state == IB_QPS_ERR)) @@ -986,16 +999,17 @@ int mlx4_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, if (!ib_modify_qp_is_ok(cur_state, new_state, ibqp->qp_type, attr_mask)) goto out; - if ((attr_mask & IB_QP_PKEY_INDEX) && - attr->pkey_index >= dev->dev->caps.pkey_table_len) { - goto out; - } - if ((attr_mask & IB_QP_PORT) && (attr->port_num == 0 || attr->port_num > dev->dev->caps.num_ports)) { goto out; } + if (attr_mask & IB_QP_PKEY_INDEX) { + int p = attr_mask & IB_QP_PORT ? attr->port_num : qp->port; + if (attr->pkey_index >= dev->dev->caps.pkey_table_len[p]) + goto out; + } + if (attr_mask & IB_QP_MAX_QP_RD_ATOMIC && attr->max_rd_atomic > dev->dev->caps.max_qp_init_rdma) { goto out; @@ -1037,6 +1051,7 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr, u16 pkey; int send_size; int header_size; + int spc; int i; send_size = 0; @@ -1112,10 +1127,43 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr, printk("\n"); } - inl->byte_count = cpu_to_be32(1 << 31 | header_size); - memcpy(inl + 1, sqp->header_buf, header_size); + /* + * Inline data segments may not cross a 64 byte boundary. 
If + * our UD header is bigger than the space available up to the + * next 64 byte boundary in the WQE, use two inline data + * segments to hold the UD header. + */ + spc = MLX4_INLINE_ALIGN - + ((unsigned long) (inl + 1) & (MLX4_INLINE_ALIGN - 1)); + if (header_size <= spc) { + inl->byte_count = cpu_to_be32(1 << 31 | header_size); + memcpy(inl + 1, sqp->header_buf, header_size); + i = 1; + } else { + inl->byte_count = cpu_to_be32(1 << 31 | spc); + memcpy(inl + 1, sqp->header_buf, spc); - return ALIGN(sizeof (struct mlx4_wqe_inline_seg) + header_size, 16); + inl = (void *) (inl + 1) + spc; + memcpy(inl + 1, sqp->header_buf + spc, header_size - spc); + /* + * Need a barrier here to make sure all the data is + * visible before the byte_count field is set. + * Otherwise the HCA prefetcher could grab the 64-byte + * chunk with this inline segment and get a valid (!= + * 0xffffffff) byte count but stale data, and end up + * processing generating a packet with bad headers. + * + * The first inline segment's byte_count field doesn't + * need a barrier, because it comes after a + * control/MLX segment and therefore is at an offset + * of 16 mod 64. 
+ */ + wmb(); + inl->byte_count = cpu_to_be32(1 << 31 | (header_size - spc)); + i = 2; + } + + return ALIGN(i * sizeof (struct mlx4_wqe_inline_seg) + header_size, 16); } static int mlx4_wq_overflow(struct mlx4_ib_wq *wq, int nreq, struct ib_cq *ib_cq) @@ -1124,7 +1172,7 @@ static int mlx4_wq_overflow(struct mlx4_ib_wq *wq, int nreq, struct ib_cq *ib_cq struct mlx4_ib_cq *cq; cur = wq->head - wq->tail; - if (likely(cur + nreq < wq->max)) + if (likely(cur + nreq < wq->max_post)) return 0; cq = to_mcq(ib_cq); @@ -1132,7 +1180,7 @@ static int mlx4_wq_overflow(struct mlx4_ib_wq *wq, int nreq, struct ib_cq *ib_cq cur = wq->head - wq->tail; spin_unlock(&cq->lock); - return cur + nreq >= wq->max; + return cur + nreq >= wq->max_post; } int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, @@ -1165,8 +1213,8 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, goto out; } - ctrl = wqe = get_send_wqe(qp, ind & (qp->sq.max - 1)); - qp->sq.wrid[ind & (qp->sq.max - 1)] = wr->wr_id; + ctrl = wqe = get_send_wqe(qp, ind & (qp->sq.wqe_cnt - 1)); + qp->sq.wrid[ind & (qp->sq.wqe_cnt - 1)] = wr->wr_id; ctrl->srcrb_flags = (wr->send_flags & IB_SEND_SIGNALED ? @@ -1301,7 +1349,16 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, } ctrl->owner_opcode = mlx4_ib_opcode[wr->opcode] | - (ind & qp->sq.max ? cpu_to_be32(1 << 31) : 0); + (ind & qp->sq.wqe_cnt ? cpu_to_be32(1 << 31) : 0); + + /* + * We can improve latency by not stamping the last + * send queue WQE until after ringing the doorbell, so + * only stamp here if there are still more WQEs to post. + */ + if (wr->next) + stamp_send_wqe(qp, (ind + qp->sq_spare_wqes) & + (qp->sq.wqe_cnt - 1)); ++ind; } @@ -1324,6 +1381,9 @@ out: * and reach the HCA out of order. 
*/ mmiowb(); + + stamp_send_wqe(qp, (ind + qp->sq_spare_wqes - 1) & + (qp->sq.wqe_cnt - 1)); } spin_unlock_irqrestore(&qp->rq.lock, flags); @@ -1344,7 +1404,7 @@ int mlx4_ib_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr, spin_lock_irqsave(&qp->rq.lock, flags); - ind = qp->rq.head & (qp->rq.max - 1); + ind = qp->rq.head & (qp->rq.wqe_cnt - 1); for (nreq = 0; wr; ++nreq, wr = wr->next) { if (mlx4_wq_overflow(&qp->rq, nreq, qp->ibqp.send_cq)) { @@ -1375,7 +1435,7 @@ int mlx4_ib_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr, qp->rq.wrid[ind] = wr->wr_id; - ind = (ind + 1) & (qp->rq.max - 1); + ind = (ind + 1) & (qp->rq.wqe_cnt - 1); } out: diff --git a/drivers/infiniband/hw/mlx4/user.h b/drivers/infiniband/hw/mlx4/user.h index 88c72d5..e2d11be 100644 --- a/drivers/infiniband/hw/mlx4/user.h +++ b/drivers/infiniband/hw/mlx4/user.h @@ -39,7 +39,7 @@ * Increment this value if any changes that break userspace ABI * compatibility are made. */ -#define MLX4_IB_UVERBS_ABI_VERSION 2 +#define MLX4_IB_UVERBS_ABI_VERSION 3 /* * Make sure that all structs defined in this file remain laid out so @@ -87,9 +87,10 @@ struct mlx4_ib_create_srq_resp { struct mlx4_ib_create_qp { __u64 buf_addr; __u64 db_addr; - __u8 log_sq_bb_count; - __u8 log_sq_stride; - __u8 reserved[6]; + __u8 log_sq_bb_count; + __u8 log_sq_stride; + __u8 sq_no_prefetch; + __u8 reserved[5]; }; #endif /* MLX4_IB_USER_H */ diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c index e7ca118..d2b0653 100644 --- a/drivers/net/mlx4/fw.c +++ b/drivers/net/mlx4/fw.c @@ -38,7 +38,9 @@ #include "icm.h" enum { - MLX4_COMMAND_INTERFACE_REV = 1 + MLX4_COMMAND_INTERFACE_MIN_REV = 2, + MLX4_COMMAND_INTERFACE_MAX_REV = 3, + MLX4_COMMAND_INTERFACE_NEW_PORT_CMDS = 3, }; extern void __buggy_use_of_MLX4_GET(void); @@ -107,6 +109,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) u16 size; u16 stat_rate; int err; + int i; #define QUERY_DEV_CAP_OUT_SIZE 0x100 #define 
QUERY_DEV_CAP_MAX_SRQ_SZ_OFFSET 0x10 @@ -176,7 +179,6 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) err = mlx4_cmd_box(dev, 0, mailbox->dma, 0, 0, MLX4_CMD_QUERY_DEV_CAP, MLX4_CMD_TIME_CLASS_A); - if (err) goto out; @@ -216,18 +218,10 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev_cap->max_rdma_global = 1 << (field & 0x3f); MLX4_GET(field, outbox, QUERY_DEV_CAP_ACK_DELAY_OFFSET); dev_cap->local_ca_ack_delay = field & 0x1f; - MLX4_GET(field, outbox, QUERY_DEV_CAP_MTU_WIDTH_OFFSET); - dev_cap->max_mtu = field >> 4; - dev_cap->max_port_width = field & 0xf; MLX4_GET(field, outbox, QUERY_DEV_CAP_VL_PORT_OFFSET); - dev_cap->max_vl = field >> 4; dev_cap->num_ports = field & 0xf; - MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_GID_OFFSET); - dev_cap->max_gids = 1 << (field & 0xf); MLX4_GET(stat_rate, outbox, QUERY_DEV_CAP_RATE_SUPPORT_OFFSET); dev_cap->stat_rate_support = stat_rate; - MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_PKEY_OFFSET); - dev_cap->max_pkeys = 1 << (field & 0xf); MLX4_GET(dev_cap->flags, outbox, QUERY_DEV_CAP_FLAGS_OFFSET); MLX4_GET(field, outbox, QUERY_DEV_CAP_RSVD_UAR_OFFSET); dev_cap->reserved_uars = field >> 4; @@ -304,6 +298,42 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) MLX4_GET(dev_cap->max_icm_sz, outbox, QUERY_DEV_CAP_MAX_ICM_SZ_OFFSET); + if (dev->flags & MLX4_FLAG_OLD_PORT_CMDS) { + for (i = 1; i <= dev_cap->num_ports; ++i) { + MLX4_GET(field, outbox, QUERY_DEV_CAP_VL_PORT_OFFSET); + dev_cap->max_vl[i] = field >> 4; + MLX4_GET(field, outbox, QUERY_DEV_CAP_MTU_WIDTH_OFFSET); + dev_cap->max_mtu[i] = field >> 4; + dev_cap->max_port_width[i] = field & 0xf; + MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_GID_OFFSET); + dev_cap->max_gids[i] = 1 << (field & 0xf); + MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_PKEY_OFFSET); + dev_cap->max_pkeys[i] = 1 << (field & 0xf); + } + } else { +#define QUERY_PORT_MTU_OFFSET 0x01 +#define QUERY_PORT_WIDTH_OFFSET 0x06 
+#define QUERY_PORT_MAX_GID_PKEY_OFFSET 0x07 +#define QUERY_PORT_MAX_VL_OFFSET 0x0b + + for (i = 1; i <= dev_cap->num_ports; ++i) { + err = mlx4_cmd_box(dev, 0, mailbox->dma, i, 0, MLX4_CMD_QUERY_PORT, + MLX4_CMD_TIME_CLASS_B); + if (err) + goto out; + + MLX4_GET(field, outbox, QUERY_PORT_MTU_OFFSET); + dev_cap->max_mtu[i] = field & 0xf; + MLX4_GET(field, outbox, QUERY_PORT_WIDTH_OFFSET); + dev_cap->max_port_width[i] = field & 0xf; + MLX4_GET(field, outbox, QUERY_PORT_MAX_GID_PKEY_OFFSET); + dev_cap->max_gids[i] = 1 << (field >> 4); + dev_cap->max_pkeys[i] = 1 << (field & 0xf); + MLX4_GET(field, outbox, QUERY_PORT_MAX_VL_OFFSET); + dev_cap->max_vl[i] = field & 0xf; + } + } + if (dev_cap->bmme_flags & 1) mlx4_dbg(dev, "Base MM extensions: yes " "(flags %d, rsvd L_Key %08x)\n", @@ -338,8 +368,8 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) mlx4_dbg(dev, "Max CQEs: %d, max WQEs: %d, max SRQ WQEs: %d\n", dev_cap->max_cq_sz, dev_cap->max_qp_sz, dev_cap->max_srq_sz); mlx4_dbg(dev, "Local CA ACK delay: %d, max MTU: %d, port width cap: %d\n", - dev_cap->local_ca_ack_delay, 128 << dev_cap->max_mtu, - dev_cap->max_port_width); + dev_cap->local_ca_ack_delay, 128 << dev_cap->max_mtu[1], + dev_cap->max_port_width[1]); mlx4_dbg(dev, "Max SQ desc size: %d, max SQ S/G: %d\n", dev_cap->max_sq_desc_sz, dev_cap->max_sq_sg); mlx4_dbg(dev, "Max RQ desc size: %d, max RQ S/G: %d\n", @@ -491,7 +521,8 @@ int mlx4_QUERY_FW(struct mlx4_dev *dev) ((fw_ver & 0x0000ffffull) << 16); MLX4_GET(cmd_if_rev, outbox, QUERY_FW_CMD_IF_REV_OFFSET); - if (cmd_if_rev != MLX4_COMMAND_INTERFACE_REV) { + if (cmd_if_rev < MLX4_COMMAND_INTERFACE_MIN_REV || + cmd_if_rev > MLX4_COMMAND_INTERFACE_MAX_REV) { mlx4_err(dev, "Installed FW has unsupported " "command interface revision %d.\n", cmd_if_rev); @@ -499,12 +530,15 @@ int mlx4_QUERY_FW(struct mlx4_dev *dev) (int) (dev->caps.fw_ver >> 32), (int) (dev->caps.fw_ver >> 16) & 0xffff, (int) dev->caps.fw_ver & 0xffff); - mlx4_err(dev, 
"This driver version supports only revision %d.\n", - MLX4_COMMAND_INTERFACE_REV); + mlx4_err(dev, "This driver version supports only revisions %d to %d.\n", + MLX4_COMMAND_INTERFACE_MIN_REV, MLX4_COMMAND_INTERFACE_MAX_REV); err = -ENODEV; goto out; } + if (cmd_if_rev < MLX4_COMMAND_INTERFACE_NEW_PORT_CMDS) + dev->flags |= MLX4_FLAG_OLD_PORT_CMDS; + MLX4_GET(lg, outbox, QUERY_FW_MAX_CMD_OFFSET); cmd->max_cmds = 1 << lg; @@ -708,13 +742,15 @@ int mlx4_INIT_HCA(struct mlx4_dev *dev, struct mlx4_init_hca_param *param) return err; } -int mlx4_INIT_PORT(struct mlx4_dev *dev, struct mlx4_init_port_param *param, int port) +int mlx4_INIT_PORT(struct mlx4_dev *dev, int port) { struct mlx4_cmd_mailbox *mailbox; u32 *inbox; int err; u32 flags; + u16 field; + if (dev->flags & MLX4_FLAG_OLD_PORT_CMDS) { #define INIT_PORT_IN_SIZE 256 #define INIT_PORT_FLAGS_OFFSET 0x00 #define INIT_PORT_FLAG_SIG (1 << 18) @@ -729,32 +765,32 @@ int mlx4_INIT_PORT(struct mlx4_dev *dev, struct mlx4_init_port_param *param, int #define INIT_PORT_NODE_GUID_OFFSET 0x18 #define INIT_PORT_SI_GUID_OFFSET 0x20 - mailbox = mlx4_alloc_cmd_mailbox(dev); - if (IS_ERR(mailbox)) - return PTR_ERR(mailbox); - inbox = mailbox->buf; + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + inbox = mailbox->buf; - memset(inbox, 0, INIT_PORT_IN_SIZE); + memset(inbox, 0, INIT_PORT_IN_SIZE); - flags = 0; - flags |= param->set_guid0 ? INIT_PORT_FLAG_G0 : 0; - flags |= param->set_node_guid ? INIT_PORT_FLAG_NG : 0; - flags |= param->set_si_guid ? 
INIT_PORT_FLAG_SIG : 0; - flags |= (param->vl_cap & 0xf) << INIT_PORT_VL_SHIFT; - flags |= (param->port_width_cap & 0xf) << INIT_PORT_PORT_WIDTH_SHIFT; - MLX4_PUT(inbox, flags, INIT_PORT_FLAGS_OFFSET); + flags = 0; + flags |= (dev->caps.vl_cap[port] & 0xf) << INIT_PORT_VL_SHIFT; + flags |= (dev->caps.port_width_cap[port] & 0xf) << INIT_PORT_PORT_WIDTH_SHIFT; + MLX4_PUT(inbox, flags, INIT_PORT_FLAGS_OFFSET); - MLX4_PUT(inbox, param->mtu, INIT_PORT_MTU_OFFSET); - MLX4_PUT(inbox, param->max_gid, INIT_PORT_MAX_GID_OFFSET); - MLX4_PUT(inbox, param->max_pkey, INIT_PORT_MAX_PKEY_OFFSET); - MLX4_PUT(inbox, param->guid0, INIT_PORT_GUID0_OFFSET); - MLX4_PUT(inbox, param->node_guid, INIT_PORT_NODE_GUID_OFFSET); - MLX4_PUT(inbox, param->si_guid, INIT_PORT_SI_GUID_OFFSET); + field = 128 << dev->caps.mtu_cap[port]; + MLX4_PUT(inbox, field, INIT_PORT_MTU_OFFSET); + field = dev->caps.gid_table_len[port]; + MLX4_PUT(inbox, field, INIT_PORT_MAX_GID_OFFSET); + field = dev->caps.pkey_table_len[port]; + MLX4_PUT(inbox, field, INIT_PORT_MAX_PKEY_OFFSET); - err = mlx4_cmd(dev, mailbox->dma, port, 0, MLX4_CMD_INIT_PORT, - MLX4_CMD_TIME_CLASS_A); + err = mlx4_cmd(dev, mailbox->dma, port, 0, MLX4_CMD_INIT_PORT, + MLX4_CMD_TIME_CLASS_A); - mlx4_free_cmd_mailbox(dev, mailbox); + mlx4_free_cmd_mailbox(dev, mailbox); + } else + err = mlx4_cmd(dev, 0, port, 0, MLX4_CMD_INIT_PORT, + MLX4_CMD_TIME_CLASS_A); return err; } diff --git a/drivers/net/mlx4/fw.h b/drivers/net/mlx4/fw.h index 2616fa5..296254a 100644 --- a/drivers/net/mlx4/fw.h +++ b/drivers/net/mlx4/fw.h @@ -59,13 +59,13 @@ struct mlx4_dev_cap { int max_responder_per_qp; int max_rdma_global; int local_ca_ack_delay; - int max_mtu; - int max_port_width; - int max_vl; int num_ports; - int max_gids; + int max_mtu[MLX4_MAX_PORTS + 1]; + int max_port_width[MLX4_MAX_PORTS + 1]; + int max_vl[MLX4_MAX_PORTS + 1]; + int max_gids[MLX4_MAX_PORTS + 1]; + int max_pkeys[MLX4_MAX_PORTS + 1]; u16 stat_rate_support; - int max_pkeys; u32 flags; int 
reserved_uars; int uar_size; diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index d417293..41eafeb 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -88,6 +88,7 @@ static struct mlx4_profile default_profile = { static int __devinit mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) { int err; + int i; err = mlx4_QUERY_DEV_CAP(dev, dev_cap); if (err) { @@ -117,11 +118,15 @@ static int __devinit mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev } dev->caps.num_ports = dev_cap->num_ports; + for (i = 1; i <= dev->caps.num_ports; ++i) { + dev->caps.vl_cap[i] = dev_cap->max_vl[i]; + dev->caps.mtu_cap[i] = dev_cap->max_mtu[i]; + dev->caps.gid_table_len[i] = dev_cap->max_gids[i]; + dev->caps.pkey_table_len[i] = dev_cap->max_pkeys[i]; + dev->caps.port_width_cap[i] = dev_cap->max_port_width[i]; + } + dev->caps.num_uars = dev_cap->uar_size / PAGE_SIZE; - dev->caps.vl_cap = dev_cap->max_vl; - dev->caps.mtu_cap = dev_cap->max_mtu; - dev->caps.gid_table_len = dev_cap->max_gids; - dev->caps.pkey_table_len = dev_cap->max_pkeys; dev->caps.local_ca_ack_delay = dev_cap->local_ca_ack_delay; dev->caps.bf_reg_size = dev_cap->bf_reg_size; dev->caps.bf_regs_per_page = dev_cap->bf_regs_per_page; @@ -148,7 +153,6 @@ static int __devinit mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev dev->caps.reserved_mrws = dev_cap->reserved_mrws; dev->caps.reserved_uars = dev_cap->reserved_uars; dev->caps.reserved_pds = dev_cap->reserved_pds; - dev->caps.port_width_cap = dev_cap->max_port_width; dev->caps.mtt_entry_sz = MLX4_MTT_ENTRY_PER_SEG * dev_cap->mtt_entry_sz; dev->caps.page_size_cap = ~(u32) (dev_cap->min_page_sz - 1); dev->caps.flags = dev_cap->flags; diff --git a/include/linux/mlx4/cmd.h b/include/linux/mlx4/cmd.h index 4fb552d..7d1eaa9 100644 --- a/include/linux/mlx4/cmd.h +++ b/include/linux/mlx4/cmd.h @@ -54,6 +54,7 @@ enum { MLX4_CMD_INIT_PORT = 0x9, MLX4_CMD_CLOSE_PORT = 0xa, MLX4_CMD_QUERY_HCA = 0xb, + 
MLX4_CMD_QUERY_PORT = 0x43, MLX4_CMD_SET_PORT = 0xc, MLX4_CMD_ACCESS_DDR = 0x2e, MLX4_CMD_MAP_ICM = 0xffa, diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 8c5f8fd..b372f59 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -41,6 +41,7 @@ enum { MLX4_FLAG_MSI_X = 1 << 0, + MLX4_FLAG_OLD_PORT_CMDS = 1 << 1, }; enum { @@ -131,10 +132,10 @@ enum { struct mlx4_caps { u64 fw_ver; int num_ports; - int vl_cap; - int mtu_cap; - int gid_table_len; - int pkey_table_len; + int vl_cap[MLX4_MAX_PORTS + 1]; + int mtu_cap[MLX4_MAX_PORTS + 1]; + int gid_table_len[MLX4_MAX_PORTS + 1]; + int pkey_table_len[MLX4_MAX_PORTS + 1]; int local_ca_ack_delay; int num_uars; int bf_reg_size; @@ -174,7 +175,7 @@ struct mlx4_caps { u32 page_size_cap; u32 flags; u16 stat_rate_support; - u8 port_width_cap; + u8 port_width_cap[MLX4_MAX_PORTS + 1]; }; struct mlx4_buf_list { @@ -322,7 +323,7 @@ int mlx4_srq_alloc(struct mlx4_dev *dev, u32 pdn, struct mlx4_mtt *mtt, void mlx4_srq_free(struct mlx4_dev *dev, struct mlx4_srq *srq); int mlx4_srq_arm(struct mlx4_dev *dev, struct mlx4_srq *srq, int limit_watermark); -int mlx4_INIT_PORT(struct mlx4_dev *dev, struct mlx4_init_port_param *param, int port); +int mlx4_INIT_PORT(struct mlx4_dev *dev, int port); int mlx4_CLOSE_PORT(struct mlx4_dev *dev, int port); int mlx4_multicast_attach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16]); diff --git a/include/linux/mlx4/qp.h b/include/linux/mlx4/qp.h index 9eeb61a..10c57d2 100644 --- a/include/linux/mlx4/qp.h +++ b/include/linux/mlx4/qp.h @@ -269,6 +269,10 @@ struct mlx4_wqe_data_seg { __be64 addr; }; +enum { + MLX4_INLINE_ALIGN = 64, +}; + struct mlx4_wqe_inline_seg { __be32 byte_count; }; From swise at opengridcomputing.com Mon Jun 18 09:35:51 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 18 Jun 2007 11:35:51 -0500 Subject: [ofa-general] conf call today Message-ID: <4676B467.2040606@opengridcomputing.com> What is the info for today's 
ofed call? Thanks, Steve. From mshefty at ichips.intel.com Mon Jun 18 09:42:55 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 18 Jun 2007 09:42:55 -0700 Subject: [ofa-general] disconnect implementation for rdma cm unconnected datagram service In-Reply-To: References: Message-ID: <4676B60F.3090002@ichips.intel.com> Or Gerlitz wrote: > Looking at cm_sidr_rep_handler, we see that the cm id state > is reset to IB_CM_IDLE, while on the other hand ib_send_cm_dreq > returns -EINVAL if the id state is not IB_CM_ESTABLISHED. I guess > this means that rdma_disconnect on RDMA_PS_UDP would never work? Correct - there isn't a disconnect for UDP. > Thinking about remote qp/lid changes, the equivalent I see for UDP based apps > is that a remote qp/lid change would have been caught by the local stack's > neighbouring subsystem, since it sends a few unicast ARP probes and then re-issues > a broadcast ARP from which the new HW address (qpn / gid --> lid) would be learned. > > What do you think would be the correct way to solve that for rdmacm based apps? I don't know that we can do anything about a QP change. > is there a way for the RDMA/IB stack level to provide the solution? we were Once the inform_info patches are in, we might be able to hook into that to at least provide notification that the remote address has changed. I don't think there's a LID change notice, though, only GID IN/OUT. LID changes would be difficult to hide from the app anyway, since the app must re-create their address vector. If we ever go as far as adding an rdma_send() call, we might be able to hide it better. > I guess that a remote lid change can be emulated as a disconnect if the rdmacm > would listen on IN/OUT traps, but the question is what we can do about the > remote process qp, e.g. in the case the process dies and then comes back again, etc. I think the current solution is that the app must detect that they are no longer getting responses from the remote side and try to re-'connect'.
I need to give this more thought to determine if there's anything that we can do here. (This seems hard without the rdma_cm controlling the QP and CQs.) Do you have any ideas? - Sean From jsquyres at cisco.com Mon Jun 18 09:58:54 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Mon, 18 Jun 2007 12:58:54 -0400 Subject: [ofa-general] conf call today In-Reply-To: <4676B467.2040606@opengridcomputing.com> References: <4676B467.2040606@opengridcomputing.com> Message-ID: I sent the info earlier this morning. But regardless, the call was over in about 8 minutes. I assume Tziporet will send out the minutes shortly. On Jun 18, 2007, at 12:35 PM, Steve Wise wrote: > What is the info for today's ofed call? > > Thanks, > > Steve. -- Jeff Squyres Cisco Systems From or.gerlitz at gmail.com Mon Jun 18 11:54:10 2007 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Mon, 18 Jun 2007 21:54:10 +0300 Subject: [ofa-general] disconnect implementation for rdma cm unconnected datagram service In-Reply-To: <4676B60F.3090002@ichips.intel.com> References: <4676B60F.3090002@ichips.intel.com> Message-ID: <15ddcffd0706181154m26a61ad5u6fe82ff1df19ff4d@mail.gmail.com> On 6/18/07, Sean Hefty wrote: > > Or Gerlitz wrote: > > Looking at cm_sidr_rep_handler, we see that the cm id state > > is reset to IB_CM_IDLE, while on the other hand ib_send_cm_dreq > > returns -EINVAL if the id state is not IB_CM_ESTABLISHED. I guess > > this means that rdma_disconnect on RDMA_PS_UDP would never work? > > Correct - there isn't a disconnect for UDP. Was that done on purpose? Is there any problem (e.g. implementation or spec related) with sending a DREQ through the CM?
> Thinking about remote qp/lid changes, the equivalent I see for UDP based apps > > is that a remote qp/lid change would have been caught by the local stack's > > neighbouring subsystem, since it sends a few unicast ARP probes and then > re-issues > > a broadcast ARP from which the new HW address (qpn / gid --> lid) would > be learned. > > > > What do you think would be the correct way to solve that for rdmacm based > apps? > > I don't know that we can do anything about a QP change. Just to emphasize: a typical QP change here is when a remote server process exits and is then spawned again, so the client has to reconnect; otherwise all its packets go nowhere. > > is there a way for the RDMA/IB stack level to provide the solution? we > were > > Once the inform_info patches are in, we might be able to hook into that > to at least provide notification that the remote address has changed. I > don't think there's a LID change notice, though, only GID IN/OUT. LID > changes would be difficult to hide from the app anyway, since the app > must re-create their address vector. I did not mean to hide it totally from the app (e.g. to the extent of not needing to re-create the address vector); I just wonder whether the mechanics for realizing that an unconnected rdmacm id is no longer "connected" can be fully implemented within the rdmacm. > If we ever go as far as adding an rdma_send() call, we might be able to > hide it better. I don't think we want to go there. > > I guess that a remote lid change can be emulated as a disconnect if the > rdmacm > > would listen on IN/OUT traps, but the question is what we can do about > the > > remote process qp, e.g. in the case the process dies and then comes back > again, etc. > > I think the current solution is that the app must detect that they are > no longer getting responses from the remote side and try to > re-'connect'. I need to give this more thought to determine if there's > anything that we can do here. (This seems hard without the rdma_cm > controlling the QP and CQs.)
Do you have any ideas? Indeed, this is not easily possible for us in all cases, as we are not always allowed to add a wire protocol on --this-- QP, but we are looking into that. Another solution we are considering is to "invalidate" the app-level "address handle" (IB AH + remote QPN) every ten seconds or so and then re-connect, but this is not very efficient. Or. From mshefty at ichips.intel.com Mon Jun 18 12:23:39 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 18 Jun 2007 12:23:39 -0700 Subject: [ofa-general] disconnect implementation for rdma cm unconnected datagram service In-Reply-To: <15ddcffd0706181154m26a61ad5u6fe82ff1df19ff4d@mail.gmail.com> References: <4676B60F.3090002@ichips.intel.com> <15ddcffd0706181154m26a61ad5u6fe82ff1df19ff4d@mail.gmail.com> Message-ID: <4676DBBB.1010202@ichips.intel.com> > Was that done on purpose? Is there any problem (e.g. implementation or spec related) > with sending a DREQ through the CM? This is spec related: DREQ applies only to connected QPs, not to UD QPs. > I did not mean to hide it totally from the app (e.g. to the extent of not needing > to re-create the address vector); I just wonder whether the mechanics for > realizing that an unconnected rdmacm id is no longer "connected" can be > fully implemented within the rdmacm. I don't see a way to do this underneath within the existing spec. If the IB CM tracked SIDR lookups, maintaining state information, then we could make use of a DREQ-type command to notify the remote side that the local QP is going away. But this is outside of the spec, and it doesn't solve all of the issues (like a remote system reboot). I don't think there's even an existing trap that we can use. > Indeed, this is not easily possible for us in all cases, as we > are not always allowed to add a wire protocol on --this-- QP, but we are > looking into that.
Another solution we are considering is to "invalidate" the app > level "address handle" (IB AH + remote QPN) every ten seconds or so and > then re-connect, but this is not very efficient. How does IPoIB handle this? Does it just time out the ARP entries every x minutes, which requires a new lookup? Is there some way that you could map LIDs to QPNs, and use the SLID/src_qp data in the work completion to see if a remote service has moved QPs? - Sean From or.gerlitz at gmail.com Mon Jun 18 13:46:33 2007 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Mon, 18 Jun 2007 23:46:33 +0300 Subject: [ofa-general] disconnect implementation for rdma cm unconnected datagram service In-Reply-To: <4676DBBB.1010202@ichips.intel.com> References: <4676B60F.3090002@ichips.intel.com> <15ddcffd0706181154m26a61ad5u6fe82ff1df19ff4d@mail.gmail.com> <4676DBBB.1010202@ichips.intel.com> Message-ID: <15ddcffd0706181346r6c38fcc2qf8d050f88a9e4ddf@mail.gmail.com> On 6/18/07, Sean Hefty wrote: > > > Was that done on purpose? Is there any problem (e.g. implementation or spec related) > > with sending a DREQ through the CM? > > This is spec related: DREQ applies only to connected QPs, not to UD QPs. I see. > I did not mean to hide it totally from the app (e.g. to the extent of not needing > > to re-create the address vector); I just wonder whether the mechanics for > > realizing that an unconnected rdmacm id is no longer "connected" can be > > fully implemented within the rdmacm. > > I don't see a way to do this underneath within the existing spec. If > the IB CM tracked SIDR lookups, maintaining state information, then we > could make use of a DREQ-type command to notify the remote side that the > local QP is going away. But this is outside of the spec, and it doesn't > solve all of the issues (like a remote system reboot). > > I don't think there's even an existing trap that we can use. I see.
> Indeed, this is not easily possible for us in all cases, as we > are not always allowed to add a wire protocol on --this-- QP, but we are > looking into that. Another solution we are considering is to "invalidate" the app > > level "address handle" (IB AH + remote QPN) every ten seconds or so and > > then re-connect, but this is not very efficient. > > How does IPoIB handle this? Does it just time out the ARP entries every > x minutes, which requires a new lookup? It's not IPoIB but rather the neighbouring subsystem of the IP stack: it sends unicast ARP probes every n seconds, and if m probes fail, it sends a broadcast ARP. n and m are tunable parameters; I think the defaults are n=20 sec and m=3. > Is there some way that you could map LIDs to QPNs, and use the > SLID/src_qp data in the work completion to see if a remote service has > moved QPs? If the communication pattern is that A sends to B and B sends to A, then there is a path to follow here: for each packet (work completion) A gets from B, it checks whether B's QPN has changed, and if so, it reconnects. Or -------------- next part -------------- An HTML attachment was scrubbed...
URL: From mshefty at ichips.intel.com Mon Jun 18 16:40:35 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 18 Jun 2007 16:40:35 -0700 Subject: [ofa-general] hang at module removal with local sa patches applied In-Reply-To: <20070618114843.GA25428@mellanox.co.il> References: <20070618114843.GA25428@mellanox.co.il> Message-ID: <467717F3.9020806@ichips.intel.com> > [14897.168277] local_sa D 0000000000000001 0 8361 2 (L-TLB) > [14897.168280] ffff81007d0d3c10 0000000000000046 0000000000000000 800000ce00000000 > [14897.168283] 84000b0000000000 000000000000000a ffff81007e8f3420 ffff81007ff1f4a0 > [14897.168287] 00000d8431895ed4 0000000000000d33 ffff81007e8f35d0 800000ce00000000 > [14897.168290] Call Trace: > [14897.168294] [] __mutex_lock_slowpath+0x69/0xaa > [14897.168303] [] :ib_sa:port_work_handler+0x0/0x34 > [14897.168306] [] mutex_lock+0xe/0x10 > [14897.168311] [] :ib_sa:port_work_handler+0x1c/0x34 > [14897.168314] [] run_workqueue+0x85/0x10f > [14897.168317] [] flush_cpu_workqueue+0x28/0x7b > [14897.168320] [] flush_workqueue+0x43/0x5d > [14897.168326] [] :ib_sa:cleanup_port+0x25/0x7b > [14897.168331] [] :ib_sa:process_updates+0x61/0x336 > [14897.168335] [] thread_return+0x0/0xea > [14897.168341] [] :ib_sa:add_update+0x7a/0x83 > [14897.168347] [] :ib_sa:port_work_handler+0x0/0x34 > [14897.168352] [] :ib_sa:refresh_port_db+0x36/0x3b > [14897.168358] [] :ib_sa:port_work_handler+0x24/0x34 > [14897.168361] [] run_workqueue+0x85/0x10f > [14897.168363] [] worker_thread+0x0/0xe7 > [14897.168366] [] worker_thread+0xdc/0xe7 > [14897.168368] [] autoremove_wake_function+0x0/0x38 > [14897.168371] [] kthread+0x49/0x76 > [14897.168374] [] child_rip+0xa/0x12 > [14897.168377] [] kthread+0x0/0x76 > [14897.168379] [] child_rip+0x0/0x12 Reading through the code, I see two potential issues: * It's possible for flush_workqueue to be called from the workqueue thread. * We hold a mutex when calling flush_workqueue, and a queued work item will try to acquire that same mutex. 
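Both hazards listed above are general lock-ordering problems, so they can be illustrated outside the kernel. The following is a user-space analogy using POSIX threads, not a fix for ib_sa: a single worker thread stands in for the workqueue, `pthread_join()` stands in for `flush_workqueue()`, and `struct port_ctx` and the `port_*` functions are hypothetical names. The teardown path sets a stop flag under the lock, releases the lock, and only then waits for the worker; it also refuses to wait when called from the worker thread itself.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

struct port_ctx {
	pthread_mutex_t lock;    /* stands in for the mutex both paths take */
	pthread_cond_t  cond;    /* signalled when work is queued or on stop */
	pthread_t       worker;  /* stands in for the workqueue thread */
	int             pending; /* queued, not yet handled work items */
	int             done;    /* handled work items */
	bool            stopping;
};

/* Work handler: like a queued work item, it needs ctx->lock to run. */
static void *worker_fn(void *arg)
{
	struct port_ctx *ctx = arg;

	pthread_mutex_lock(&ctx->lock);
	for (;;) {
		while (ctx->pending == 0 && !ctx->stopping)
			pthread_cond_wait(&ctx->cond, &ctx->lock);
		if (ctx->pending == 0) /* stopping and fully drained */
			break;
		ctx->pending--;
		ctx->done++;           /* "handle" one item under the lock */
	}
	pthread_mutex_unlock(&ctx->lock);
	return NULL;
}

static int port_start(struct port_ctx *ctx)
{
	pthread_mutex_init(&ctx->lock, NULL);
	pthread_cond_init(&ctx->cond, NULL);
	ctx->pending = 0;
	ctx->done = 0;
	ctx->stopping = false;
	return pthread_create(&ctx->worker, NULL, worker_fn, ctx);
}

static void port_queue_work(struct port_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	ctx->pending++;
	pthread_cond_signal(&ctx->cond);
	pthread_mutex_unlock(&ctx->lock);
}

/*
 * Teardown that avoids both hazards:
 * 1. it refuses to wait when called on the worker thread itself;
 * 2. it never waits for the worker while holding the lock the worker needs.
 */
static int port_cleanup(struct port_ctx *ctx)
{
	assert(ctx != NULL);
	if (pthread_equal(pthread_self(), ctx->worker))
		return -EDEADLK;

	pthread_mutex_lock(&ctx->lock);
	ctx->stopping = true;
	pthread_cond_signal(&ctx->cond);
	pthread_mutex_unlock(&ctx->lock); /* drop the lock BEFORE waiting */

	pthread_join(ctx->worker, NULL);  /* plays the role of flush_workqueue() */
	pthread_mutex_destroy(&ctx->lock);
	pthread_cond_destroy(&ctx->cond);
	return 0;
}
```

If `port_cleanup()` instead took `ctx->lock` and then called `pthread_join()`, a worker blocked in `pthread_mutex_lock()` could never finish, which is the same shape as the trace above, where cleanup_port() flushes the workqueue while port_work_handler() waits on the mutex.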
I'll need to spend some time studying the thread synchronization to fix this. - Sean From mst at dev.mellanox.co.il Mon Jun 18 22:58:41 2007 From: mst at dev.mellanox.co.il (Michael S.
Tsirkin) Date: Tue, 19 Jun 2007 08:58:41 +0300 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: References: Message-ID: <20070619055841.GC7069@mellanox.co.il> > Quoting Roland Dreier : > Subject: [GIT PULL] please pull infiniband.git > > Linus, please pull from > > master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus > > This tree is also available from kernel.org mirrors at: > > git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus > > This will get a bunch of fixes to the new mlx4 driver. This pull is > bigger than I would have liked after -rc5, but Mellanox discovered a > problem that required a firmware change and also some driver help to > fix. Since this is a new driver for 2.6.22, which is for new hardware > that no one has in production yet, I think it's better to merge this > early even if it risks introducing a bug, rather than have a driver > in 2.6.22 that doesn't work at all with current adapter firmware. > > Jack Morgenstein (1): > IB/mlx4: Handle buffer wraparound in __mlx4_ib_cq_clean() > > Roland Dreier (6): > IB/mlx4: Fix handling of wq->tail for send completions > IB/mlx4: Fix warning in rounding up queue sizes > IB/mlx4: Handle new FW requirement for send request prefetching > IB/mlx4: Get rid of max_inline_data calculation > IB/mlx4: Handle FW command interface rev 3 > IB/mlx4: Make sure inline data segments don't cross a 64 byte boundary BTW, have you seen the patch for ipoib cm crasher race? I think we need it in 2.6.22 too. -- MST From erezz at voltaire.com Mon Jun 18 23:20:33 2007 From: erezz at voltaire.com (Erez Zilber) Date: Tue, 19 Jun 2007 09:20:33 +0300 Subject: [ofa-general] [Fwd: [PATCH 2/2] iscsi_iser: convert to use the data buffer accessors] Message-ID: <467775B1.4000208@voltaire.com> Roland, Can you add the patch below to 2.6.23? 
Thanks,
Erez

-------- Original Message --------
Subject: [PATCH 2/2] iscsi_iser: convert to use the data buffer accessors
Date: Fri, 1 Jun 2007 12:56:21 +0300
From: FUJITA Tomonori

iscsi_iser: convert to use the data buffer accessors

- remove the unnecessary map_single path.

- convert to use the new accessors for the sg lists and the parameters.

TODO: use scsi_for_each_sg().

Signed-off-by: FUJITA Tomonori
Signed-off-by: Erez Zilber
---
 drivers/infiniband/ulp/iser/iscsi_iser.c     |    4 ++--
 drivers/infiniband/ulp/iser/iser_initiator.c |   14 ++++----------
 2 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index 1bf173d..effdee2 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -210,10 +210,10 @@ iscsi_iser_ctask_xmit(struct iscsi_conn *conn,
 	int error = 0;
 
 	if (ctask->sc->sc_data_direction == DMA_TO_DEVICE) {
-		BUG_ON(ctask->sc->request_bufflen == 0);
+		BUG_ON(scsi_bufflen(ctask->sc) == 0);
 
 		debug_scsi("cmd [itt %x total %d imm %d unsol_data %d\n",
-			   ctask->itt, ctask->sc->request_bufflen,
+			   ctask->itt, scsi_bufflen(ctask->sc),
 			   ctask->imm_count, ctask->unsol_count);
 	}
 
diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c
index 3651072..9ea5b9a 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -351,18 +351,12 @@ int iser_send_command(struct iscsi_conn *conn,
 	else
 		data_buf = &iser_ctask->data[ISER_DIR_OUT];
 
-	if (sc->use_sg) { /* using a scatter list */
-		data_buf->buf = sc->request_buffer;
-		data_buf->size = sc->use_sg;
-	} else if (sc->request_bufflen) {
-		/* using a single buffer - convert it into one entry SG */
-		sg_init_one(&data_buf->sg_single,
-			    sc->request_buffer, sc->request_bufflen);
-		data_buf->buf = &data_buf->sg_single;
-		data_buf->size = 1;
+	if (scsi_sg_count(sc)) { /* using a scatter list */
+		data_buf->buf = scsi_sglist(sc);
+		data_buf->size = scsi_sg_count(sc);
 	}
 
-	data_buf->data_len = sc->request_bufflen;
+	data_buf->data_len = scsi_bufflen(sc);
 
 	if (hdr->flags & ISCSI_FLAG_CMD_READ) {
 		err = iser_prepare_read_cmd(ctask, edtl);
--
1.4.4.4

From fujita.tomonori at lab.ntt.co.jp Mon Jun 18 23:32:04 2007
From: fujita.tomonori at lab.ntt.co.jp (FUJITA Tomonori)
Date: Tue, 19 Jun 2007 15:32:04 +0900
Subject: [ofa-general] Re: [Fwd: [PATCH 2/2] iscsi_iser: convert to use the data buffer accessors]
In-Reply-To: <467775B1.4000208@voltaire.com>
References: <467775B1.4000208@voltaire.com>
Message-ID: <20070619153204D.fujita.tomonori@lab.ntt.co.jp>

From: Erez Zilber
Subject: [Fwd: [PATCH 2/2] iscsi_iser: convert to use the data buffer accessors]
Date: Tue, 19 Jun 2007 09:20:33 +0300

> 
> Roland,
> 
> Can you add the patch below to 2.6.23?

Thanks, but the patch was already added to James' scsi-misc tree (for
2.6.23). It's easier to add this to his tree since it depends on the
patch to add the accessors in his tree. So you don't worry about it.
From erezz at voltaire.com Mon Jun 18 23:47:15 2007 From: erezz at voltaire.com (Erez Zilber) Date: Tue, 19 Jun 2007 09:47:15 +0300 Subject: [ofa-general] Re: [Fwd: [PATCH 2/2] iscsi_iser: convert to use the data buffer accessors] In-Reply-To: <20070619153204D.fujita.tomonori@lab.ntt.co.jp> References: <467775B1.4000208@voltaire.com> <20070619153204D.fujita.tomonori@lab.ntt.co.jp> Message-ID: <46777BF3.7090805@voltaire.com> FUJITA Tomonori wrote: > From: Erez Zilber > Subject: [Fwd: [PATCH 2/2] iscsi_iser: convert to use the data buffer > accessors] > Date: Tue, 19 Jun 2007 09:20:33 +0300 > > > > > Roland, > > > > Can you add the patch below to 2.6.23? > > Thanks, but the patch was already added to James' scsi-misc tree (for > 2.6.23). It's easier to add this to his tree since it depends on the > patch to add the accessors in his tree. So you don't worry about it. > _______________________________________________ > OK. Roland - please ignore this patch. Erez From Lawandasigmaawhile at mbta.com Tue Jun 19 00:01:48 2007 From: Lawandasigmaawhile at mbta.com (Lina Rowell) Date: Tue, 19 Jun 2007 00:01:48 -0700 (PDT) Subject: [ofa-general] Would you like to be paying less each month? Message-ID: <20070619070149.A114DE603CA@openfabrics.org> As a business you have been preapproved to receive 43995 USD TODAY! No hassle at all, completely unsecured. There are no hidden costs or fees. Worried that your credit is less than perfect? Not an issue. Give us a ring, now.. 877.208.5661 Turn your dream, into a reality, is that not worth two minutes of your time? 877.208.5661 Eat it, Annie, suck on it, go on and eat it, be a Do-Bee and eat your book all up. But sometimes the sounds - like the pain - faded, and then there was only the haze. 
Lina Cullen From vlad at lists.openfabrics.org Tue Jun 19 02:42:50 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Tue, 19 Jun 2007 02:42:50 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070619-0200 daily build status Message-ID: <20070619094250.A068AE6083B@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.12 Passed on ppc64 with linux-2.6.18 Passed on powerpc with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on x86_64 with linux-2.6.16 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.12 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.13 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on ia64 with linux-2.6.14 Passed on powerpc with linux-2.6.15 Passed on ia64 with linux-2.6.19 Passed on x86_64 with linux-2.6.14 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.17 Passed on x86_64 with linux-2.6.17 Passed on ia64 with linux-2.6.15 Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.14 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.12 Passed on powerpc with linux-2.6.13 Passed on ia64 with linux-2.6.16 Passed 
on ia64 with linux-2.6.17 Passed on x86_64 with linux-2.6.19 Passed on ppc64 with linux-2.6.13 Passed on ppc64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on ia64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-34.ELsmp Failed: From rdreier at cisco.com Tue Jun 19 03:09:02 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 19 Jun 2007 03:09:02 -0700 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git References: <20070619055841.GC7069@mellanox.co.il> Message-ID: > BTW, have you seen the patch for ipoib cm crasher race? > I think we need it in 2.6.22 too. I saw the discussion and it does seem like something to fix for 2.6.22. But I didn't see a final conclusion on which patch to use, and I don't think I ever saw a patch with a good changelog and signed-off-by line either... please resend if I missed it. - R. From mst at dev.mellanox.co.il Tue Jun 19 03:36:10 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Tue, 19 Jun 2007 13:36:10 +0300 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: References: <20070619055841.GC7069@mellanox.co.il> Message-ID: <20070619103610.GA15224@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [GIT PULL] please pull infiniband.git > > > BTW, have you seen the patch for ipoib cm crasher race? > > I think we need it in 2.6.22 too. > > I saw the discussion and it does seem like something to fix for > 2.6.22. But I didn't see a final conclusion on which patch to use, > and I don't think I ever saw a patch with a good changelog and > signed-off-by line either... please resend if I missed it. Resending. 
-- MST From mst at dev.mellanox.co.il Tue Jun 19 03:40:41 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Tue, 19 Jun 2007 13:40:41 +0300 Subject: [ofa-general] Re: [PATCH for-2.6.22] ipoib/cm: initialize RX before moving QP to RTR In-Reply-To: <20070618083240.GK14335@mellanox.co.il> References: <4672BE23.3050809@ichips.intel.com> <20070618083240.GK14335@mellanox.co.il> Message-ID: <20070619104041.GB15224@mellanox.co.il> Fix a crasher bug in IPoIB CM: once QP is in RTR, an RX completion (and even an asynchronous error) might be observed on this QP, so we have to initialize all RX fields beforehand. As an optimization (since modify_qp might take a long time), the jiffies update done when moving RX to the passive_ids list is also left in place to reduce the chance of the RX being mis-detected as stale. This fixes bug Signed-off-by: Michael S. Tsirkin --- Resending - Roland, is the changelog OK? Please consider this bugfix for 2.6.22. > > Quoting Woodruff, Robert J : > > Subject: RE: [ofa-general] crash in ipoib > > > > Sean wrote, > > >> And here's a version with error handling fixed. > > >> Sean, does this solve your crash? > > > > >We've been running this patch since yesterday and haven't seen any > > >crashes. We'll continue testing this over the week-end. > > > > >- Sean > > > > This looks like it fixed the panic. > > > > Should we try to put out a new RC with this latest ipoib fix ? > > I really think we need it in the release. If we could get another RC out > > today, > > that would only delay the release by a couple of more days and we could > > release on next Friday rather than wed. and still give people a week to > > test the final RC. > > > > woody > > OK, the following patch has been added to OFED 1.2. > Roland, please consider this bugfix for 2.6.22. 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 076a0bb..c64249f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -309,6 +309,11 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 		return -ENOMEM;
 	p->dev = dev;
 	p->id = cm_id;
+	cm_id->context = p;
+	p->state = IPOIB_CM_RX_LIVE;
+	p->jiffies = jiffies;
+	INIT_LIST_HEAD(&p->list);
+
 	p->qp = ipoib_cm_create_rx_qp(dev, p);
 	if (IS_ERR(p->qp)) {
 		ret = PTR_ERR(p->qp);
@@ -320,24 +325,24 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 	if (ret)
 		goto err_modify;
 
+	spin_lock_irq(&priv->lock);
+	queue_delayed_work(ipoib_workqueue,
+			   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
+	/* Add this entry to passive ids list head, but do not re-add it
+	 * if IB_EVENT_QP_LAST_WQE_REACHED has moved it to flush list. */
+	p->jiffies = jiffies;
+	if (p->state == IPOIB_CM_RX_LIVE)
+		list_move(&p->list, &priv->cm.passive_ids);
+	spin_unlock_irq(&priv->lock);
+
 	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
 	if (ret) {
 		ipoib_warn(priv, "failed to send REP: %d\n", ret);
-		goto err_rep;
+		if (ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE))
+			ipoib_warn(priv, "unable to move qp to error state\n");
 	}
-
-	cm_id->context = p;
-	p->jiffies = jiffies;
-	p->state = IPOIB_CM_RX_LIVE;
-	spin_lock_irq(&priv->lock);
-	if (list_empty(&priv->cm.passive_ids))
-		queue_delayed_work(ipoib_workqueue,
-				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
-	list_add(&p->list, &priv->cm.passive_ids);
-	spin_unlock_irq(&priv->lock);
 	return 0;
 
-err_rep:
 err_modify:
 	ib_destroy_qp(p->qp);
 err_qp:
--
MST

From hanafim.ctr at asc.hpc.mil Tue Jun 19 06:31:29 2007
From: hanafim.ctr at asc.hpc.mil (MAHMOUD HANAFI)
Date: Tue, 19 Jun 2007 09:31:29 -0400
Subject: [ofa-general] Build error 1.2rc5
Message-ID: <4677DAB1.2080002@asc.hpc.mil>

Any one else seen these build error with 1.2rc5?
RPM build errors:
    user vlad does not exist - using root
    group vlad does not exist - using root
    user vlad does not exist - using root
    group vlad does not exist - using root
    File listed twice: /usr/lib/libibverbs.so.1
    File listed twice: /usr/lib/libibverbs.so.1.0.0
    File listed twice: /usr/lib/libibverbs.a
    File listed twice: /usr/lib/libibverbs.so
    File listed twice: /usr/lib/libibcm.so.1
    File listed twice: /usr/lib/libibcm.so.1.0
    File listed twice: /usr/lib/libibcm.so.1.0.0
    File listed twice: /usr/lib/libibcm.so
    File listed twice: /usr/lib/libmthca-rdmav2.so
    File listed twice: /usr/lib/libmthca.so
    File listed twice: /usr/lib/libmthca.a
    File listed twice: /usr/lib/libcxgb3-rdmav2.so
    File listed twice: /usr/lib/libcxgb3.so
    File listed twice: /usr/lib/libcxgb3.a
    File listed twice: /usr/lib/libipathverbs-rdmav2.so
    File listed twice: /usr/lib/libipathverbs.so
    File listed twice: /usr/lib/libipathverbs.a
    File listed twice: /usr/lib/libsdp.so
    File listed twice: /usr/lib/libsdp.so.1
    File listed twice: /usr/lib/libsdp.so.1.0.0
    File listed twice: /usr/lib/libibcommon.so.1
    File listed twice: /usr/lib/libibcommon.so.1.0.0
    File listed twice: /usr/lib/libibcommon.a
    File listed twice: /usr/lib/libibcommon.so
    File listed twice: /usr/lib/libibmad.so.1
    File listed twice: /usr/lib/libibmad.so.1.2.0
    File listed twice: /usr/lib/libibmad.a
    File listed twice: /usr/lib/libibmad.so
    File listed twice: /usr/lib/libibumad.so.1
    File listed twice: /usr/lib/libibumad.so.1.0.0
    File listed twice: /usr/lib/libibumad.a
    File listed twice: /usr/lib/libibumad.so
    File listed twice: /usr/lib/libosmcomp.so.1
    File listed twice: /usr/lib/libosmcomp.so.1.0.1
    File listed twice: /usr/lib/libosmcomp-2.1.3.so
    File listed twice: /usr/lib/libosmcomp.a
    File listed twice: /usr/lib/libosmcomp.so
    File listed twice: /usr/lib/libopensm.so.1
    File listed twice: /usr/lib/libopensm.so.1.1.0
    File listed twice: /usr/lib/libopensm-2.1.4.so
    File listed twice: /usr/lib/libopensm.a
    File listed twice: /usr/lib/libopensm.so
    File listed twice: /usr/lib/libosmvendor.so.2
    File listed twice: /usr/lib/libosmvendor.so.2.0.0
    File listed twice: /usr/lib/libosmvendor-2.1.3.so
    File listed twice: /usr/lib/libosmvendor.a
    File listed twice: /usr/lib/libosmvendor.so
    File listed twice: /usr/lib/libosmvendor_openib.so
    File listed twice: /usr/lib/librdmacm.so.1
    File listed twice: /usr/lib/librdmacm.so.1.0.0
    File listed twice: /usr/lib/librdmacm.so.1.0.1
    File listed twice: /usr/lib/librdmacm.so
    File not found: /var/tmp/OFED/etc/dat.conf
    File listed twice: /usr/lib/libdaplcma.a
    File listed twice: /usr/lib/libdat.a
ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-dapl --with-ipoibtools --with-libcxgb3 --with-libibcm --with-libibcommon --with-libibmad --with-libibumad --with-libibverbs --with-libipathverbs --with-libmthca --with-opensm --with-librdmacm --with-libsdp --with-openib-diags --with-qlvnictools --with-sdpnetstat --with-srptools --with-mstflint --with-perftest --with-tvflash --sysconfdir=/usr/etc --mandir=/usr/man' --define 'configure_options32 --with-dapl --with-ipoibtools --with-libcxgb3 --with-libibcm --with-libibcommon --with-libibmad --with-libibumad --with-libibverbs --with-libipathverbs --with-libmthca --with-opensm --with-librdmacm --with-libsdp --with-openib-diags --with-qlvnictools --with-sdpnetstat --with-srptools --sysconfdir=/usr/etc --mandir=/usr/man' --define 'build_32bit 1' --define '_mandir /usr/man' /tmp/OFED-1.2-rc5/SRPMS/ofa_user-1.2-rc5.src.rpm"

--
Mahmoud Hanafi
Senior System Administrator
ASC/MSRC www.asc.hpc.mil
2435 5th Street
WPAFB, OHIO 45433
(937) 255-1536

From jackm at dev.mellanox.co.il Tue Jun 19 06:41:52 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Tue, 19 Jun 2007 16:41:52 +0300
Subject: [ofa-general] [PATCH 1 of 2] net-mlx4: Show board_id string in sysfs under the pci device
Message-ID: <200706191641.52831.jackm@dev.mellanox.co.il>
Show the board_id string in sysfs under the pci device (not under the
infiniband device, as with other HCAs). ConnectX will also have an enet
device (which will not be under the infiniband class) and users of this
device must also have access to the board_id string.

This requires a small modification in the libibverbs example
"ibv_devinfo"; the app must also look under the pci device for the
board_id if it does not find it directly under the infiniband device.

Signed-off-by: Jack Morgenstein

Index: connectx_kernel/drivers/net/mlx4/main.c
===================================================================
--- connectx_kernel.orig/drivers/net/mlx4/main.c	2007-05-07 18:36:02.000000000 +0300
+++ connectx_kernel/drivers/net/mlx4/main.c	2007-05-08 12:52:49.000000000 +0300
@@ -711,6 +711,18 @@
 		priv->eq_table.eq[i].irq = dev->pdev->irq;
 }
 
+static ssize_t mlx4_show_board_id(struct device *dev,
+				  struct device_attribute *attr,
+				  char *buf)
+{
+	struct mlx4_dev *mdev = dev->driver_data;
+	struct mlx4_priv *priv = mlx4_priv(mdev);
+
+	return snprintf(buf, MLX4_BOARD_ID_LEN, "%s\n", (char *)priv->board_id);
+}
+
+static DEVICE_ATTR(board_id, S_IRUGO, mlx4_show_board_id, NULL);
+
 static int __devinit mlx4_init_one(struct pci_dev *pdev,
 				   const struct pci_device_id *id)
 {
@@ -827,6 +839,7 @@
 		goto err_cleanup;
 
 	pci_set_drvdata(pdev, dev);
+	device_create_file(&pdev->dev, &dev_attr_board_id);
 
 	return 0;
 
@@ -875,6 +888,7 @@
 	int p;
 
 	if (dev) {
+		device_remove_file(&pdev->dev, &dev_attr_board_id);
 		mlx4_unregister_device(dev);
 		for (p = 1; p <= dev->caps.num_ports; ++p)

From jackm at dev.mellanox.co.il Tue Jun 19 06:44:27 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Tue, 19 Jun 2007 16:44:27 +0300
Subject: [ofa-general] [PATCH 2 of 2] libibverbs: modify ibv_devinfo to look under pci device as well if it does not find board_id under the ib device
Message-ID: <200706191644.27675.jackm@dev.mellanox.co.il>

devinfo needs to look under the pci device directory for board_id if it
does not find it under the infiniband device directory.

Signed-off-by: Jack Morgenstein

--- a/src/userspace/libibverbs/examples/devinfo.c	2007-05-01 11:15:29.409126000 +0300
+++ b/src/userspace/libibverbs/examples/devinfo.c	2007-05-08 14:56:02.000000000 +0300
@@ -195,6 +195,14 @@ static int print_hca_cap(struct ibv_devi
 	if (ibv_read_sysfs_file(ib_dev->ibdev_path, "board_id", buf, sizeof buf) > 0)
 		printf("\tboard_id:\t\t\t%s\n", buf);
+	else {
+		char syspath[256];
+		strcpy((char *) syspath, ib_dev->ibdev_path);
+		strcat((char *) syspath, "/device");
+		if (ibv_read_sysfs_file((char *) syspath,
+					"board_id", buf, sizeof buf) > 0)
+			printf("\tboard_id:\t\t\t%s\n", buf);
+	}
 
 	printf("\tphys_port_cnt:\t\t\t%d\n", device_attr.phys_port_cnt);

From erezz at voltaire.com Tue Jun 19 06:45:03 2007
From: erezz at voltaire.com (Erez Zilber)
Date: Tue, 19 Jun 2007 16:45:03 +0300
Subject: [ofa-general] Build error 1.2rc5
In-Reply-To: <4677DAB1.2080002@asc.hpc.mil>
References: <4677DAB1.2080002@asc.hpc.mil>
Message-ID: <4677DDDF.9080109@voltaire.com>

MAHMOUD HANAFI wrote:
> Any one else seen these build error with 1.2rc5?

Try to send this to ewg at lists.openfabrics.org.

Erez

From jackm at dev.mellanox.co.il Tue Jun 19 06:47:41 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Tue, 19 Jun 2007 16:47:41 +0300
Subject: [ofa-general] [PATCH] libmlx4: fix adjustments for minimum qp capabilities in mlx4_create_qp
Message-ID: <200706191647.41336.jackm@dev.mellanox.co.il>

Need to adjust minimum qp capability values prior to size and max resource
calculations. Correct the rq values afterwards (as before) if have an srq.
Signed-off-by: Jack Morgenstein

Index: new_connectx_user/src/userspace/libmlx4/src/verbs.c
===================================================================
--- new_connectx_user.orig/src/userspace/libmlx4/src/verbs.c	2007-06-18 09:33:04.000000000 +0300
+++ new_connectx_user/src/userspace/libmlx4/src/verbs.c	2007-06-19 09:47:10.000000000 +0300
@@ -355,6 +355,12 @@ struct ibv_qp *mlx4_create_qp(struct ibv
 	if (!qp)
 		return NULL;
 
+	/* adjust minimum cap values */
+	attr->cap.max_recv_wr = attr->cap.max_recv_wr ? attr->cap.max_recv_wr : 1;
+	attr->cap.max_recv_sge = attr->cap.max_recv_sge ? attr->cap.max_recv_sge : 1;
+	attr->cap.max_send_wr = attr->cap.max_send_wr ? attr->cap.max_send_wr : 1;
+	attr->cap.max_send_sge = attr->cap.max_send_sge ? attr->cap.max_send_sge : 1;
+
 	mlx4_calc_sq_wqe_size(&attr->cap, attr->qp_type, qp);
 
 	/*
@@ -366,9 +372,7 @@ struct ibv_qp *mlx4_create_qp(struct ibv
 	qp->rq.wqe_cnt = align_queue_size(attr->cap.max_recv_wr);
 
 	if (attr->srq)
-		attr->cap.max_recv_wr = qp->rq.wqe_cnt = 0;
-	else if (attr->cap.max_recv_sge < 1)
-		attr->cap.max_recv_sge = 1;
+		attr->cap.max_recv_wr = attr->cap.max_recv_sge = qp->rq.wqe_cnt = 0;
 
 	if (mlx4_alloc_qp_buf(pd, &attr->cap, attr->qp_type, qp))
 		goto err;

From Eric.Baur at gs.com Tue Jun 19 07:25:52 2007
From: Eric.Baur at gs.com (Baur, Eric)
Date: Tue, 19 Jun 2007 10:25:52 -0400
Subject: [ofa-general] Build error 1.2rc5
In-Reply-To: <4677DAB1.2080002@asc.hpc.mil>
References: <4677DAB1.2080002@asc.hpc.mil>
Message-ID: <4DCBAA39733E8048992FB7737126041902829564@gsmbnbp23es.firmwide.corp.gs.com>

Yes. The issue seems to be caused by the fact that both 32-bit and 64-bit
libs are written to /usr/lib rather than /usr/lib and /usr/lib64/. A quick
workaround is to modify ofed.conf to only build 64 bit (build_32bit=0).
-Eric -----Original Message----- From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of MAHMOUD HANAFI Sent: Tuesday, June 19, 2007 9:31 AM To: general at lists.openfabrics.org Subject: [ofa-general] Build error 1.2rc5 Any one else seen these build error with 1.2rc5? RPM build errors: user vlad does not exist - using root group vlad does not exist - using root user vlad does not exist - using root group vlad does not exist - using root File listed twice: /usr/lib/libibverbs.so.1 File listed twice: /usr/lib/libibverbs.so.1.0.0 File listed twice: /usr/lib/libibverbs.a File listed twice: /usr/lib/libibverbs.so File listed twice: /usr/lib/libibcm.so.1 File listed twice: /usr/lib/libibcm.so.1.0 File listed twice: /usr/lib/libibcm.so.1.0.0 File listed twice: /usr/lib/libibcm.so File listed twice: /usr/lib/libmthca-rdmav2.so File listed twice: /usr/lib/libmthca.so File listed twice: /usr/lib/libmthca.a File listed twice: /usr/lib/libcxgb3-rdmav2.so File listed twice: /usr/lib/libcxgb3.so File listed twice: /usr/lib/libcxgb3.a File listed twice: /usr/lib/libipathverbs-rdmav2.so File listed twice: /usr/lib/libipathverbs.so File listed twice: /usr/lib/libipathverbs.a File listed twice: /usr/lib/libsdp.so File listed twice: /usr/lib/libsdp.so.1 File listed twice: /usr/lib/libsdp.so.1.0.0 File listed twice: /usr/lib/libibcommon.so.1 File listed twice: /usr/lib/libibcommon.so.1.0.0 File listed twice: /usr/lib/libibcommon.a File listed twice: /usr/lib/libibcommon.so File listed twice: /usr/lib/libibmad.so.1 File listed twice: /usr/lib/libibmad.so.1.2.0 File listed twice: /usr/lib/libibmad.a File listed twice: /usr/lib/libibmad.so File listed twice: /usr/lib/libibumad.so.1 File listed twice: /usr/lib/libibumad.so.1.0.0 File listed twice: /usr/lib/libibumad.a File listed twice: /usr/lib/libibumad.so File listed twice: /usr/lib/libosmcomp.so.1 File listed twice: /usr/lib/libosmcomp.so.1.0.1 File listed twice: 
/usr/lib/libosmcomp-2.1.3.so File listed twice: /usr/lib/libosmcomp.a File listed twice: /usr/lib/libosmcomp.so File listed twice: /usr/lib/libopensm.so.1 File listed twice: /usr/lib/libopensm.so.1.1.0 File listed twice: /usr/lib/libopensm-2.1.4.so File listed twice: /usr/lib/libopensm.a File listed twice: /usr/lib/libopensm.so File listed twice: /usr/lib/libosmvendor.so.2 File listed twice: /usr/lib/libosmvendor.so.2.0.0 File listed twice: /usr/lib/libosmvendor-2.1.3.so File listed twice: /usr/lib/libosmvendor.a File listed twice: /usr/lib/libosmvendor.so File listed twice: /usr/lib/libosmvendor_openib.so File listed twice: /usr/lib/librdmacm.so.1 File listed twice: /usr/lib/librdmacm.so.1.0.0 File listed twice: /usr/lib/librdmacm.so.1.0.1 File listed twice: /usr/lib/librdmacm.so File not found: /var/tmp/OFED/etc/dat.conf File listed twice: /usr/lib/libdaplcma.a File listed twice: /usr/lib/libdat.a ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-dapl --with-ipoibtools --with-libcxgb3 --with-libibcm --with-libibcommon --with-libibmad --with-libibumad --with-libibverbs --with-libipathverbs --with-libmthca --with-opensm --with-librdmacm --with-libsdp --with-openib-diags --with-qlvnictools --with-sdpnetstat --with-srptools --with-mstflint --with-perftest --with-tvflash --sysconfdir=/usr/etc --mandir=/usr/man' --define 'configure_options32 --with-dapl --with-ipoibtools --with-libcxgb3 --with-libibcm --with-libibcommon --with-libibmad --with-libibumad --with-libibverbs --with-libipathverbs --with-libmthca --with-opensm --with-librdmacm --with-libsdp --with-openib-diags --with-qlvnictools --with-sdpnetstat --with-srptools --sysconfdir=/usr/etc --mandir=/usr/man' --define 'build_32bit 1' --define '_mandir /usr/man' /tmp/OFED-1.2-rc5/SRPMS/ofa_user-1.2-rc5.src.rpm" -- Mahmoud Hanafi Senior System Administrator ASC/MSRC www.asc.hpc.mil 2435 5th 
Street WPAFB, OHIO 45433 (937) 255-1536 _______________________________________________ general mailing list general at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From tziporet at mellanox.co.il Tue Jun 19 07:47:43 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 19 Jun 2007 17:47:43 +0300 Subject: [ofa-general] Anouncement: OFED 1.2 rc6 is avilable In-Reply-To: <6C2C79E72C305246B504CBA17B5500C90156362A@mtlexch01.mtl.com> References: <43AA3CB3C1BF5A499F5AAD31CA5023AC06624A26@mtlexch01.mtl.com> <6C2C79E72C305246B504CBA17B5500C9015634B7@mtlexch01.mtl.com> <6C2C79E72C305246B504CBA17B5500C90156362A@mtlexch01.mtl.com> Message-ID: <6C2C79E72C305246B504CBA17B5500C9015636A9@mtlexch01.mtl.com> Hi, OFED 1.2-RC6 is available on http://www.openfabrics.org/builds/ofed-1.2/ File: OFED-1.2-rc6.tgz To get BUILD_ID run ofed_info Please report any issues in bugzilla https://bugs.openfabrics.org/ The GA release is expected this Friday (June 22) I attach the OFED RN - please review and send me comments to the final release Thanks, Tziporet ======================================================================== Release information: OS support: Novell: - SLES 9.0 SP3 - SLES10 - SLES10 SP1 RC5 Redhat: - Redhat EL4 up3, up4 and up5 - Redhat EL5 kernel.org: - 2.6.20 - 2.6.19 Note: Kernel 2.6.21, Fedora C6 and SuSE Pro 10 are not part of the official list. We keep the backport patches for these OSes and make sure OFED compile and loaded properly but will not do full QA cycle. Systems: * x86_64 * x86 * ia64 * ppc64 Main changes from OFED-1.1-rc5: =============================== 1. Fixed 6 bugs (see attached for fixed issues) See bugzilla for all open issues. Tasks that should be completed for the GA release: 1. Complete all documentation (release notes, README, etc.) 2. 
Run all QA tests on all platforms
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rc6_fixed_bugs.csv
Type: application/octet-stream
Size: 636 bytes
Desc: rc6_fixed_bugs.csv
URL: 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: OFED_release_notes.txt
URL: 

From mhanafi at csc.com Tue Jun 19 08:05:50 2007
From: mhanafi at csc.com (Mahmoud Hanafi)
Date: Tue, 19 Jun 2007 11:05:50 -0400
Subject: [ofa-general] Anouncement: OFED 1.2 rc6 is avilable
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9015636A9@mtlexch01.mtl.com>
Message-ID: 

When I change the default install prefix from /usr to /usr/local/ofed1.2, these files are copied to ../usr/etc; with the default install location, dat.conf is still copied to ../usr/etc.

RPM build errors:
    user vlad does not exist - using root
    group vlad does not exist - using root
    user vlad does not exist - using root
    group vlad does not exist - using root
    File not found: /var/tmp/OFED/usr/local/ofed1.2/etc/libibverbs.d/mthca.driver
    File not found: /var/tmp/OFED/usr/local/ofed1.2/etc/libibverbs.d/cxgb3.driver
    File not found: /var/tmp/OFED/usr/local/ofed1.2/etc/libibverbs.d/ipath.driver
    File not found: /var/tmp/OFED/usr/local/ofed1.2/etc/libsdp.conf
    File not found: /var/tmp/OFED/etc/dat.conf
ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed1.2' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-dapl --with-ipoibtools --with-libcxgb3 --with-libibcm --with-libibcommon --with-libibmad --with-libibumad --with-libibverbs --with-libipathverbs --with-libmthca --with-opensm --with-librdmacm --with-libsdp --with-openib-diags --with-qlvnictools --with-sdpnetstat --with-srptools --with-mstflint --with-perftest --with-tvflash --sysconfdir=/usr/etc --mandir=/usr/man' --define 'configure_options32 %{nil} --sysconfdir=/usr/etc --mandir=/usr/man' --define 'build_32bit 0' --define '_mandir /usr/man' /tmp/OFED-1.2-rc5/SRPMS/ofa_user-1.2-rc5.src.rpm"

--------------------------------------------------------------------------------
This is a PRIVATE message. If you are not the intended recipient, please delete without copying and kindly advise us by e-mail of the mistake in delivery. NOTE: Regardless of content, this e-mail shall not operate to bind CSC to any order or other contract unless pursuant to explicit written agreement or government initiative expressly permitting the use of e-mail for such purpose.
--------------------------------------------------------------------------------

"Tziporet Koren"
Sent by: general-bounces at lists.openfabrics.org
06/19/2007 10:47 AM
To: 
cc: general at lists.openfabrics.org
Subject: [ofa-general] Anouncement: OFED 1.2 rc6 is avilable

Hi,

OFED 1.2-RC6 is available on http://www.openfabrics.org/builds/ofed-1.2/
File: OFED-1.2-rc6.tgz
To get the BUILD_ID, run ofed_info.

Please report any issues in bugzilla https://bugs.openfabrics.org/

The GA release is expected this Friday (June 22).

I attach the OFED RN; please review it and send me comments for the final release.

Thanks,
Tziporet

========================================================================

Release information:

OS support:
Novell:
 - SLES 9.0 SP3
 - SLES10
 - SLES10 SP1 RC5
Redhat:
 - Redhat EL4 up3, up4 and up5
 - Redhat EL5
kernel.org:
 - 2.6.20
 - 2.6.19

Note: Kernel 2.6.21, Fedora C6 and SuSE Pro 10 are not part of the official list. We keep the backport patches for these OSes and make sure OFED compiles and loads properly, but we will not do a full QA cycle.

Systems:
 * x86_64
 * x86
 * ia64
 * ppc64

Main changes from OFED-1.1-rc5:
===============================
1. Fixed 6 bugs (see attached for fixed issues)

See bugzilla for all open issues.

Tasks that should be completed for the GA release:
1. Complete all documentation (release notes, README, etc.)
2. Run all QA tests on all platforms

_______________________________________________
general mailing list
general at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rc6_fixed_bugs.csv
Type: application/octet-stream
Size: 636 bytes
Desc: not available
URL: 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: OFED_release_notes.txt
URL: 

From mhanafi at csc.com Tue Jun 19 08:15:02 2007
From: mhanafi at csc.com (Mahmoud Hanafi)
Date: Tue, 19 Jun 2007 11:15:02 -0400
Subject: [ofa-general] Anouncement: OFED 1.2 rc6 is avilable
In-Reply-To: 
Message-ID: 

Sorry, I got rc5 and rc6 mixed up. Here is the rc6 issue (it looks like base.h is missing):

gcc -Wp,-MD,/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/scsi/.attribute_container.o.d -nostdinc -iwithprefix include -D__KERNEL__ -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_addons/backport/2.6.9_U4/include/ -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/include -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/include -Iinclude -Iinclude2 -I/usr/src/linux-2.6.9-42.0.10.EL_lustre.1.4.10/include -include include/linux/autoconf.h -include /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/include/linux/autoconf.h -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/scsi -Wall -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Os -fomit-frame-pointer -Wdeclaration-after-statement -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks -Wno-sign-compare -fno-asynchronous-unwind-tables -funit-at-a-time -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/include -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/include
-I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/ipoib -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/debug -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/hw/cxgb3/core -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3 -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/rds -I/usr/src/linux-2.6.9-42.0.10.EL_lustre.1.4.10/kernel_addons/backport/2.6.9_U4/include/src/ -DMODULE -DKBUILD_BASENAME=attribute_container -DKBUILD_MODNAME=scsi_transport_iscsi -c -o /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/scsi/.tmp_attribute_container.o /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/scsi/attribute_container.c
In file included from /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/scsi/attribute_container.c:1:
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/include/../drivers/base/attribute_container.c:22:18: base.h: No such file or directory
make[5]: *** [/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/scsi/attribute_container.o] Error 1
make[4]: *** [/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/scsi] Error 2
make[3]: *** [_module_/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2] Error 2
make[2]: *** [modules] Error 2
make[1]: *** [modules] Error 2
make[1]: Leaving directory `/usr/src/linux-2.6.9-42.0.10.EL_lustre.1.4.10-obj/x86_64/smp'
make: *** [kernel] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.36297 (%install)

RPM build errors:
    user vlad does not exist - using root
    group vlad does not exist - using root
    user vlad does not exist - using root
    group vlad does not exist - using root
    Bad exit status from /var/tmp/rpm-tmp.36297 (%install)
ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-cxgb3-mod --with-ipath_inf-mod --with-ipoib-mod --with-iser-mod --with-mthca-mod --with-sdp-mod --with-srp-mod --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod --with-rds-mod --with-vnic-mod ' --define 'KVERSION 2.6.9-42.0.10.EL_lustre.1.4.10smp' --define 'KSRC /lib/modules/2.6.9-42.0.10.EL_lustre.1.4.10smp/build' --define 'build_kernel_ib 1' --define 'build_kernel_ib_devel 1' --define 'NETWORK_CONF_DIR /etc/sysconfig/network-scripts' --define 'modprobe_update 1' --define 'include_ipoib_conf 1' /root/OFED-1.2-rc6/SRPMS/ofa_kernel-1.2-rc6.src.rpm"
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rc6_fixed_bugs.csv
Type: application/octet-stream
Size: 636 bytes
Desc: not available
URL: 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: OFED_release_notes.txt
URL: 

From jackm at dev.mellanox.co.il Tue Jun 19 08:20:46 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Tue, 19 Jun 2007 18:20:46 +0300
Subject: [ofa-general] [PATCH] IB-mlx4: query_device needs to return one less srq wqe for max_srq_wr
Message-ID: <200706191820.46443.jackm@dev.mellanox.co.il>

Need to have 1 spare wqe for srq (so that there is always a "next wqe" available when posting).
Found by Mellanox QA

Signed-off-by: Jack Morgenstein 

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 402f3a2..6cb0ba1 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -120,7 +120,7 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 	props->max_qp_init_rd_atom = dev->dev->caps.max_qp_init_rdma;
 	props->max_res_rd_atom = props->max_qp_rd_atom * props->max_qp;
 	props->max_srq = dev->dev->caps.num_srqs - dev->dev->caps.reserved_srqs;
-	props->max_srq_wr = dev->dev->caps.max_srq_wqes;
+	props->max_srq_wr = dev->dev->caps.max_srq_wqes - 1;
 	props->max_srq_sge = dev->dev->caps.max_srq_sge;
 	props->local_ca_ack_delay = dev->dev->caps.local_ca_ack_delay;
 	props->atomic_cap = dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_ATOMIC ?

From john.russo at qlogic.com Tue Jun 19 08:19:46 2007
From: john.russo at qlogic.com (John Russo)
Date: Tue, 19 Jun 2007 10:19:46 -0500
Subject: [ofa-general] Supported list of Kernels
In-Reply-To: <20070619150629.E2CA7E60871@openfabrics.org>
References: <20070619150629.E2CA7E60871@openfabrics.org>
Message-ID: <99863D2ED484D449811D97A4C44C9CBD4239A1@EPEXCH2.qlogic.org>

The list below shows the same kernel for 3 versions of RedHat:
 - RedHat EL4 up4: 2.6.9-42.ELsmp
 - RedHat EL4 up5: 2.6.9-42.ELsmp
 - RedHat EL5: 2.6.9-42.ELsmp

The kernels that exist "out of the box" for each release are:
 - RedHat EL4 up4: 2.6.9-42.ELsmp (no change)
 - RedHat EL4 up5: 2.6.9-55.ELsmp
 - RedHat EL5: 2.6.18-8.ELsmp

Is 2.6.9-42 really the only kernel supported/tested, or is this a cut-and-paste mistake?

-----Original Message-----
From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of general-request at lists.openfabrics.org
Sent: Tuesday, June 19, 2007 11:06 AM
To: general at lists.openfabrics.org
Subject: general Digest, Vol 5, Issue 67

Send general mailing list submissions to
	general at lists.openfabrics.org

To subscribe or unsubscribe via the World Wide Web, visit
	http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
or, via email, send a message with subject or body 'help' to
	general-request at lists.openfabrics.org

You can reach the person managing the list at
	general-owner at lists.openfabrics.org

When replying, please edit your Subject line so it is more specific than "Re: Contents of general digest..."

Today's Topics:

   1. Re: Build error 1.2rc5 (Erez Zilber)
   2. [PATCH] libmlx4: fix adjustments for minimum qp capabilities in mlx4_create_qp (Jack Morgenstein)
   3. RE: Build error 1.2rc5 (Baur, Eric)
   4. Anouncement: OFED 1.2 rc6 is avilable (Tziporet Koren)
   5. Re: Anouncement: OFED 1.2 rc6 is avilable (Mahmoud Hanafi)

----------------------------------------------------------------------

Message: 1
Date: Tue, 19 Jun 2007 16:45:03 +0300
From: Erez Zilber 
Subject: Re: [ofa-general] Build error 1.2rc5
To: MAHMOUD HANAFI 
Cc: general at lists.openfabrics.org
Message-ID: <4677DDDF.9080109 at voltaire.com>
Content-Type: text/plain; charset=ISO-8859-1

MAHMOUD HANAFI wrote:
> Anyone else seen these build errors with 1.2rc5?

Try sending this to ewg at lists.openfabrics.org.

Erez

------------------------------

Message: 2
Date: Tue, 19 Jun 2007 16:47:41 +0300
From: Jack Morgenstein 
Subject: [ofa-general] [PATCH] libmlx4: fix adjustments for minimum qp capabilities in mlx4_create_qp
To: Roland Dreier 
Cc: general at lists.openfabrics.org
Message-ID: <200706191647.41336.jackm at dev.mellanox.co.il>
Content-Type: text/plain; charset="us-ascii"

Need to adjust minimum qp capability values prior to size and max resource calculations.
Correct the rq values afterwards (as before) if there is an srq.
Signed-off-by: Jack Morgenstein 

Index: new_connectx_user/src/userspace/libmlx4/src/verbs.c
===================================================================
--- new_connectx_user.orig/src/userspace/libmlx4/src/verbs.c	2007-06-18 09:33:04.000000000 +0300
+++ new_connectx_user/src/userspace/libmlx4/src/verbs.c	2007-06-19 09:47:10.000000000 +0300
@@ -355,6 +355,12 @@ struct ibv_qp *mlx4_create_qp(struct ibv
 	if (!qp)
 		return NULL;
 
+	/* adjust minimum cap values */
+	attr->cap.max_recv_wr = attr->cap.max_recv_wr ? attr->cap.max_recv_wr : 1;
+	attr->cap.max_recv_sge = attr->cap.max_recv_sge ? attr->cap.max_recv_sge : 1;
+	attr->cap.max_send_wr = attr->cap.max_send_wr ? attr->cap.max_send_wr : 1;
+	attr->cap.max_send_sge = attr->cap.max_send_sge ? attr->cap.max_send_sge : 1;
+
 	mlx4_calc_sq_wqe_size(&attr->cap, attr->qp_type, qp);
 
 	/*
@@ -366,9 +372,7 @@ struct ibv_qp *mlx4_create_qp(struct ibv
 	qp->rq.wqe_cnt = align_queue_size(attr->cap.max_recv_wr);
 
 	if (attr->srq)
-		attr->cap.max_recv_wr = qp->rq.wqe_cnt = 0;
-	else if (attr->cap.max_recv_sge < 1)
-		attr->cap.max_recv_sge = 1;
+		attr->cap.max_recv_wr = attr->cap.max_recv_sge = qp->rq.wqe_cnt = 0;
 
 	if (mlx4_alloc_qp_buf(pd, &attr->cap, attr->qp_type, qp))
 		goto err;

------------------------------

Message: 3
Date: Tue, 19 Jun 2007 10:25:52 -0400
From: "Baur, Eric" 
Subject: RE: [ofa-general] Build error 1.2rc5
To: 
Message-ID: <4DCBAA39733E8048992FB7737126041902829564 at gsmbnbp23es.firmwide.corp.gs.com>
Content-Type: text/plain; charset="us-ascii"

Yes.

The issue seems to be caused by the fact that both 32-bit and 64-bit libs are written to /usr/lib rather than to /usr/lib and /usr/lib64/. A quick workaround is to modify ofed.conf to build only 64-bit (build_32bit=0).

-Eric

-----Original Message-----
From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of MAHMOUD HANAFI
Sent: Tuesday, June 19, 2007 9:31 AM
To: general at lists.openfabrics.org
Subject: [ofa-general] Build error 1.2rc5

Anyone else seen these build errors with 1.2rc5?

RPM build errors:
    user vlad does not exist - using root
    group vlad does not exist - using root
    user vlad does not exist - using root
    group vlad does not exist - using root
    File listed twice: /usr/lib/libibverbs.so.1
    File listed twice: /usr/lib/libibverbs.so.1.0.0
    File listed twice: /usr/lib/libibverbs.a
    File listed twice: /usr/lib/libibverbs.so
    File listed twice: /usr/lib/libibcm.so.1
    File listed twice: /usr/lib/libibcm.so.1.0
    File listed twice: /usr/lib/libibcm.so.1.0.0
    File listed twice: /usr/lib/libibcm.so
    File listed twice: /usr/lib/libmthca-rdmav2.so
    File listed twice: /usr/lib/libmthca.so
    File listed twice: /usr/lib/libmthca.a
    File listed twice: /usr/lib/libcxgb3-rdmav2.so
    File listed twice: /usr/lib/libcxgb3.so
    File listed twice: /usr/lib/libcxgb3.a
    File listed twice: /usr/lib/libipathverbs-rdmav2.so
    File listed twice: /usr/lib/libipathverbs.so
    File listed twice: /usr/lib/libipathverbs.a
    File listed twice: /usr/lib/libsdp.so
    File listed twice: /usr/lib/libsdp.so.1
    File listed twice: /usr/lib/libsdp.so.1.0.0
    File listed twice: /usr/lib/libibcommon.so.1
    File listed twice: /usr/lib/libibcommon.so.1.0.0
    File listed twice: /usr/lib/libibcommon.a
    File listed twice: /usr/lib/libibcommon.so
    File listed twice: /usr/lib/libibmad.so.1
    File listed twice: /usr/lib/libibmad.so.1.2.0
    File listed twice: /usr/lib/libibmad.a
    File listed twice: /usr/lib/libibmad.so
    File listed twice: /usr/lib/libibumad.so.1
    File listed twice: /usr/lib/libibumad.so.1.0.0
    File listed twice: /usr/lib/libibumad.a
    File listed twice: /usr/lib/libibumad.so
    File listed twice: /usr/lib/libosmcomp.so.1
    File listed twice: /usr/lib/libosmcomp.so.1.0.1
    File listed twice: /usr/lib/libosmcomp-2.1.3.so
    File listed twice: /usr/lib/libosmcomp.a
    File listed twice: /usr/lib/libosmcomp.so
    File listed twice: /usr/lib/libopensm.so.1
    File listed twice: /usr/lib/libopensm.so.1.1.0
    File listed twice: /usr/lib/libopensm-2.1.4.so
    File listed twice: /usr/lib/libopensm.a
    File listed twice: /usr/lib/libopensm.so
    File listed twice: /usr/lib/libosmvendor.so.2
    File listed twice: /usr/lib/libosmvendor.so.2.0.0
    File listed twice: /usr/lib/libosmvendor-2.1.3.so
    File listed twice: /usr/lib/libosmvendor.a
    File listed twice: /usr/lib/libosmvendor.so
    File listed twice: /usr/lib/libosmvendor_openib.so
    File listed twice: /usr/lib/librdmacm.so.1
    File listed twice: /usr/lib/librdmacm.so.1.0.0
    File listed twice: /usr/lib/librdmacm.so.1.0.1
    File listed twice: /usr/lib/librdmacm.so
    File not found: /var/tmp/OFED/etc/dat.conf
    File listed twice: /usr/lib/libdaplcma.a
    File listed twice: /usr/lib/libdat.a
ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-dapl --with-ipoibtools --with-libcxgb3 --with-libibcm --with-libibcommon --with-libibmad --with-libibumad --with-libibverbs --with-libipathverbs --with-libmthca --with-opensm --with-librdmacm --with-libsdp --with-openib-diags --with-qlvnictools --with-sdpnetstat --with-srptools --with-mstflint --with-perftest --with-tvflash --sysconfdir=/usr/etc --mandir=/usr/man' --define 'configure_options32 --with-dapl --with-ipoibtools --with-libcxgb3 --with-libibcm --with-libibcommon --with-libibmad --with-libibumad --with-libibverbs --with-libipathverbs --with-libmthca --with-opensm --with-librdmacm --with-libsdp --with-openib-diags --with-qlvnictools --with-sdpnetstat --with-srptools --sysconfdir=/usr/etc --mandir=/usr/man' --define 'build_32bit 1' --define '_mandir /usr/man' /tmp/OFED-1.2-rc5/SRPMS/ofa_user-1.2-rc5.src.rpm"

--
Mahmoud Hanafi
Senior System Administrator
ASC/MSRC
www.asc.hpc.mil
2435 5th Street
WPAFB, OHIO 45433
(937) 255-1536

------------------------------

Message: 4
Date: Tue, 19 Jun 2007 17:47:43 +0300
From: "Tziporet Koren" 
Subject: [ofa-general] Anouncement: OFED 1.2 rc6 is avilable
To: 
Cc: general at lists.openfabrics.org
Message-ID: <6C2C79E72C305246B504CBA17B5500C9015636A9 at mtlexch01.mtl.com>
Content-Type: text/plain; charset="us-ascii"

Hi,

OFED 1.2-RC6 is available on http://www.openfabrics.org/builds/ofed-1.2/
File: OFED-1.2-rc6.tgz
To get the BUILD_ID, run ofed_info.

Please report any issues in bugzilla https://bugs.openfabrics.org/

The GA release is expected this Friday (June 22).

I attach the OFED RN; please review it and send me comments for the final release.

Thanks,
Tziporet

========================================================================

Release information:

OS support:
Novell:
 - SLES 9.0 SP3
 - SLES10
 - SLES10 SP1 RC5
Redhat:
 - Redhat EL4 up3, up4 and up5
 - Redhat EL5
kernel.org:
 - 2.6.20
 - 2.6.19

Note: Kernel 2.6.21, Fedora C6 and SuSE Pro 10 are not part of the official list. We keep the backport patches for these OSes and make sure OFED compiles and loads properly, but we will not do a full QA cycle.

Systems:
 * x86_64
 * x86
 * ia64
 * ppc64

Main changes from OFED-1.1-rc5:
===============================
1. Fixed 6 bugs (see attached for fixed issues)

See bugzilla for all open issues.

Tasks that should be completed for the GA release:
1. Complete all documentation (release notes, README, etc.)
2. Run all QA tests on all platforms
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rc6_fixed_bugs.csv
Type: application/octet-stream
Size: 636 bytes
Desc: rc6_fixed_bugs.csv
Url : http://lists.openfabrics.org/pipermail/general/attachments/20070619/ad94d792/rc6_fixed_bugs-0001.obj
-------------- next part --------------
Open Fabrics Enterprise Distribution (OFED)
Version 1.2 Release Notes
June 2007

===============================================================================
Table of Contents
===============================================================================
1. Overview, which includes:
   - OFED Distribution Rev 1.2 Contents
   - Supported Platforms and Operating Systems
   - Supported HCA and RNIC Adapter Cards and Firmware Versions
   - Tested Switch Platforms
   - Third party Test Packages
   - OFED sources
2. Main Changes from OFED 1.1
3. Fixed Bugs
4. Known Issues

===============================================================================
1. Overview
===============================================================================
These are the release notes of OpenFabrics Enterprise Distribution (OFED) release 1.2. The OFED software package is composed of several software modules, and is intended for use on a computer cluster constructed as an InfiniBand subnet or iWARP network.

Note: If you plan to upgrade the OFED package on your cluster, please upgrade all of its nodes to this new version.

1.1 OFED 1.2 Contents
---------------------
The OFED package contains the following components:
 o OpenFabrics core and ULPs:
    - IB HCA drivers (mthca, ipath, ehca)
    - iWARP RNIC driver (cxgb3)
    - core
    - Upper Layer Protocols: IPoIB, SDP, SRP Initiator, iSER Host, RDS, uDAPL and VNIC.
 o OpenFabrics utilities:
    - OpenSM (OSM): InfiniBand Subnet Manager
    - Diagnostic tools
    - Performance tests
 o MPI:
    - OSU MPI stack supporting the InfiniBand and iWARP interface
    - Open MPI stack supporting the InfiniBand and iWARP interface
    - OSU MVAPICH2 stack supporting the InfiniBand and iWARP interface
    - MPI benchmark tests (OSU benchmarks, Intel MPI benchmarks, Presta)
 o Extra packages:
    - open-iscsi: open-iscsi initiator with iSER support
    - ib-bonding: Bonding driver for IPoIB interface
 o Sources of all software modules (under conditions mentioned in the modules' LICENSE files)
 o Documentation

Notes:
1. The cxgb3 driver is in technology preview state.
2. The Virtual NIC (VNIC) driver is presented as a technology preview in OFED 1.2.
3. All other OFED components are of production quality.
4. See the release notes for each package in the docs directory.
5. Any Topspin copyright belongs to Cisco Systems, Inc.

1.2 Supported Platforms and Operating Systems
---------------------------------------------
 o CPU architectures:
    - x86_64
    - x86
    - ia64
    - ppc64

 o Linux Operating Systems:
    - RedHat EL4 up3: 2.6.9-34.ELsmp
    - RedHat EL4 up4: 2.6.9-42.ELsmp
    - RedHat EL4 up5: 2.6.9-42.ELsmp
    - RedHat EL5: 2.6.9-42.ELsmp
    - SLES9 SP3: 2.6.5-7.244-smp
    - SLES10: 2.6.16.21-0.8-smp
    - kernel.org: 2.6.19.x and 2.6.20.x

1.3 HCAs and RNICs Supported
----------------------------
This release supports IB HCAs by Mellanox Technologies, Qlogic and IBM as well as iWARP RNICs by Chelsio Communications.

 o Mellanox Technologies HCAs:
    - InfiniHost (fw-23108 Rev 3.5.000)
    - InfiniHost III Ex (MemFree: fw-25218 Rev 5.2.000, with memory: fw-25208 Rev 4.8.200)
    - InfiniHost III Lx (fw-25204 Rev 1.2.000)

   The SDR and DDR modes of the InfiniHost III family are supported.
   For official firmware versions please see:
   http://www.mellanox.com/support/firmware_table.php

 o Qlogic HCAs:
    - QHT6040 (PathScale InfiniPath HT-460)
    - QHT6140 (PathScale InfiniPath HT-465)
    - QLE6140 (PathScale InfiniPath PE-880)

 o IBM HCAs:
    - GX Dual-port 4x IB HCA
    - GX Dual-port 12x IB HCA

 o Chelsio RNICs:
    - S310/S320 10GbE Storage Accelerators
    - R310E 10GbE iWARP Adapters

1.4 Switches Supported
----------------------
This release was tested with switches and gateways provided by the following companies:
    - Cisco
    - Voltaire
    - Qlogic
    - Flextronics

1.5 Third Party Packages
------------------------
The following third party packages have been tested with OFED 1.2:
1. Intel MPI, Version 3.0 - Package ID: l_mpi_p_3.0.043
2. HP MPI, Version 2.2.5

1.6 OFED Sources
----------------
Source repositories: http://www.openfabrics.org/git/
Kernel sources: ~vlad/ofed_1_2/.git
User-level sources are located in all git trees starting with: ofed_1_2/

The kernel sources are based on the Linux 2.6.20 mainline kernel. Its patches are included in the OFED sources directory. For details see HOWTO.build_ofed.

===============================================================================
2. Main Changes from OFED 1.1
===============================================================================
Note: For details regarding the various changes, please see the release notes for each package in the docs directory.

2.1 General changes
 o Kernel code based on 2.6.20
 o New kernel modules: SA Cache, RDS, VNIC, bonding
 o High availability of SRP and IPoIB at GA level
 o Added iWARP support (with Chelsio driver)
 o Man pages for libraries (libibverbs and librdmacm)

2.2 IPoIB
 o IPoIB Connected Mode
 o High availability support using the bonding module

2.3 SDP
 o netstat is now available
 o Improved message BW
    - 10X for small messages
    - 5X for medium messages
 o Scalability
    - Added a memory consumption limit

2.4 SRP
 o High availability is now supported for all systems.

2.5 iSER
 o Testing more platforms (e.g., ppc64 and ia64)
 o Updated packages for iSCSI kernel & user components bundled with OFED

2.6 uDAPL
 o Scalability features needed for Intel MPI

2.7 Libraries
 a. libibverbs 1.1
    o Fork support (requires application changes)
    o Better low-level driver handling, including multiple drivers linked in statically
    o Documentation: man pages
 b. librdmacm (uCMA) 1.0
    o Multicast joining from user space
    o UD support
    o Documentation: man pages

2.8 OSM
 o Routing improvements
 o Performance improvement of over an order of magnitude for min hop and up/down routing
 o New fat-tree and LASH algorithms
 o SA optional record support "virtually" complete
 o IB router enablement
 o SA database dump/restore

2.9 Management
 o Many diagnostic improvements since OFED 1.1 (see detailed RN)
 o ibdiagui: A GUI for ibdiagnet

2.10 Install
 o Default prefix directory is now /usr

2.11 MPI
 a. OSU MVAPICH
    o Version was updated to 0.9.9
 b. Open MPI
    o Version was updated to 1.2.1
    o See http://www.open-mpi.org/svn/new.php for details
 c. OSU MVAPICH2
    o MVAPICH2 version 0.98 was added to the OFED package.
 d. Common MPI setup sourcing
    Simple menu-driven interface to choose which MPI implementation to set as the default on a per-user and/or system-wide basis

2.12 iWARP Support
 o Chelsio NIC supported
 o Verbs and CMA APIs are the same as InfiniBand
 o ULPs supported
    - MPI (mvapich2 tested)
    - uDAPL
 o Basic testing
    - uDAPL
    - mvapich2
    - NFS-RDMA
 o Status: Beta

===============================================================================
3. Fixed Bugs
===============================================================================
1. OFED installation now supports installing lib32 on 64-bit systems.
2. Hotplug removal does not hang the system when the device is used by the uverbs interface.
3. MVAPICH now works on ppc64.
4. libibcm is now thread safe.

Bugs fixed in each package are reported in the package's release notes.
===============================================================================
4. Known Issues
===============================================================================
The following is a list of major limitations and known issues of the various components of the OFED 1.2 release.

1. Memory registration by the user is limited according to the administrator setting. See "Pinning (Locking) User Memory Pages" in OFED_tips.txt for system configuration.
2. Fork support from kernel 2.6.12 and above is available provided that applications do not use threads. fork() is supported as long as the parent process does not run before the child exits or calls exec(). The former can be achieved by calling wait(childpid), and the latter can be achieved by application-specific means. The POSIX system() call is supported.
3. The ipath driver is supported only on 64-bit platforms.
4. There are issues that cause failures when using Intel MPI with the Qlogic card driver.

Note: See the release notes of each component for additional issues.

------------------------------

Message: 5
Date: Tue, 19 Jun 2007 11:05:50 -0400
From: Mahmoud Hanafi 
Subject: Re: [ofa-general] Anouncement: OFED 1.2 rc6 is avilable
To: "Tziporet Koren" 
Cc: general-bounces at lists.openfabrics.org, ewg at lists.openfabrics.org, general at lists.openfabrics.org
Message-ID: 
Content-Type: text/plain; charset="us-ascii"

Skipped content of type multipart/alternative
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rc6_fixed_bugs.csv Type: application/octet-stream Size: 636 bytes Desc: not available Url : http://lists.openfabrics.org/pipermail/general/attachments/20070619/8bf8 57f3/rc6_fixed_bugs.obj -------------- next part -------------- Open Fabrics Enterprise Distribution (OFED) Version 1.2 Release Notes June 2007 ======================================================================== ======= Table of Contents ======================================================================== ======= 1. Overview, which includes: - OFED Distribution Rev 1.2 Contents - Supported Platforms and Operating Systems - Supported HCA and RNIC Adapter Cards and Firmware Versions - Tested Switch Platforms - Third party Test Packages - OFED sources 2. Main Changes from OFED 1.1 3. Fixed Bugs 4. Known Issues ======================================================================== ======= 1. Overview ======================================================================== ======= These are the release notes of OpenFabrics Enterprise Distribution (OFED) release 1.2. The OFED software package is composed of several software modules, and is intended for use on a computer cluster constructed as an InfiniBand subnet or iWARP network. Note: If you plan to upgrade the OFED package on your cluster, please upgrade all of its nodes to this new version. 1.1 OFED 1.2 Contents --------------------- The OFED package contains the following components: o OpenFabrics core and ULPs: - IB HCA drivers (mthca, ipath, ehca) - iWARP RNIC driver (cxgb3) - core - Upper Layer Protocols: IPoIB, SDP, SRP Initiator, iSER Host, RDS, uDAPL and VNIC. 
o OpenFabrics utilities: - OpenSM (OSM): InfiniBand Subnet Manager - Diagnostic tools - Performance tests o MPI: - OSU MPI stack supporting the InfiniBand and iWARP interface - Open MPI stack supporting the InfiniBand and iWARP interface - OSU MVAPICH2 stack supporting the InfiniBand and iWARP interface - MPI benchmark tests (OSU benchmarks, Intel MPI benchmarks, Presta) o Extra packages: - open-iscsi: open-iscsi initiator with iSER support - ib-bonding: Bonding driver for IPoIB interface o Sources of all software modules (under conditions mentioned in the modules' LICENSE files) o Documentation Notes: 1. The cxgb3 driver is in technology preview state. 2. The Virtual NIC (VNIC) driver is presented as a technology preview on OFED 1.2. 3. All other OFED components are of production quality. 4. See release notes for each package in the docs directory. 5. Any Topspin copyright belongs to Cisco Systems, Inc. 1.2 Supported Platforms and Operating Systems --------------------------------------------- o CPU architectures: - x86_64 - x86 - ia64 - ppc64 o Linux Operating Systems: - RedHat EL4 up3: 2.6.9-34.ELsmp - RedHat EL4 up4: 2.6.9-42.ELsmp - RedHat EL4 up5: 2.6.9-42.ELsmp - RedHat EL5: 2.6.9-42.ELsmp - SLES9 SP3: 2.6.5-7.244-smp - SLES10: 2.6.16.21-0.8-smp - kernel.org: 2.6.19.x and 2.6.20.x 1.3 HCAs and RNICs Supported ---------------------------- This release supports IB HCAs by Mellanox Technologies, Qlogic and IBM as well as iWARP RNICs by Chelsio Communications. o Mellanox Technologies HCAs: - InfiniHost (fw-23108 Rev 3.5.000) - InfiniHost III Ex (MemFree: fw-25218 Rev 5.2.000 with memory: fw-25208 Rev 4.8.200) - InfiniHost III Lx (fw-25204 Rev 1.2.000) The SDR and DDR modes of the InfiniHost III family are supported. 
   For official firmware versions please see:
   http://www.mellanox.com/support/firmware_table.php

 o Qlogic HCAs:
   - QHT6040 (PathScale InfiniPath HT-460)
   - QHT6140 (PathScale InfiniPath HT-465)
   - QLE6140 (PathScale InfiniPath PE-880)

 o IBM HCAs:
   - GX Dual-port 4x IB HCA
   - GX Dual-port 12x IB HCA

 o Chelsio RNICs:
   - S310/S320 10GbE Storage Accelerators
   - R310E 10GbE iWARP Adapters

1.4 Switches Supported
----------------------
This release was tested with switches and gateways provided by the following
companies:
   - Cisco
   - Voltaire
   - Qlogic
   - Flextronics

1.5 Third Party Packages
------------------------
The following third party packages have been tested with OFED 1.2:
1. Intel MPI, Version 3.0 - Package ID: l_mpi_p_3.0.043
2. HP MPI, Version 2.2.5

1.6 OFED Sources
----------------
   Source repositories: http://www.openfabrics.org/git/
   Kernel sources: ~vlad/ofed_1_2/.git
   User-level sources are located in all git trees whose names start with
   ofed_1_2/

The kernel sources are based on the Linux 2.6.20 mainline kernel. The patches
applied to it are included in the OFED sources directory.
For details see HOWTO.build_ofed.

===============================================================================
2. Main Changes from OFED 1.1
===============================================================================
Note: For details regarding the various changes, please see the release notes
for each package in the docs directory.

2.1 General changes
 o Kernel code based on 2.6.20
 o New kernel modules: SA Cache, RDS, VNIC, bonding
 o High availability of SRP and IPoIB at GA level
 o Added iWARP support (with Chelsio driver)
 o MAN pages for libraries (libibverbs and librdmacm)

2.2 IPoIB
 o IPoIB Connected Mode
 o High availability support using the bonding module

2.3 SDP
 o netstat is now available
 o Improved message BW
   - 10X for small messages
   - 5X for medium messages
 o Scalability
   - Added a memory consumption limit

2.4 SRP
 o High availability is now supported for all systems.

2.5 iSER
 o Testing more platforms (e.g., ppc64 and ia64)
 o Updated packages for iSCSI kernel & user components bundled with OFED

2.6 uDAPL
 o Scalability features needed for Intel MPI

2.7 Libraries
 a. libibverbs 1.1
    o Fork support (requires application changes)
    o Better low-level driver handling, including multiple drivers linked in
      statically
    o Documentation: man pages
 b. librdmacm (uCMA) 1.0
    o Multicast joining from user space
    o UD support
    o Documentation: man pages

2.8 OSM
 o Routing improvements
 o Performance of min-hop and up/down routing improved by over an order of
   magnitude
 o New fat-tree and LASH routing algorithms
 o SA optional record support "virtually" complete
 o IB router enablement
 o SA database dump/restore

2.9 Management
 o Many diagnostic improvements since OFED 1.1 (see the detailed release
   notes)
 o ibdiagui: A GUI for ibdiagnet

2.10 Install
 o Default prefix directory is now /usr

2.11 MPI
 a. OSU MVAPICH
    o Version was updated to 0.9.9
 b. Open MPI
    o Version was updated to 1.2.1
    o See http://www.open-mpi.org/svn/new.php for details
 c. OSU MVAPICH2
    o MVAPICH2 version 0.98 was added to the OFED package.
 d. Common MPI setup sourcing
    o Simple menu-driven interface to choose which MPI implementation to set
      as the default on a per-user and/or system-wide basis

2.12 iWARP Support
 o Chelsio NIC supported
 o Verbs and CMA APIs are the same as InfiniBand
 o ULPs supported
   - MPI (mvapich2 tested)
   - uDAPL
 o Basic Testing
   - uDAPL
   - mvapich2
   - NFS-RDMA
 o Status: Beta

===============================================================================
3. Fixed Bugs
===============================================================================
1. OFED installation now supports installing lib32 on 64-bit systems.
2. Hotplug removal does not hang the system when the device is used by the
   uverbs interface.
3. MVAPICH now works on ppc64.
4. libibcm is now thread safe.

Bugs fixed in each package are reported in the package's release notes.
===============================================================================
4. Known Issues
===============================================================================
The following is a list of major limitations and known issues of the various
components of the OFED 1.2 release.

1. Memory registration by the user is limited according to the administrator
   setting. See "Pinning (Locking) User Memory Pages" in OFED_tips.txt for
   system configuration.
2. Fork support on kernel 2.6.12 and above is available provided that
   applications do not use threads. fork() is supported as long as the parent
   process does not run before the child exits or calls exec(). The former can
   be achieved by calling waitpid() on the child's PID, and the latter by
   application-specific means. The POSIX system() call is supported.
3. The ipath driver is supported only on 64-bit platforms.
4. Known issues when using Intel MPI with the Qlogic HCA driver can cause
   failures.

Note: See the release notes of each component for additional issues.

------------------------------

_______________________________________________
general mailing list
general at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

End of general Digest, Vol 5, Issue 67
**************************************
From mst at dev.mellanox.co.il Tue Jun 19 09:56:58 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Tue, 19 Jun 2007 19:56:58 +0300 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: References: <20070619055841.GC7069@mellanox.co.il> Message-ID: <20070619165658.GA31019@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [GIT PULL] please pull infiniband.git > > > BTW, have you seen the patch for ipoib cm crasher race? > > I think we need it in 2.6.22 too. > > I saw the discussion and it does seem like something to fix for > 2.6.22.
But I didn't see a final conclusion on which patch to use, > and I don't think I ever saw a patch with a good changelog and > signed-off-by line either... please resend if I missed it. Did you get the one I reposted? Is the log OK? -- MST From sean.hefty at intel.com Tue Jun 19 10:51:05 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 19 Jun 2007 10:51:05 -0700 Subject: [ofa-general] hang at module removal with local sa patches applied In-Reply-To: <467717F3.9020806@ichips.intel.com> Message-ID: <001001c7b29a$68c0ec10$9c98070a@amr.corp.intel.com> >* It's possible for flush_workqueue to be called from the workqueue thread. > >* We hold a mutex when calling flush_workqueue, and a queued work item >will try to acquire that same mutex. There's no need to call flush_workqueue unless we're destroying the port as a result of removing the device. Can you see if the following patch fixes your unload issue? (I wasn't able to reproduce the original problem.) Signed-off-by: Sean Hefty --- Btw, I will have the cache disabled by default when I request the pull for 2.6.23. 
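The deadlock described above can be illustrated with a small userspace analogy. This is a hypothetical sketch, not the local_sa code: a POSIX thread stands in for a work item queued on sa_wq, pthread_join() stands in for flush_workqueue(), and port_lock stands in for the mutex held around cleanup. The names worker(), destroy_port_sketch() and run_demo() are invented for illustration. The hazard it shows is the one the patch avoids: waiting for queued work while holding a lock that the work itself needs.

```c
#include <pthread.h>

/* Hypothetical stand-ins: port_lock plays the mutex held during port
 * cleanup, and worker() plays a work item queued on the workqueue. */
static pthread_mutex_t port_lock = PTHREAD_MUTEX_INITIALIZER;
static int updates_applied;

static void *worker(void *arg)
{
	(void)arg;
	/* The queued work needs the same lock as the cleanup path. */
	pthread_mutex_lock(&port_lock);
	updates_applied++;
	pthread_mutex_unlock(&port_lock);
	return NULL;
}

/*
 * Safe ordering: do the locked teardown, drop the lock, and only then
 * wait for outstanding work (pthread_join() standing in for
 * flush_workqueue()). Joining while still holding port_lock would
 * deadlock whenever the worker has not yet acquired the lock.
 */
static int destroy_port_sketch(pthread_t w)
{
	pthread_mutex_lock(&port_lock);
	/* ... tear down per-port state here ... */
	pthread_mutex_unlock(&port_lock);

	return pthread_join(w, NULL);
}

/* Queue one work item, destroy the "port", and report how many
 * updates ran; returns -1 if a pthread call fails. */
static int run_demo(void)
{
	pthread_t w;

	updates_applied = 0;
	if (pthread_create(&w, NULL, worker, NULL) != 0)
		return -1;
	if (destroy_port_sketch(w) != 0)
		return -1;
	return updates_applied;
}
```

Because the join happens only after port_lock is released, the worker can always finish, so run_demo() returns 1.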
diff --git a/drivers/infiniband/core/local_sa.c b/drivers/infiniband/core/local_sa.c index aac3f2d..7c9a922 100644 --- a/drivers/infiniband/core/local_sa.c +++ b/drivers/infiniband/core/local_sa.c @@ -633,7 +633,6 @@ static void unsubscribe_port(struct sa_db_port *port) static void cleanup_port(struct sa_db_port *port) { unsubscribe_port(port); - flush_workqueue(sa_wq); clean_update_list(port); remove_all_attrs(&port->paths); @@ -1173,6 +1172,7 @@ static void destroy_port(struct sa_db_port *port) ib_unregister_mad_agent(port->agent); cleanup_port(port); + flush_workqueue(sa_wq); } static void sa_db_add_dev(struct ib_device *device) From arthur.jones at qlogic.com Tue Jun 19 16:40:30 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:40:30 -0700 Subject: [ofa-general] [PATCH] IB/ipath -- changes in for-roland for 2.6.23 Message-ID: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> hi roland, sorry for the first false alarm! i had the wrong CC. here, again, is our current backlog of patches that we'd like to go upstream into 2.6.23. these changes are avail via git-pull from: git://git.qlogic.com/ipath-linux-2.6 for-roland which is based on the kernel.org linux-2.6 tree. i wasn't sure if i should spam the list with all the patches, as they are avail via the git server above. how would you like that done in the future? thanks... 
arthur From arthur.jones at qlogic.com Tue Jun 19 16:40:35 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:40:35 -0700 Subject: [ofa-general] [PATCH 01/28] IB/ipath: include to fix ppc64 build In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234035.3794.7544.stgit@bauxite.internal.keyresearch.com> From: Bryan O'Sullivan Signed-off-by: Bryan O'Sullivan --- drivers/infiniband/hw/ipath/ipath_iba6110.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_iba6110.c b/drivers/infiniband/hw/ipath/ipath_iba6110.c index 4171198..ba73dd0 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba6110.c +++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c @@ -36,6 +36,7 @@ * HT chip. */ +#include #include #include #include From arthur.jones at qlogic.com Tue Jun 19 16:40:40 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:40:40 -0700 Subject: [ofa-general] [PATCH 02/28] IB/ipath -- support blinking LEDs with an led_override file In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234040.3794.82782.stgit@bauxite.internal.keyresearch.com> From: Michael Albaugh When we want to find an InfiniPath HCA in a rack of nodes, it is often expeditious to blink the status LEDs via a userspace /sys file. A write-only led_override "file" is published per device. Writes to this file are interpreted as (string form) numbers, and the resulting value is passed to ipath_set_led_override(). The upper eight bits are interpreted as a 4.4 fixed-point "frequency in Hertz", and the bottom two 4-bit values are alternately (D0..3, then D4..7) used by the board-specific LED-setting function to override the normal state.
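The value format described above can be made concrete with a small encoder and decoder. This is a hedged userspace sketch: the field layout, the IPATH_LED_PHYS/IPATH_LED_LOG bit values and the phase-length formula (HZ << 4) / freq are taken from this patch, while struct led_override, led_override_encode() and led_override_decode() are hypothetical helpers written for illustration. The kernel tick rate is passed in as a parameter because HZ is a kernel constant.

```c
#include <stdint.h>

/* Layout from the patch: bits 15..8 hold a 4.4 fixed-point frequency
 * in Hz, bits 7..4 the second blink phase, bits 3..0 the first. */
#define LED_OVER_FREQ_SHIFT 8
#define LED_OVER_FREQ_MASK  (0xFFu << LED_OVER_FREQ_SHIFT)
#define IPATH_LED_PHYS 1 /* green: physical (link training) state */
#define IPATH_LED_LOG  2 /* yellow: logical link state */

/* Hypothetical decoded form of an override value. */
struct led_override {
	uint8_t phase[2];     /* LED nibble shown in each phase */
	unsigned int timeoff; /* ticks per phase */
};

/* Build a value to write to led_override; freq_x16 is Hz * 16. */
static uint16_t led_override_encode(uint8_t freq_x16, uint8_t phase0,
				    uint8_t phase1)
{
	return (uint16_t)(((unsigned int)freq_x16 << LED_OVER_FREQ_SHIFT) |
			  ((phase1 & 0xFu) << 4) | (phase0 & 0xFu));
}

/* Mirrors the decoding in ipath_set_led_override(); hz stands in for
 * the kernel's HZ. */
static struct led_override led_override_decode(unsigned int val,
					       unsigned int hz)
{
	struct led_override o;
	unsigned int freq = (val & LED_OVER_FREQ_MASK) >> LED_OVER_FREQ_SHIFT;

	if (freq) {
		/* Blinking: alternate between the two nibbles. */
		o.phase[0] = val & 0xF;
		o.phase[1] = (val >> 4) & 0xF;
		o.timeoff = (hz << 4) / freq;
	} else {
		/* No blink: both phases identical, re-checked every hz ticks. */
		o.phase[0] = o.phase[1] = val & 0xF;
		o.timeoff = hz;
	}
	return o;
}
```

For example, led_override_encode(32, IPATH_LED_PHYS, IPATH_LED_LOG) gives 0x2021 (8225 decimal): a frequency of 32/16 = 2 Hz, with the green LED driven in one phase and the yellow LED in the other. With HZ = 250, each phase lasts (250 << 4) / 32 = 125 ticks, or half a second.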
Signed-off-by: Michael Albaugh --- drivers/infiniband/hw/ipath/ipath_driver.c | 92 +++++++++++++++++++++++++++ drivers/infiniband/hw/ipath/ipath_iba6110.c | 10 +++ drivers/infiniband/hw/ipath/ipath_iba6120.c | 10 +++ drivers/infiniband/hw/ipath/ipath_kernel.h | 19 ++++++ drivers/infiniband/hw/ipath/ipath_sysfs.c | 19 ++++++ 5 files changed, 149 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index e3a2232..0975932 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -1846,6 +1846,87 @@ void ipath_write_kreg_port(const struct ipath_devdata *dd, ipath_kreg regno, ipath_write_kreg(dd, where, value); } +/* + * Following deal with the "obviously simple" task of overriding the state + * of the LEDS, which normally indicate link physical and logical status. + * The complications arise in dealing with different hardware mappings + * and the board-dependent routine being called from interrupts. + * and then there's the requirement to _flash_ them. + */ +#define LED_OVER_FREQ_SHIFT 8 +#define LED_OVER_FREQ_MASK (0xFF<ipath_flags & IPATH_INITTED)) + return; + + pidx = dd->ipath_led_override_phase++ & 1; + dd->ipath_led_override = dd->ipath_led_override_vals[pidx]; + timeoff = dd->ipath_led_override_timeoff; + + /* + * below potentially restores the LED values per current status, + * should also possibly setup the traffic-blink register, + * but leave that to per-chip functions. 
+ */ + val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_ibcstatus); + ltstate = (val >> INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT) & + INFINIPATH_IBCS_LINKTRAININGSTATE_MASK; + lstate = (val >> INFINIPATH_IBCS_LINKSTATE_SHIFT) & + INFINIPATH_IBCS_LINKSTATE_MASK; + + dd->ipath_f_setextled(dd, lstate, ltstate); + mod_timer(&dd->ipath_led_override_timer, jiffies + timeoff); +} + +void ipath_set_led_override(struct ipath_devdata *dd, unsigned int val) +{ + int timeoff, freq; + + if (!(dd->ipath_flags & IPATH_INITTED)) + return; + + /* First check if we are blinking. If not, use 1HZ polling */ + timeoff = HZ; + freq = (val & LED_OVER_FREQ_MASK) >> LED_OVER_FREQ_SHIFT; + + if (freq) { + /* For blink, set each phase from one nybble of val */ + dd->ipath_led_override_vals[0] = val & 0xF; + dd->ipath_led_override_vals[1] = (val >> 4) & 0xF; + timeoff = (HZ << 4)/freq; + } else { + /* Non-blink set both phases the same. */ + dd->ipath_led_override_vals[0] = val & 0xF; + dd->ipath_led_override_vals[1] = val & 0xF; + } + dd->ipath_led_override_timeoff = timeoff; + + /* + * If the timer has not already been started, do so. Use a "quick" + * timeout so the function will be called soon, to look at our request. + */ + if (atomic_inc_return(&dd->ipath_led_override_timer_active) == 1) { + /* Need to start timer */ + init_timer(&dd->ipath_led_override_timer); + dd->ipath_led_override_timer.function = + ipath_run_led_override; + dd->ipath_led_override_timer.data = (unsigned long) dd; + dd->ipath_led_override_timer.expires = jiffies + 1; + add_timer(&dd->ipath_led_override_timer); + } else { + atomic_dec(&dd->ipath_led_override_timer_active); + } +} + /** * ipath_shutdown_device - shut down a device * @dd: the infinipath device @@ -1909,7 +1990,6 @@ void ipath_shutdown_device(struct ipath_devdata *dd) * Turn the LEDs off explictly for the same reason. 
*/ dd->ipath_f_quiet_serdes(dd); - dd->ipath_f_setextled(dd, 0, 0); if (dd->ipath_stats_timer_active) { del_timer_sync(&dd->ipath_stats_timer); @@ -2085,6 +2165,16 @@ int ipath_reset_device(int unit) goto bail; } + if (atomic_read(&dd->ipath_led_override_timer_active)) { + /* Need to stop LED timer, _then_ shut off LEDs */ + del_timer_sync(&dd->ipath_led_override_timer); + atomic_set(&dd->ipath_led_override_timer_active, 0); + } + + /* Shut off LEDs after we are sure timer is not running */ + dd->ipath_led_override = LED_OVER_BOTH_OFF; + dd->ipath_f_setextled(dd, 0, 0); + dev_info(&dd->pcidev->dev, "Reset on unit %u requested\n", unit); if (!dd->ipath_kregbase || !(dd->ipath_flags & IPATH_PRESENT)) { diff --git a/drivers/infiniband/hw/ipath/ipath_iba6110.c b/drivers/infiniband/hw/ipath/ipath_iba6110.c index ba73dd0..4372c6c 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba6110.c +++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c @@ -1065,6 +1065,16 @@ static void ipath_setup_ht_setextled(struct ipath_devdata *dd, if (ipath_diag_inuse) return; + /* Allow override of LED display for, e.g. Locating system in rack */ + if (dd->ipath_led_override) { + ltst = (dd->ipath_led_override & IPATH_LED_PHYS) + ? INFINIPATH_IBCS_LT_STATE_LINKUP + : INFINIPATH_IBCS_LT_STATE_DISABLED; + lst = (dd->ipath_led_override & IPATH_LED_LOG) + ? INFINIPATH_IBCS_L_STATE_ACTIVE + : INFINIPATH_IBCS_L_STATE_DOWN; + } + /* * start by setting both LED control bits to off, then turn * on the appropriate bit(s). diff --git a/drivers/infiniband/hw/ipath/ipath_iba6120.c b/drivers/infiniband/hw/ipath/ipath_iba6120.c index 4e2e3df..bcb70d6 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba6120.c +++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c @@ -797,6 +797,16 @@ static void ipath_setup_pe_setextled(struct ipath_devdata *dd, u64 lst, if (ipath_diag_inuse) return; + /* Allow override of LED display for, e.g. 
Locating system in rack */ + if (dd->ipath_led_override) { + ltst = (dd->ipath_led_override & IPATH_LED_PHYS) + ? INFINIPATH_IBCS_LT_STATE_LINKUP + : INFINIPATH_IBCS_LT_STATE_DISABLED; + lst = (dd->ipath_led_override & IPATH_LED_LOG) + ? INFINIPATH_IBCS_L_STATE_ACTIVE + : INFINIPATH_IBCS_L_STATE_DOWN; + } + extctl = dd->ipath_extctrl & ~(INFINIPATH_EXTC_LED1PRIPORT_ON | INFINIPATH_EXTC_LED2PRIPORT_ON); diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index 12194f3..2f39db7 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -575,6 +575,16 @@ struct ipath_devdata { u16 ipath_gpio_scl_num; u64 ipath_gpio_sda; u64 ipath_gpio_scl; + + /* used to override LED behavior */ + u8 ipath_led_override; /* Substituted for normal value, if non-zero */ + u16 ipath_led_override_timeoff; /* delta to next timer event */ + u8 ipath_led_override_vals[2]; /* Alternates per blink-frame */ + u8 ipath_led_override_phase; /* Just counts, LSB picks from vals[] */ + atomic_t ipath_led_override_timer_active; + /* Used to flash LEDs in override mode */ + struct timer_list ipath_led_override_timer; + }; /* Private data for file operations */ @@ -717,6 +727,15 @@ u64 ipath_snap_cntr(struct ipath_devdata *, ipath_creg); void ipath_disarm_senderrbufs(struct ipath_devdata *, int); /* + * Set LED override, only the two LSBs have "public" meaning, but + * any non-zero value substitutes them for the Link and LinkTrain + * LED states. 
+ */ +#define IPATH_LED_PHYS 1 /* Physical (linktraining) GREEN LED */ +#define IPATH_LED_LOG 2 /* Logical (link) YELLOW LED */ +void ipath_set_led_override(struct ipath_devdata *dd, unsigned int val); + +/* * number of words used for protocol header if not set by ipath_userinit(); */ #define IPATH_DFLT_RCVHDRSIZE 9 diff --git a/drivers/infiniband/hw/ipath/ipath_sysfs.c b/drivers/infiniband/hw/ipath/ipath_sysfs.c index 4dc398d..17ec145 100644 --- a/drivers/infiniband/hw/ipath/ipath_sysfs.c +++ b/drivers/infiniband/hw/ipath/ipath_sysfs.c @@ -596,6 +596,23 @@ bail: return ret; } +static ssize_t store_led_override(struct device *dev, + struct device_attribute *attr, + const char *buf, + size_t count) +{ + struct ipath_devdata *dd = dev_get_drvdata(dev); + int ret; + u16 val; + + ret = ipath_parse_ushort(buf, &val); + if (ret > 0) + ipath_set_led_override(dd, val); + else + ipath_dev_err(dd, "attempt to set invalid LED override\n"); + return ret; +} + static DRIVER_ATTR(num_units, S_IRUGO, show_num_units, NULL); static DRIVER_ATTR(version, S_IRUGO, show_version, NULL); @@ -625,6 +642,7 @@ static DEVICE_ATTR(status_str, S_IRUGO, show_status_str, NULL); static DEVICE_ATTR(boardversion, S_IRUGO, show_boardversion, NULL); static DEVICE_ATTR(unit, S_IRUGO, show_unit, NULL); static DEVICE_ATTR(rx_pol_inv, S_IWUSR, NULL, store_rx_pol_inv); +static DEVICE_ATTR(led_override, S_IWUSR, NULL, store_led_override); static struct attribute *dev_attributes[] = { &dev_attr_guid.attr, @@ -641,6 +659,7 @@ static struct attribute *dev_attributes[] = { &dev_attr_unit.attr, &dev_attr_enabled.attr, &dev_attr_rx_pol_inv.attr, + &dev_attr_led_override.attr, NULL }; From arthur.jones at qlogic.com Tue Jun 19 16:40:45 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:40:45 -0700 Subject: [ofa-general] [PATCH 03/28] IB/ipath -- lock and always use shadow copies of GPIO register In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: 
<20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234045.3794.92822.stgit@bauxite.internal.keyresearch.com> From: Michael Albaugh The new LED blinking interface adds more contention for the unprotected GPIO pins that were already shared, though not commonly at the same time. We add locks to the accesses to these pins so that Read-Modify-Write is now safe. Some of these locks are added at interrupt context, so we shadow the registers which drive and inspect these pins to avoid the mmio read/writes. This mitigates the effects of the locks and hastens us through the interrupt. Add locking and always use shadows, for registers controlling GPIO pins (That would be ExtCtrl and GPIOout). The use of shadows implies doing less I/O, which can make I2C operation too fast on some platforms. An explicit udelay(1) in SCL manipulation fixes that. Signed-off-by: Michael Albaugh --- drivers/infiniband/hw/ipath/ipath_eeprom.c | 68 +++++++++++++++---------- drivers/infiniband/hw/ipath/ipath_iba6110.c | 3 + drivers/infiniband/hw/ipath/ipath_iba6120.c | 3 + drivers/infiniband/hw/ipath/ipath_init_chip.c | 2 + drivers/infiniband/hw/ipath/ipath_kernel.h | 7 ++- 5 files changed, 53 insertions(+), 30 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_eeprom.c b/drivers/infiniband/hw/ipath/ipath_eeprom.c index 030185f..26daac9 100644 --- a/drivers/infiniband/hw/ipath/ipath_eeprom.c +++ b/drivers/infiniband/hw/ipath/ipath_eeprom.c @@ -95,39 +95,37 @@ static int i2c_gpio_set(struct ipath_devdata *dd, enum i2c_type line, enum i2c_state new_line_state) { - u64 read_val, write_val, mask, *gpioval; + u64 out_mask, dir_mask, *gpioval; + unsigned long flags = 0; gpioval = &dd->ipath_gpio_out; - read_val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_extctrl); - if (line == i2c_line_scl) - mask = dd->ipath_gpio_scl; - else - mask = dd->ipath_gpio_sda; - if (new_line_state == i2c_line_high) + if (line == i2c_line_scl) { + dir_mask = dd->ipath_gpio_scl; + 
out_mask = (1UL << dd->ipath_gpio_scl_num); + } else { + dir_mask = dd->ipath_gpio_sda; + out_mask = (1UL << dd->ipath_gpio_sda_num); + } + + spin_lock_irqsave(&dd->ipath_gpio_lock, flags); + if (new_line_state == i2c_line_high) { /* tri-state the output rather than force high */ - write_val = read_val & ~mask; - else + dd->ipath_extctrl &= ~dir_mask; + } else { /* config line to be an output */ - write_val = read_val | mask; - ipath_write_kreg(dd, dd->ipath_kregs->kr_extctrl, write_val); + dd->ipath_extctrl |= dir_mask; + } + ipath_write_kreg(dd, dd->ipath_kregs->kr_extctrl, dd->ipath_extctrl); - /* set high and verify */ + /* set output as well (no real verify) */ if (new_line_state == i2c_line_high) - write_val = 0x1UL; + *gpioval |= out_mask; else - write_val = 0x0UL; + *gpioval &= ~out_mask; - if (line == i2c_line_scl) { - write_val <<= dd->ipath_gpio_scl_num; - *gpioval = *gpioval & ~(1UL << dd->ipath_gpio_scl_num); - *gpioval |= write_val; - } else { - write_val <<= dd->ipath_gpio_sda_num; - *gpioval = *gpioval & ~(1UL << dd->ipath_gpio_sda_num); - *gpioval |= write_val; - } ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_out, *gpioval); + spin_unlock_irqrestore(&dd->ipath_gpio_lock, flags); return 0; } @@ -145,8 +143,9 @@ static int i2c_gpio_get(struct ipath_devdata *dd, enum i2c_type line, enum i2c_state *curr_statep) { - u64 read_val, write_val, mask; + u64 read_val, mask; int ret; + unsigned long flags = 0; /* check args */ if (curr_statep == NULL) { @@ -154,15 +153,21 @@ static int i2c_gpio_get(struct ipath_devdata *dd, goto bail; } - read_val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_extctrl); /* config line to be an input */ if (line == i2c_line_scl) mask = dd->ipath_gpio_scl; else mask = dd->ipath_gpio_sda; - write_val = read_val & ~mask; - ipath_write_kreg(dd, dd->ipath_kregs->kr_extctrl, write_val); + + spin_lock_irqsave(&dd->ipath_gpio_lock, flags); + dd->ipath_extctrl &= ~mask; + ipath_write_kreg(dd, dd->ipath_kregs->kr_extctrl, 
dd->ipath_extctrl); + /* + * Below is very unlikely to reflect true input state if Output + * Enable actually changed. + */ read_val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_extstatus); + spin_unlock_irqrestore(&dd->ipath_gpio_lock, flags); if (read_val & mask) *curr_statep = i2c_line_high; @@ -192,6 +197,7 @@ static void i2c_wait_for_writes(struct ipath_devdata *dd) static void scl_out(struct ipath_devdata *dd, u8 bit) { + udelay(1); i2c_gpio_set(dd, i2c_line_scl, bit ? i2c_line_high : i2c_line_low); i2c_wait_for_writes(dd); @@ -314,12 +320,18 @@ static int eeprom_reset(struct ipath_devdata *dd) int clock_cycles_left = 9; u64 *gpioval = &dd->ipath_gpio_out; int ret; + unsigned long flags; - eeprom_init = 1; + spin_lock_irqsave(&dd->ipath_gpio_lock, flags); + /* Make sure shadows are consistent */ + dd->ipath_extctrl = ipath_read_kreg64(dd, dd->ipath_kregs->kr_extctrl); *gpioval = ipath_read_kreg64(dd, dd->ipath_kregs->kr_gpio_out); + spin_unlock_irqrestore(&dd->ipath_gpio_lock, flags); + ipath_cdbg(VERBOSE, "Resetting i2c eeprom; initial gpioout reg " "is %llx\n", (unsigned long long) *gpioval); + eeprom_init = 1; /* * This is to get the i2c into a known state, by first going low, * then tristate sda (and then tristate scl as first thing diff --git a/drivers/infiniband/hw/ipath/ipath_iba6110.c b/drivers/infiniband/hw/ipath/ipath_iba6110.c index 4372c6c..8482ea3 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba6110.c +++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c @@ -1059,6 +1059,7 @@ static void ipath_setup_ht_setextled(struct ipath_devdata *dd, u64 lst, u64 ltst) { u64 extctl; + unsigned long flags = 0; /* the diags use the LED to indicate diag info, so we leave * the external LED alone when the diags are running */ @@ -1075,6 +1076,7 @@ static void ipath_setup_ht_setextled(struct ipath_devdata *dd, : INFINIPATH_IBCS_L_STATE_DOWN; } + spin_lock_irqsave(&dd->ipath_gpio_lock, flags); /* * start by setting both LED control bits to off, then turn * on the 
appropriate bit(s). @@ -1103,6 +1105,7 @@ static void ipath_setup_ht_setextled(struct ipath_devdata *dd, } dd->ipath_extctrl = extctl; ipath_write_kreg(dd, dd->ipath_kregs->kr_extctrl, extctl); + spin_unlock_irqrestore(&dd->ipath_gpio_lock, flags); } static void ipath_init_ht_variables(struct ipath_devdata *dd) diff --git a/drivers/infiniband/hw/ipath/ipath_iba6120.c b/drivers/infiniband/hw/ipath/ipath_iba6120.c index bcb70d6..2345bb0 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba6120.c +++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c @@ -791,6 +791,7 @@ static void ipath_setup_pe_setextled(struct ipath_devdata *dd, u64 lst, u64 ltst) { u64 extctl; + unsigned long flags = 0; /* the diags use the LED to indicate diag info, so we leave * the external LED alone when the diags are running */ @@ -807,6 +808,7 @@ static void ipath_setup_pe_setextled(struct ipath_devdata *dd, u64 lst, : INFINIPATH_IBCS_L_STATE_DOWN; } + spin_lock_irqsave(&dd->ipath_gpio_lock, flags); extctl = dd->ipath_extctrl & ~(INFINIPATH_EXTC_LED1PRIPORT_ON | INFINIPATH_EXTC_LED2PRIPORT_ON); @@ -816,6 +818,7 @@ static void ipath_setup_pe_setextled(struct ipath_devdata *dd, u64 lst, extctl |= INFINIPATH_EXTC_LED1PRIPORT_ON; dd->ipath_extctrl = extctl; ipath_write_kreg(dd, dd->ipath_kregs->kr_extctrl, extctl); + spin_unlock_irqrestore(&dd->ipath_gpio_lock, flags); } /** diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index 7045ba6..f6ee7a8 100644 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -340,6 +340,8 @@ static int init_chip_first(struct ipath_devdata *dd, spin_lock_init(&dd->ipath_tid_lock); + spin_lock_init(&dd->ipath_gpio_lock); + done: *pdp = pd; return ret; diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index 2f39db7..bd1088a 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ 
b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -399,6 +399,8 @@ struct ipath_devdata { u64 ipath_gpio_out; /* shadow the gpio mask register */ u64 ipath_gpio_mask; + /* shadow the gpio output enable, etc... */ + u64 ipath_extctrl; /* kr_revision shadow */ u64 ipath_revision; /* @@ -473,8 +475,6 @@ struct ipath_devdata { u32 ipath_cregbase; /* shadow the control register contents */ u32 ipath_control; - /* shadow the gpio output contents */ - u32 ipath_extctrl; /* PCI revision register (HTC rev on FPGA) */ u32 ipath_pcirev; @@ -576,6 +576,9 @@ struct ipath_devdata { u64 ipath_gpio_sda; u64 ipath_gpio_scl; + /* lock for doing RMW of shadows/regs for ExtCtrl and GPIO */ + spinlock_t ipath_gpio_lock; + /* used to override LED behavior */ u8 ipath_led_override; /* Substituted for normal value, if non-zero */ u16 ipath_led_override_timeoff; /* delta to next timer event */ From arthur.jones at qlogic.com Tue Jun 19 16:40:51 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:40:51 -0700 Subject: [ofa-general] [PATCH 04/28] IB/ipath - remove incompletely implemented ipath_runtime flags and code In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234050.3794.74578.stgit@bauxite.internal.keyresearch.com> From: John Gregor The IPATH_RUNTIME_PBC_REWRITE and the IPATH_RUNTIME_LOOSE_DMA_ALIGN flags were not ever implemented correctly and did not turn out to be necessary. Remove the last vestiges of these flags but mark the spot with a comment to remind us to not reuse these flags in the interest of binary compatibility. The INFINIPATH_XGXS_SUPPRESS_ARMLAUNCH_ERR bit was also not found to be useful, so it was dropped in the cleanup as well. 
Signed-off-by: John Gregor Signed-off-by: Arthur Jones --- drivers/infiniband/hw/ipath/ipath_common.h | 3 +-- drivers/infiniband/hw/ipath/ipath_iba6120.c | 25 ------------------------- 2 files changed, 1 insertions(+), 27 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_common.h b/drivers/infiniband/hw/ipath/ipath_common.h index 10c008f..12e1349 100644 --- a/drivers/infiniband/hw/ipath/ipath_common.h +++ b/drivers/infiniband/hw/ipath/ipath_common.h @@ -189,8 +189,7 @@ typedef enum _ipath_ureg { #define IPATH_RUNTIME_FORCE_WC_ORDER 0x4 #define IPATH_RUNTIME_RCVHDR_COPY 0x8 #define IPATH_RUNTIME_MASTER 0x10 -#define IPATH_RUNTIME_PBC_REWRITE 0x20 -#define IPATH_RUNTIME_LOOSE_DMA_ALIGN 0x40 +/* 0x20 and 0x40 are no longer used, but are reserved for ABI compatibility */ /* * This structure is returned by ipath_userinit() immediately after diff --git a/drivers/infiniband/hw/ipath/ipath_iba6120.c b/drivers/infiniband/hw/ipath/ipath_iba6120.c index 2345bb0..7115907 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba6120.c +++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c @@ -296,13 +296,6 @@ static const struct ipath_cregs ipath_pe_cregs = { #define IPATH_GPIO_SCL (1ULL << \ (_IPATH_GPIO_SCL_NUM+INFINIPATH_EXTC_GPIOOE_SHIFT)) -/* - * Rev2 silicon allows suppressing check for ArmLaunch errors. - * this can speed up short packet sends on systems that do - * not guaranteee write-order. - */ -#define INFINIPATH_XGXS_SUPPRESS_ARMLAUNCH_ERR (1ULL<<63) - /* 6120 specific hardware errors... */ static const struct ipath_hwerror_msgs ipath_6120_hwerror_msgs[] = { INFINIPATH_HWE_MSG(PCIEPOISONEDTLP, "PCIe Poisoned TLP"), @@ -680,17 +673,6 @@ static int ipath_pe_bringup_serdes(struct ipath_devdata *dd) val |= dd->ipath_rx_pol_inv << INFINIPATH_XGXS_RX_POL_SHIFT; } - if (dd->ipath_minrev >= 2) { - /* Rev 2. 
can tolerate multiple writes to PBC, and - * allowing them can provide lower latency on some - * CPUs, but this feature is off by default, only - * turned on by setting D63 of XGXSconfig reg. - * May want to make this conditional more - * fine-grained in future. This is not exactly - * related to XGXS, but where the bit ended up. - */ - val |= INFINIPATH_XGXS_SUPPRESS_ARMLAUNCH_ERR; - } if (val != prev_val) ipath_write_kreg(dd, dd->ipath_kregs->kr_xgxsconfig, val); @@ -1324,13 +1306,6 @@ static int ipath_pe_get_base_info(struct ipath_portdata *pd, void *kbase) dd = pd->port_dd; - if (dd != NULL && dd->ipath_minrev >= 2) { - ipath_cdbg(PROC, "IBA6120 Rev2, allow multiple PBC write\n"); - kinfo->spi_runtime_flags |= IPATH_RUNTIME_PBC_REWRITE; - ipath_cdbg(PROC, "IBA6120 Rev2, allow loose DMA alignment\n"); - kinfo->spi_runtime_flags |= IPATH_RUNTIME_LOOSE_DMA_ALIGN; - } - done: kinfo->spi_runtime_flags |= IPATH_RUNTIME_PCIE; return 0; From arthur.jones at qlogic.com Tue Jun 19 16:40:57 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:40:57 -0700 Subject: [ofa-general] [PATCH 05/28] IB/ipath -- Log "active" time and some errors to EEPROM In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234056.3794.46973.stgit@bauxite.internal.keyresearch.com> From: Michael Albaugh We currently track various errors; now we enhance that capability by logging some of them to EEPROM. We also now log a cumulative "active" time, defined by traffic through the InfiniPath HCA beyond the normal SM traffic.
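The "active" time bookkeeping can be pictured as folding a running seconds counter into a persistent hour count. The sketch below is hypothetical: the patch's own comments say hours are kept little-endian in EEPROM while seconds are counted in an atomic_t during operation, but the 16-bit field width, struct active_log, the helper names, and the choice to saturate rather than wrap are assumptions for illustration, not the driver's actual layout.

```c
#include <stdint.h>

/* Hypothetical layout: seconds accumulated in RAM, plus a 16-bit
 * little-endian hour counter as it might appear in EEPROM. */
struct active_log {
	uint32_t secs;     /* seconds not yet folded into hours */
	uint8_t hrs_le[2]; /* persistent hour count, little-endian */
};

static unsigned int get_le16(const uint8_t b[2])
{
	return b[0] | ((unsigned int)b[1] << 8);
}

static void put_le16(uint8_t b[2], unsigned int v)
{
	b[0] = v & 0xFF;
	b[1] = (v >> 8) & 0xFF;
}

/* Fold whole hours into the stored count, keeping the remainder so a
 * partial hour is not lost between flushes; saturate at 0xFFFF. */
static void active_log_flush(struct active_log *log)
{
	unsigned int hrs = get_le16(log->hrs_le) + log->secs / 3600;

	log->secs %= 3600;
	if (hrs > 0xFFFF)
		hrs = 0xFFFF;
	put_le16(log->hrs_le, hrs);
}
```

Keeping the sub-hour remainder in RAM means frequent flushes (for example at shutdown, as ipath_shutdown_device() does with ipath_update_eeprom_log()) do not round away partial hours.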
Signed-off-by: Michael Albaugh --- drivers/infiniband/hw/ipath/ipath_driver.c | 3 drivers/infiniband/hw/ipath/ipath_eeprom.c | 233 ++++++++++++++++++++++++- drivers/infiniband/hw/ipath/ipath_iba6110.c | 22 ++ drivers/infiniband/hw/ipath/ipath_iba6120.c | 27 +++ drivers/infiniband/hw/ipath/ipath_init_chip.c | 2 drivers/infiniband/hw/ipath/ipath_intr.c | 8 + drivers/infiniband/hw/ipath/ipath_kernel.h | 38 ++++ drivers/infiniband/hw/ipath/ipath_stats.c | 23 ++ drivers/infiniband/hw/ipath/ipath_sysfs.c | 22 ++ 9 files changed, 370 insertions(+), 8 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index 0975932..e963986 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -2005,6 +2005,9 @@ void ipath_shutdown_device(struct ipath_devdata *dd) ~0ULL & ~INFINIPATH_HWE_MEMBISTFAILED); ipath_write_kreg(dd, dd->ipath_kregs->kr_errorclear, -1LL); ipath_write_kreg(dd, dd->ipath_kregs->kr_intclear, -1LL); + + ipath_cdbg(VERBOSE, "Flush time and errors to EEPROM\n"); + ipath_update_eeprom_log(dd); } /** diff --git a/drivers/infiniband/hw/ipath/ipath_eeprom.c b/drivers/infiniband/hw/ipath/ipath_eeprom.c index 26daac9..9be1b9a 100644 --- a/drivers/infiniband/hw/ipath/ipath_eeprom.c +++ b/drivers/infiniband/hw/ipath/ipath_eeprom.c @@ -367,8 +367,8 @@ bail: * @len: number of bytes to receive */ -int ipath_eeprom_read(struct ipath_devdata *dd, u8 eeprom_offset, - void *buffer, int len) +static int ipath_eeprom_internal_read(struct ipath_devdata *dd, + u8 eeprom_offset, void *buffer, int len) { /* compiler complains unless initialized */ u8 single_byte = 0; @@ -418,6 +418,7 @@ bail: return ret; } + /** * ipath_eeprom_write - writes data to the eeprom via I2C * @dd: the infinipath device @@ -425,8 +426,8 @@ bail: * @buffer: data to write * @len: number of bytes to write */ -int ipath_eeprom_write(struct ipath_devdata *dd, u8 eeprom_offset, - const void *buffer, int 
len) +int ipath_eeprom_internal_write(struct ipath_devdata *dd, u8 eeprom_offset, + const void *buffer, int len) { u8 single_byte; int sub_len; @@ -500,6 +501,38 @@ bail: return ret; } +/* + * The public entry-points ipath_eeprom_read() and ipath_eeprom_write() + * are now just wrappers around the internal functions. + */ +int ipath_eeprom_read(struct ipath_devdata *dd, u8 eeprom_offset, + void *buff, int len) +{ + int ret; + + ret = down_interruptible(&dd->ipath_eep_sem); + if (!ret) { + ret = ipath_eeprom_internal_read(dd, eeprom_offset, buff, len); + up(&dd->ipath_eep_sem); + } + + return ret; +} + +int ipath_eeprom_write(struct ipath_devdata *dd, u8 eeprom_offset, + const void *buff, int len) +{ + int ret; + + ret = down_interruptible(&dd->ipath_eep_sem); + if (!ret) { + ret = ipath_eeprom_internal_write(dd, eeprom_offset, buff, len); + up(&dd->ipath_eep_sem); + } + + return ret; +} + static u8 flash_csum(struct ipath_flash *ifp, int adjust) { u8 *ip = (u8 *) ifp; @@ -527,7 +560,7 @@ void ipath_get_eeprom_info(struct ipath_devdata *dd) void *buf; struct ipath_flash *ifp; __be64 guid; - int len; + int len, eep_stat; u8 csum, *bguid; int t = dd->ipath_unit; struct ipath_devdata *dd0 = ipath_lookup(0); @@ -571,7 +604,11 @@ void ipath_get_eeprom_info(struct ipath_devdata *dd) goto bail; } - if (ipath_eeprom_read(dd, 0, buf, len)) { + down(&dd->ipath_eep_sem); + eep_stat = ipath_eeprom_internal_read(dd, 0, buf, len); + up(&dd->ipath_eep_sem); + + if (eep_stat) { ipath_dev_err(dd, "Failed reading GUID from eeprom\n"); goto done; } @@ -646,8 +683,192 @@ void ipath_get_eeprom_info(struct ipath_devdata *dd) ipath_cdbg(VERBOSE, "Initted GUID to %llx from eeprom\n", (unsigned long long) be64_to_cpu(dd->ipath_guid)); + memcpy(&dd->ipath_eep_st_errs, &ifp->if_errcntp, IPATH_EEP_LOG_CNT); + /* + * Power-on (actually "active") hours are kept as little-endian value + * in EEPROM, but as seconds in a (possibly as small as 24-bit) + * atomic_t while running. 
+ */ + atomic_set(&dd->ipath_active_time, 0); + dd->ipath_eep_hrs = ifp->if_powerhour[0] | (ifp->if_powerhour[1] << 8); + done: vfree(buf); bail:; } + +/** + * ipath_update_eeprom_log - copy active-time and error counters to eeprom + * @dd: the infinipath device + * + * Although the time is kept as seconds in the ipath_devdata struct, it is + * rounded to hours for re-write, as we have only 16 bits in EEPROM. + * First-cut code reads whole (expected) struct ipath_flash, modifies, + * re-writes. Future direction: read/write only what we need, assuming + * that the EEPROM had to have been "good enough" for driver init, and + * if not, we aren't making it worse. + * + */ + +int ipath_update_eeprom_log(struct ipath_devdata *dd) +{ + void *buf; + struct ipath_flash *ifp; + int len, hi_water; + uint32_t new_time, new_hrs; + u8 csum; + int ret, idx; + unsigned long flags; + + /* first, check if we actually need to do anything. */ + ret = 0; + for (idx = 0; idx < IPATH_EEP_LOG_CNT; ++idx) { + if (dd->ipath_eep_st_new_errs[idx]) { + ret = 1; + break; + } + } + new_time = atomic_read(&dd->ipath_active_time); + + if (ret == 0 && new_time < 3600) + return 0; + + /* + * The quick-check above determined that there is something worthy + * of logging, so get current contents and do a more detailed idea. + */ + len = offsetof(struct ipath_flash, if_future); + buf = vmalloc(len); + ret = 1; + if (!buf) { + ipath_dev_err(dd, "Couldn't allocate memory to read %u " + "bytes from eeprom for logging\n", len); + goto bail; + } + + /* Grab semaphore and read current EEPROM. If we get an + * error, let go, but if not, keep it until we finish write. 
+ */ + ret = down_interruptible(&dd->ipath_eep_sem); + if (ret) { + ipath_dev_err(dd, "Unable to acquire EEPROM for logging\n"); + goto free_bail; + } + ret = ipath_eeprom_internal_read(dd, 0, buf, len); + if (ret) { + up(&dd->ipath_eep_sem); + ipath_dev_err(dd, "Unable read EEPROM for logging\n"); + goto free_bail; + } + ifp = (struct ipath_flash *)buf; + + csum = flash_csum(ifp, 0); + if (csum != ifp->if_csum) { + up(&dd->ipath_eep_sem); + ipath_dev_err(dd, "EEPROM cks err (0x%02X, S/B 0x%02X)\n", + csum, ifp->if_csum); + ret = 1; + goto free_bail; + } + hi_water = 0; + spin_lock_irqsave(&dd->ipath_eep_st_lock, flags); + for (idx = 0; idx < IPATH_EEP_LOG_CNT; ++idx) { + int new_val = dd->ipath_eep_st_new_errs[idx]; + if (new_val) { + /* + * If we have seen any errors, add to EEPROM values + * We need to saturate at 0xFF (255) and we also + * would need to adjust the checksum if we were + * trying to minimize EEPROM traffic + * Note that we add to actual current count in EEPROM, + * in case it was altered while we were running. + */ + new_val += ifp->if_errcntp[idx]; + if (new_val > 0xFF) + new_val = 0xFF; + if (ifp->if_errcntp[idx] != new_val) { + ifp->if_errcntp[idx] = new_val; + hi_water = offsetof(struct ipath_flash, + if_errcntp) + idx; + } + /* + * update our shadow (used to minimize EEPROM + * traffic), to match what we are about to write. + */ + dd->ipath_eep_st_errs[idx] = new_val; + dd->ipath_eep_st_new_errs[idx] = 0; + } + } + /* + * now update active-time. We would like to round to the nearest hour + * but unless atomic_t are sure to be proper signed ints we cannot, + * because we need to account for what we "transfer" to EEPROM and + * if we log an hour at 31 minutes, then we would need to set + * active_time to -29 to accurately count the _next_ hour. 
+ */ + if (new_time > 3600) { + new_hrs = new_time / 3600; + atomic_sub((new_hrs * 3600), &dd->ipath_active_time); + new_hrs += dd->ipath_eep_hrs; + if (new_hrs > 0xFFFF) + new_hrs = 0xFFFF; + dd->ipath_eep_hrs = new_hrs; + if ((new_hrs & 0xFF) != ifp->if_powerhour[0]) { + ifp->if_powerhour[0] = new_hrs & 0xFF; + hi_water = offsetof(struct ipath_flash, if_powerhour); + } + if ((new_hrs >> 8) != ifp->if_powerhour[1]) { + ifp->if_powerhour[1] = new_hrs >> 8; + hi_water = offsetof(struct ipath_flash, if_powerhour) + + 1; + } + } + /* + * There is a tiny possibility that we could somehow fail to write + * the EEPROM after updating our shadows, but problems from holding + * the spinlock too long are a much bigger issue. + */ + spin_unlock_irqrestore(&dd->ipath_eep_st_lock, flags); + if (hi_water) { + /* we made some change to the data, uopdate cksum and write */ + csum = flash_csum(ifp, 1); + ret = ipath_eeprom_internal_write(dd, 0, buf, hi_water + 1); + } + up(&dd->ipath_eep_sem); + if (ret) + ipath_dev_err(dd, "Failed updating EEPROM\n"); + +free_bail: + vfree(buf); +bail: + return ret; + +} + +/** + * ipath_inc_eeprom_err - increment one of the four error counters + * that are logged to EEPROM. + * @dd: the infinipath device + * @eidx: 0..3, the counter to increment + * @incr: how much to add + * + * Each counter is 8-bits, and saturates at 255 (0xFF). They + * are copied to the EEPROM (aka flash) whenever ipath_update_eeprom_log() + * is called, but it can only be called in a context that allows sleep. + * This function can be called even at interrupt level. 
+ */ + +void ipath_inc_eeprom_err(struct ipath_devdata *dd, u32 eidx, u32 incr) +{ + uint new_val; + unsigned long flags; + + spin_lock_irqsave(&dd->ipath_eep_st_lock, flags); + new_val = dd->ipath_eep_st_new_errs[eidx] + incr; + if (new_val > 255) + new_val = 255; + dd->ipath_eep_st_new_errs[eidx] = new_val; + spin_unlock_irqrestore(&dd->ipath_eep_st_lock, flags); + return; +} diff --git a/drivers/infiniband/hw/ipath/ipath_iba6110.c b/drivers/infiniband/hw/ipath/ipath_iba6110.c index 8482ea3..85f408d 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba6110.c +++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c @@ -440,6 +440,7 @@ static void ipath_ht_handle_hwerrors(struct ipath_devdata *dd, char *msg, u32 bits, ctrl; int isfatal = 0; char bitsmsg[64]; + int log_idx; hwerrs = ipath_read_kreg64(dd, dd->ipath_kregs->kr_hwerrstatus); @@ -468,6 +469,11 @@ static void ipath_ht_handle_hwerrors(struct ipath_devdata *dd, char *msg, hwerrs &= dd->ipath_hwerrmask; + /* We log some errors to EEPROM, check if we have any of those. */ + for (log_idx = 0; log_idx < IPATH_EEP_LOG_CNT; ++log_idx) + if (hwerrs & dd->ipath_eep_st_masks[log_idx].hwerrs_to_log) + ipath_inc_eeprom_err(dd, log_idx, 1); + /* * make sure we get this much out, unless told to be quiet, * it's a parity error we may recover from, @@ -1171,6 +1177,22 @@ static void ipath_init_ht_variables(struct ipath_devdata *dd) dd->ipath_i_rcvavail_mask = INFINIPATH_I_RCVAVAIL_MASK; dd->ipath_i_rcvurg_mask = INFINIPATH_I_RCVURG_MASK; + + /* + * EEPROM error log 0 is TXE Parity errors. 1 is RXE Parity. + * 2 is Some Misc, 3 is reserved for future. 
+ */ + dd->ipath_eep_st_masks[0].hwerrs_to_log = + INFINIPATH_HWE_TXEMEMPARITYERR_MASK << + INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT; + + dd->ipath_eep_st_masks[1].hwerrs_to_log = + INFINIPATH_HWE_RXEMEMPARITYERR_MASK << + INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT; + + dd->ipath_eep_st_masks[2].errs_to_log = + INFINIPATH_E_INVALIDADDR | INFINIPATH_E_RESET; + } /** diff --git a/drivers/infiniband/hw/ipath/ipath_iba6120.c b/drivers/infiniband/hw/ipath/ipath_iba6120.c index 7115907..207323a 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba6120.c +++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c @@ -340,6 +340,7 @@ static void ipath_pe_handle_hwerrors(struct ipath_devdata *dd, char *msg, u32 bits, ctrl; int isfatal = 0; char bitsmsg[64]; + int log_idx; hwerrs = ipath_read_kreg64(dd, dd->ipath_kregs->kr_hwerrstatus); if (!hwerrs) { @@ -367,6 +368,11 @@ static void ipath_pe_handle_hwerrors(struct ipath_devdata *dd, char *msg, hwerrs &= dd->ipath_hwerrmask; + /* We log some errors to EEPROM, check if we have any of those. */ + for (log_idx = 0; log_idx < IPATH_EEP_LOG_CNT; ++log_idx) + if (hwerrs & dd->ipath_eep_st_masks[log_idx].hwerrs_to_log) + ipath_inc_eeprom_err(dd, log_idx, 1); + /* * make sure we get this much out, unless told to be quiet, * or it's occurred within the last 5 seconds @@ -950,6 +956,27 @@ static void ipath_init_pe_variables(struct ipath_devdata *dd) dd->ipath_i_rcvavail_mask = INFINIPATH_I_RCVAVAIL_MASK; dd->ipath_i_rcvurg_mask = INFINIPATH_I_RCVURG_MASK; + + /* + * EEPROM error log 0 is TXE Parity errors. 1 is RXE Parity. + * 2 is Some Misc, 3 is reserved for future. 
+ */ + dd->ipath_eep_st_masks[0].hwerrs_to_log = + INFINIPATH_HWE_TXEMEMPARITYERR_MASK << + INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT; + + /* Ignore errors in PIO/PBC on systems with unordered write-combining */ + if (ipath_unordered_wc()) + dd->ipath_eep_st_masks[0].hwerrs_to_log &= ~TXE_PIO_PARITY; + + dd->ipath_eep_st_masks[1].hwerrs_to_log = + INFINIPATH_HWE_RXEMEMPARITYERR_MASK << + INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT; + + dd->ipath_eep_st_masks[2].errs_to_log = + INFINIPATH_E_INVALIDADDR | INFINIPATH_E_RESET; + + } /* setup the MSI stuff again after a reset. I'd like to just call diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index f6ee7a8..ee83934 100644 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -341,6 +341,8 @@ static int init_chip_first(struct ipath_devdata *dd, spin_lock_init(&dd->ipath_tid_lock); spin_lock_init(&dd->ipath_gpio_lock); + spin_lock_init(&dd->ipath_eep_st_lock); + sema_init(&dd->ipath_eep_sem, 1); done: *pdp = pd; diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c index a90d3b5..d9cdd00 100644 --- a/drivers/infiniband/hw/ipath/ipath_intr.c +++ b/drivers/infiniband/hw/ipath/ipath_intr.c @@ -505,6 +505,7 @@ static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs) int i, iserr = 0; int chkerrpkts = 0, noprint = 0; unsigned supp_msgs; + int log_idx; supp_msgs = handle_frequent_errors(dd, errs, msg, &noprint); @@ -518,6 +519,13 @@ static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs) if (errs & INFINIPATH_E_HARDWARE) { /* reuse same msg buf */ dd->ipath_f_handle_hwerrors(dd, msg, sizeof msg); + } else { + u64 mask; + for (log_idx = 0; log_idx < IPATH_EEP_LOG_CNT; ++log_idx) { + mask = dd->ipath_eep_st_masks[log_idx].errs_to_log; + if (errs & mask) + ipath_inc_eeprom_err(dd, log_idx, 1); + } } if (!noprint && (errs & ~dd->ipath_e_bitsextant)) diff --git 
a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index bd1088a..2a4414b 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -57,6 +57,24 @@ extern struct infinipath_stats ipath_stats; #define IPATH_CHIP_SWVERSION IPATH_CHIP_VERS_MAJ +/* + * First-cut critierion for "device is active" is + * two thousand dwords combined Tx, Rx traffic per + * 5-second interval. SMA packets are 64 dwords, + * and occur "a few per second", presumably each way. + */ +#define IPATH_TRAFFIC_ACTIVE_THRESHOLD (2000) +/* + * Struct used to indicate which errors are logged in each of the + * error-counters that are logged to EEPROM. A counter is incremented + * _once_ (saturating at 255) for each event with any bits set in + * the error or hwerror register masks below. + */ +#define IPATH_EEP_LOG_CNT (4) +struct ipath_eep_log_mask { + u64 errs_to_log; + u64 hwerrs_to_log; +}; struct ipath_portdata { void **port_rcvegrbuf; @@ -588,6 +606,24 @@ struct ipath_devdata { /* Used to flash LEDs in override mode */ struct timer_list ipath_led_override_timer; + /* Support (including locks) for EEPROM logging of errors and time */ + /* control access to actual counters, timer */ + spinlock_t ipath_eep_st_lock; + /* control high-level access to EEPROM */ + struct semaphore ipath_eep_sem; + /* Below inc'd by ipath_snap_cntrs(), locked by ipath_eep_st_lock */ + uint64_t ipath_traffic_wds; + /* active time is kept in seconds, but logged in hours */ + atomic_t ipath_active_time; + /* Below are nominal shadow of EEPROM, new since last EEPROM update */ + uint8_t ipath_eep_st_errs[IPATH_EEP_LOG_CNT]; + uint8_t ipath_eep_st_new_errs[IPATH_EEP_LOG_CNT]; + uint16_t ipath_eep_hrs; + /* + * masks for which bits of errs, hwerrs that cause + * each of the counters to increment. 
+ */ + struct ipath_eep_log_mask ipath_eep_st_masks[IPATH_EEP_LOG_CNT]; }; /* Private data for file operations */ @@ -726,6 +762,8 @@ u32 __iomem *ipath_getpiobuf(struct ipath_devdata *, u32 *); void ipath_init_iba6120_funcs(struct ipath_devdata *); void ipath_init_iba6110_funcs(struct ipath_devdata *); void ipath_get_eeprom_info(struct ipath_devdata *); +int ipath_update_eeprom_log(struct ipath_devdata *dd); +void ipath_inc_eeprom_err(struct ipath_devdata *dd, u32 eidx, u32 incr); u64 ipath_snap_cntr(struct ipath_devdata *, ipath_creg); void ipath_disarm_senderrbufs(struct ipath_devdata *, int); diff --git a/drivers/infiniband/hw/ipath/ipath_stats.c b/drivers/infiniband/hw/ipath/ipath_stats.c index d8b5e4c..2955f36 100644 --- a/drivers/infiniband/hw/ipath/ipath_stats.c +++ b/drivers/infiniband/hw/ipath/ipath_stats.c @@ -55,6 +55,7 @@ u64 ipath_snap_cntr(struct ipath_devdata *dd, ipath_creg creg) u64 val64; unsigned long t0, t1; u64 ret; + unsigned long flags; t0 = jiffies; /* If fast increment counters are only 32 bits, snapshot them, @@ -91,12 +92,18 @@ u64 ipath_snap_cntr(struct ipath_devdata *dd, ipath_creg creg) if (creg == dd->ipath_cregs->cr_wordsendcnt) { if (val != dd->ipath_lastsword) { dd->ipath_sword += val - dd->ipath_lastsword; + spin_lock_irqsave(&dd->ipath_eep_st_lock, flags); + dd->ipath_traffic_wds += val - dd->ipath_lastsword; + spin_unlock_irqrestore(&dd->ipath_eep_st_lock, flags); dd->ipath_lastsword = val; } val64 = dd->ipath_sword; } else if (creg == dd->ipath_cregs->cr_wordrcvcnt) { if (val != dd->ipath_lastrword) { dd->ipath_rword += val - dd->ipath_lastrword; + spin_lock_irqsave(&dd->ipath_eep_st_lock, flags); + dd->ipath_traffic_wds += val - dd->ipath_lastrword; + spin_unlock_irqrestore(&dd->ipath_eep_st_lock, flags); dd->ipath_lastrword = val; } val64 = dd->ipath_rword; @@ -200,6 +207,7 @@ void ipath_get_faststats(unsigned long opaque) struct ipath_devdata *dd = (struct ipath_devdata *) opaque; u32 val; static unsigned cnt; + unsigned 
long flags; /* * don't access the chip while running diags, or memory diags can @@ -210,9 +218,20 @@ void ipath_get_faststats(unsigned long opaque) /* but re-arm the timer, for diags case; won't hurt other */ goto done; + /* + * We now try to maintain a "active timer", based on traffic + * exceeding a threshold, so we need to check the word-counts + * even if they are 64-bit. + */ + ipath_snap_cntr(dd, dd->ipath_cregs->cr_wordsendcnt); + ipath_snap_cntr(dd, dd->ipath_cregs->cr_wordrcvcnt); + spin_lock_irqsave(&dd->ipath_eep_st_lock, flags); + if (dd->ipath_traffic_wds >= IPATH_TRAFFIC_ACTIVE_THRESHOLD) + atomic_add(5, &dd->ipath_active_time); /* S/B #define */ + dd->ipath_traffic_wds = 0; + spin_unlock_irqrestore(&dd->ipath_eep_st_lock, flags); + if (dd->ipath_flags & IPATH_32BITCOUNTERS) { - ipath_snap_cntr(dd, dd->ipath_cregs->cr_wordsendcnt); - ipath_snap_cntr(dd, dd->ipath_cregs->cr_wordrcvcnt); ipath_snap_cntr(dd, dd->ipath_cregs->cr_pktsendcnt); ipath_snap_cntr(dd, dd->ipath_cregs->cr_pktrcvcnt); } diff --git a/drivers/infiniband/hw/ipath/ipath_sysfs.c b/drivers/infiniband/hw/ipath/ipath_sysfs.c index 17ec145..ab34d3e 100644 --- a/drivers/infiniband/hw/ipath/ipath_sysfs.c +++ b/drivers/infiniband/hw/ipath/ipath_sysfs.c @@ -613,6 +613,26 @@ static ssize_t store_led_override(struct device *dev, return ret; } +static ssize_t show_logged_errs(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct ipath_devdata *dd = dev_get_drvdata(dev); + int idx, count; + + /* force consistency with actual EEPROM */ + if (ipath_update_eeprom_log(dd) != 0) + return -ENXIO; + + count = 0; + for (idx = 0; idx < IPATH_EEP_LOG_CNT; ++idx) { + count += scnprintf(buf + count, PAGE_SIZE - count, "%d%c", + dd->ipath_eep_st_errs[idx], + idx == (IPATH_EEP_LOG_CNT - 1) ? 
'\n' : ' '); + } + + return count; +} static DRIVER_ATTR(num_units, S_IRUGO, show_num_units, NULL); static DRIVER_ATTR(version, S_IRUGO, show_version, NULL); @@ -643,6 +663,7 @@ static DEVICE_ATTR(boardversion, S_IRUGO, show_boardversion, NULL); static DEVICE_ATTR(unit, S_IRUGO, show_unit, NULL); static DEVICE_ATTR(rx_pol_inv, S_IWUSR, NULL, store_rx_pol_inv); static DEVICE_ATTR(led_override, S_IWUSR, NULL, store_led_override); +static DEVICE_ATTR(logged_errors, S_IRUGO, show_logged_errs, NULL); static struct attribute *dev_attributes[] = { &dev_attr_guid.attr, @@ -660,6 +681,7 @@ static struct attribute *dev_attributes[] = { &dev_attr_enabled.attr, &dev_attr_rx_pol_inv.attr, &dev_attr_led_override.attr, + &dev_attr_logged_errors.attr, NULL }; From arthur.jones at qlogic.com Tue Jun 19 16:41:03 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:41:03 -0700 Subject: [ofa-general] [PATCH 06/28] IB/ipath - Support the IBA6110 revision 4 In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234102.3794.86911.stgit@bauxite.internal.keyresearch.com> From: Dave Olson Recognize IBA 6110 Revision 4, same feature set, etc. as earlier revisions. 
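The patched test in `ipath_ht_boardname()` is written as a rejection condition. Inverting it gives the accepted set directly: major revision 3, minor revisions 2 through 4. The helper name below is hypothetical; it only restates that condition in positive form.

```c
#include <assert.h>
#include <stdbool.h>

/* Positive form of the patched check: accept Rev 3.2, 3.3 and 3.4 only. */
static bool ht_rev_supported(unsigned majrev, unsigned minrev)
{
    return majrev == 3 && minrev >= 2 && minrev <= 4;
}
```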
Signed-off-by: Dave Olson --- drivers/infiniband/hw/ipath/ipath_iba6110.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_iba6110.c b/drivers/infiniband/hw/ipath/ipath_iba6110.c index 85f408d..0479985 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba6110.c +++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c @@ -680,9 +680,9 @@ static int ipath_ht_boardname(struct ipath_devdata *dd, char *name, snprintf(name, namelen, "%s", n); if (dd->ipath_majrev != 3 || (dd->ipath_minrev < 2 || - dd->ipath_minrev > 3)) { + dd->ipath_minrev > 4)) { /* - * This version of the driver only supports Rev 3.2 and 3.3 + * This version of the driver only supports Rev 3.2 - 3.4 */ ipath_dev_err(dd, "Unsupported InfiniPath hardware revision %u.%u!\n", From arthur.jones at qlogic.com Tue Jun 19 16:41:09 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:41:09 -0700 Subject: [ofa-general] [PATCH 07/28] IB/ipath - fix maximum MTU reporting In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234108.3794.33107.stgit@bauxite.internal.keyresearch.com> From: Robert Walsh Although our chip supports 4K MTUs, our driver doesn't yet support this feature, so limit the maximum MTU to 2K until we get support for 4K MTUs implemented. 
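The MTU values touched by this patch use the InfiniBand MTU encoding, in which codes 1 through 5 stand for 256 through 4096 bytes. That is why the portinfo word changes from `5 << 16` to `4 << 16`, and `IB_MTU_4096` becomes `IB_MTU_2048`. The sketch below is illustrative only: `mtu_code`, `DRIVER_MAX_MTU` and the helper functions are hypothetical, and the lower-bound check in `path_mtu_ok()` is an extra assumption not present in `ipath_modify_qp()`.

```c
#include <assert.h>

/* IB MTU codes, same numbering as Linux enum ib_mtu: 1 = 256 ... 5 = 4096. */
enum mtu_code { MTU_256 = 1, MTU_512, MTU_1024, MTU_2048, MTU_4096 };

/* Hypothetical driver-wide cap: the chip can do 4096, the driver stops at 2048. */
#define DRIVER_MAX_MTU MTU_2048

/* Each code doubles the size, starting at 256 bytes for code 1. */
static int mtu_code_to_bytes(enum mtu_code m)
{
    assert(m >= MTU_256 && m <= MTU_4096);
    return 128 << m;
}

/* Like the patched modify_qp check: reject any path MTU above the cap. */
static int path_mtu_ok(int path_mtu)
{
    return path_mtu >= MTU_256 && path_mtu <= DRIVER_MAX_MTU;
}

/* MTUCap as placed in the portinfo word built in ipath_fs.c: code << 16. */
static unsigned portinfo_mtucap(enum mtu_code cap)
{
    return (unsigned)cap << 16;
}
```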
Signed-off-by: Robert Walsh --- drivers/infiniband/hw/ipath/ipath_fs.c | 7 ++++++- drivers/infiniband/hw/ipath/ipath_init_chip.c | 7 ++++++- drivers/infiniband/hw/ipath/ipath_mad.c | 7 ++++++- drivers/infiniband/hw/ipath/ipath_qp.c | 7 ++++++- drivers/infiniband/hw/ipath/ipath_verbs.c | 7 ++++++- 5 files changed, 30 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_fs.c b/drivers/infiniband/hw/ipath/ipath_fs.c index ebd5c7b..40cf1bc 100644 --- a/drivers/infiniband/hw/ipath/ipath_fs.c +++ b/drivers/infiniband/hw/ipath/ipath_fs.c @@ -257,9 +257,14 @@ static ssize_t atomic_port_info_read(struct file *file, char __user *buf, /* Notimpl InitType (actually, an SMA decision) */ /* VLHighLimit is 0 (only one VL) */ ; /* VLArbitrationHighCap is 0 (only one VL) */ + /* + * Note: the chips support a maximum MTU of 4096, but the driver + * hasn't implemented this feature yet, so set the maximum + * to 2048. + */ portinfo[10] = /* VLArbitrationLowCap is 0 (only one VL) */ /* InitTypeReply is SMA decision */ - (5 << 16) /* MTUCap 4096 */ + (4 << 16) /* MTUCap 2048 */ | (7 << 13) /* VLStallCount */ | (0x1f << 8) /* HOQLife */ | (1 << 4) diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index ee83934..bdfda62 100644 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -310,7 +310,12 @@ static int init_chip_first(struct ipath_devdata *dd, val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_sendpiosize); dd->ipath_piosize2k = val & ~0U; dd->ipath_piosize4k = val >> 32; - dd->ipath_ibmtu = 4096; /* default to largest legal MTU */ + /* + * Note: the chips support a maximum MTU of 4096, but the driver + * hasn't implemented this feature yet, so set the initial value + * to 2048. 
+ */ + dd->ipath_ibmtu = 2048; val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_sendpiobufcnt); dd->ipath_piobcnt2k = val & ~0U; dd->ipath_piobcnt4k = val >> 32; diff --git a/drivers/infiniband/hw/ipath/ipath_mad.c b/drivers/infiniband/hw/ipath/ipath_mad.c index 25908b0..2e9e161 100644 --- a/drivers/infiniband/hw/ipath/ipath_mad.c +++ b/drivers/infiniband/hw/ipath/ipath_mad.c @@ -292,7 +292,12 @@ static int recv_subn_get_portinfo(struct ib_smp *smp, /* pip->vl_arb_high_cap; // only one VL */ /* pip->vl_arb_low_cap; // only one VL */ /* InitTypeReply = 0 */ - pip->inittypereply_mtucap = IB_MTU_4096; + /* + * Note: the chips support a maximum MTU of 4096, but the driver + * hasn't implemented this feature yet, so set the maximum value + * to 2048. + */ + pip->inittypereply_mtucap = IB_MTU_2048; // HCAs ignore VLStallCount and HOQLife /* pip->vlstallcnt_hoqlife; */ pip->operationalvl_pei_peo_fpi_fpo = 0x10; /* OVLs = 1 */ diff --git a/drivers/infiniband/hw/ipath/ipath_qp.c b/drivers/infiniband/hw/ipath/ipath_qp.c index bfef08e..9e07abb 100644 --- a/drivers/infiniband/hw/ipath/ipath_qp.c +++ b/drivers/infiniband/hw/ipath/ipath_qp.c @@ -507,8 +507,13 @@ int ipath_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, attr->port_num > ibqp->device->phys_port_cnt) goto inval; + /* + * Note: the chips support a maximum MTU of 4096, but the driver + * hasn't implemented this feature yet, so don't allow Path MTU + * values greater than 2048. 
+ */ if (attr_mask & IB_QP_PATH_MTU) - if (attr->path_mtu > IB_MTU_4096) + if (attr->path_mtu > IB_MTU_2048) goto inval; if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC) diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c index bb70845..980b64a 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.c +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c @@ -1051,7 +1051,12 @@ static int ipath_query_port(struct ib_device *ibdev, props->max_vl_num = 1; /* VLCap = VL0 */ props->init_type_reply = 0; - props->max_mtu = IB_MTU_4096; + /* + * Note: the chips support a maximum MTU of 4096, but the driver + * hasn't implemented this feature yet, so set the maximum value + * to 2048. + */ + props->max_mtu = IB_MTU_2048; switch (dev->dd->ipath_ibmtu) { case 4096: mtu = IB_MTU_4096; From arthur.jones at qlogic.com Tue Jun 19 16:41:14 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:41:14 -0700 Subject: [ofa-general] [PATCH 08/28] IB/ipath -- fill in some missing FMR-related fields In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234114.3794.6698.stgit@bauxite.internal.keyresearch.com> From: Robert Walsh In ipath_query_device(), some of the struct ib_device_attr fields were not being initialized. 
Signed-off-by: Robert Walsh --- drivers/infiniband/hw/ipath/ipath_verbs.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c index 980b64a..04294ca 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.c +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c @@ -981,6 +981,8 @@ static int ipath_query_device(struct ib_device *ibdev, props->max_ah = ib_ipath_max_ahs; props->max_cqe = ib_ipath_max_cqes; props->max_mr = dev->lk_table.max; + props->max_fmr = dev->lk_table.max; + props->max_map_per_fmr = 32767; props->max_pd = ib_ipath_max_pds; props->max_qp_rd_atom = IPATH_MAX_RDMA_ATOMIC; props->max_qp_init_rd_atom = 255; From arthur.jones at qlogic.com Tue Jun 19 16:41:20 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:41:20 -0700 Subject: [ofa-general] [PATCH 09/28] IB/ipath - fix problem with next WQE after a UC completion In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234119.3794.43684.stgit@bauxite.internal.keyresearch.com> From: Ralph Campbell This patch fixes a bug introduced when moving some code around for readability. Setting the wqe pointer at the end of the function is a NOP since it isn't used. Move it back to where it is used. 
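The bug fixed here comes down to C pass-by-value. `complete_last_send()` receives its own copy of the `wqe` pointer, so reassigning that copy just before returning has no effect on the caller. The minimal model below makes this concrete under stated assumptions: the four-entry ring, `get_ptr()` and the helper names are hypothetical, and the completion step is assumed to advance the `last` index, which the patch itself does not show.

```c
#include <assert.h>

#define RING_SIZE 4

/* Hypothetical stand-ins for the send work queue and its entries. */
struct wqe { int id; };
struct ring { struct wqe e[RING_SIZE]; unsigned last; };

static struct wqe *get_ptr(struct ring *r, unsigned idx)
{
    return &r->e[idx % RING_SIZE];
}

/* Assumed behaviour: completing the oldest entry advances 'last'. */
static void complete_entry(struct ring *r, struct wqe *w)
{
    (void)w; /* a real driver would report a completion for w here */
    r->last = (r->last + 1) % RING_SIZE;
}

/* Buggy shape: 'w' is a local copy, so the final store is never seen. */
static void complete_buggy(struct ring *r, struct wqe *w)
{
    complete_entry(r, w);
    w = get_ptr(r, r->last); /* dead store */
    (void)w;
}

/* Fixed shape: the caller's pointer is refreshed after completion. */
static struct wqe *complete_fixed(struct ring *r, struct wqe *w)
{
    complete_entry(r, w);
    return get_ptr(r, r->last);
}
```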
Signed-off-by: Ralph Campbell --- drivers/infiniband/hw/ipath/ipath_uc.c | 5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_uc.c b/drivers/infiniband/hw/ipath/ipath_uc.c index 1c2b03c..49d650c 100644 --- a/drivers/infiniband/hw/ipath/ipath_uc.c +++ b/drivers/infiniband/hw/ipath/ipath_uc.c @@ -58,7 +58,6 @@ static void complete_last_send(struct ipath_qp *qp, struct ipath_swqe *wqe, wc->port_num = 0; ipath_cq_enter(to_icq(qp->ibqp.send_cq), wc, 0); } - wqe = get_swqe_ptr(qp, qp->s_last); } /** @@ -97,8 +96,10 @@ int ipath_make_uc_req(struct ipath_qp *qp, * Signal the completion of the last send * (if there is one). */ - if (qp->s_last != qp->s_tail) + if (qp->s_last != qp->s_tail) { complete_last_send(qp, wqe, &wc); + wqe = get_swqe_ptr(qp, qp->s_last); + } /* Check if send work queue is empty. */ if (qp->s_tail == qp->s_head) From arthur.jones at qlogic.com Tue Jun 19 16:41:26 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:41:26 -0700 Subject: [ofa-general] [PATCH 10/28] IB/ipath - fix local loopback bug when waiting for resources In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234125.3794.25265.stgit@bauxite.internal.keyresearch.com> From: Ralph Campbell This patch fixes a minor bug where the wrong QP was checked for a send work request which should wait for an RNR timeout. 
Signed-off-by: Ralph Campbell --- drivers/infiniband/hw/ipath/ipath_ruc.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_ruc.c b/drivers/infiniband/hw/ipath/ipath_ruc.c index d9c2a9b..8c5d20a 100644 --- a/drivers/infiniband/hw/ipath/ipath_ruc.c +++ b/drivers/infiniband/hw/ipath/ipath_ruc.c @@ -267,7 +267,7 @@ again: spin_lock_irqsave(&sqp->s_lock, flags); if (!(ib_ipath_state_ops[sqp->state] & IPATH_PROCESS_SEND_OK) || - qp->s_rnr_timeout) { + sqp->s_rnr_timeout) { spin_unlock_irqrestore(&sqp->s_lock, flags); goto done; } From arthur.jones at qlogic.com Tue Jun 19 16:41:32 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:41:32 -0700 Subject: [ofa-general] [PATCH 11/28] IB/ipath - set M bit in BTH according to IB spec. In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234131.3794.4718.stgit@bauxite.internal.keyresearch.com> From: Ralph Campbell According to ch. 17.2.8.1.1, QPs start in the migrated state and should send packets with the M bit set in the BTH. Signed-off-by: Ralph Campbell --- drivers/infiniband/hw/ipath/ipath_rc.c | 6 +++--- drivers/infiniband/hw/ipath/ipath_uc.c | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_rc.c b/drivers/infiniband/hw/ipath/ipath_rc.c index 1915771..9ba80d1 100644 --- a/drivers/infiniband/hw/ipath/ipath_rc.c +++ b/drivers/infiniband/hw/ipath/ipath_rc.c @@ -188,7 +188,7 @@ static int ipath_make_rc_ack(struct ipath_qp *qp, } qp->s_hdrwords = hwords; qp->s_cur_size = len; - *bth0p = bth0; + *bth0p = bth0 | (1 << 22); /* Set M bit */ *bth2p = bth2; return 1; @@ -240,7 +240,7 @@ int ipath_make_rc_req(struct ipath_qp *qp, /* header size in 32-bit words LRH+BTH = (8+12)/4. */ hwords = 5; - bth0 = 0; + bth0 = 1 << 22; /* Set M bit */ /* Send a request. 
*/ wqe = get_swqe_ptr(qp, qp->s_cur); @@ -604,7 +604,7 @@ static void send_rc_ack(struct ipath_qp *qp) } /* read pkey_index w/o lock (its atomic) */ bth0 = ipath_get_pkey(dev->dd, qp->s_pkey_index) | - OP(ACKNOWLEDGE) << 24; + (OP(ACKNOWLEDGE) << 24) | (1 << 22); if (qp->r_nak_state) ohdr->u.aeth = cpu_to_be32((qp->r_msn & IPATH_MSN_MASK) | (qp->r_nak_state << diff --git a/drivers/infiniband/hw/ipath/ipath_uc.c b/drivers/infiniband/hw/ipath/ipath_uc.c index 49d650c..243d7c6 100644 --- a/drivers/infiniband/hw/ipath/ipath_uc.c +++ b/drivers/infiniband/hw/ipath/ipath_uc.c @@ -86,7 +86,7 @@ int ipath_make_uc_req(struct ipath_qp *qp, /* header size in 32-bit words LRH+BTH = (8+12)/4. */ hwords = 5; - bth0 = 0; + bth0 = 1 << 22; /* Set M bit */ /* Get the next send request. */ wqe = get_swqe_ptr(qp, qp->s_last); From arthur.jones at qlogic.com Tue Jun 19 16:41:38 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:41:38 -0700 Subject: [ofa-general] [PATCH 12/28] IB/ipath - Change use of constants for TID type to defined values In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234137.3794.42065.stgit@bauxite.internal.keyresearch.com> From: Joan Eslinger Define the received-packet 'type' in a way consistent with the hardware spec and chips. The hardware considers received packets of type 0 to be expected and type 1 to be eager, but the driver was calling the ipath_f_put_tid functions with a 'type' argument set to 0 for eager and 1 for expected packets. Worse, the iba6110 and iba6120 drivers used those values inconsistently, which was quite confusing. Now everything is consistent with the hardware.
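The convention this patch settles on can be sketched in C as below. The enum values come from the kernel-doc comments in the patch (RCVHQ_RCV_TYPE_EXPECTED is 0, RCVHQ_RCV_TYPE_EAGER is 1); the real definitions live in a driver header not shown here. The helper demo_encode_tid() and the DEMO_TID_TEMPLATE value are hypothetical, loosely modeled on the branch in ipath_pe_put_tid() that ORs dd->ipath_tidtemplate into eager entries and 2 << 29 into expected ones; address checks and the register write are left out.

```c
#include <stdint.h>

/* Values from the patch's kernel-doc comments: the hardware treats
 * type 0 as expected and type 1 as eager. */
enum rcvhq_rcv_type {
	RCVHQ_RCV_TYPE_EXPECTED = 0,
	RCVHQ_RCV_TYPE_EAGER = 1,
};

/* Hypothetical placeholder for dd->ipath_tidtemplate (device-specific). */
#define DEMO_TID_TEMPLATE 0x20000000ULL

/*
 * Hypothetical helper modeled on the type test in ipath_pe_put_tid():
 * eager entries get the device template bits, expected entries are
 * marked as a full 4KB page (2 << 29). Address validation and the
 * actual TID register write are omitted.
 */
uint64_t demo_encode_tid(enum rcvhq_rcv_type type, uint64_t pa)
{
	if (type == RCVHQ_RCV_TYPE_EAGER)
		pa |= DEMO_TID_TEMPLATE;
	else
		pa |= 2ULL << 29;
	return pa;
}
```

Passing the named constants, as the patched callers do, makes a swapped argument visible at the call site; a bare 0 or 1 gives no hint which convention the caller had in mind.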
Signed-off-by: Dave Olson --- drivers/infiniband/hw/ipath/ipath_file_ops.c | 12 ++++++++---- drivers/infiniband/hw/ipath/ipath_iba6110.c | 10 ++++++---- drivers/infiniband/hw/ipath/ipath_iba6120.c | 14 ++++++++------ drivers/infiniband/hw/ipath/ipath_init_chip.c | 3 ++- 4 files changed, 24 insertions(+), 15 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c index 1272aaf..931802b 100644 --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c @@ -396,7 +396,8 @@ static int ipath_tid_update(struct ipath_portdata *pd, struct file *fp, "TID %u, vaddr %lx, physaddr %llx pgp %p\n", tid, vaddr, (unsigned long long) physaddr, pagep[i]); - dd->ipath_f_put_tid(dd, &tidbase[tid], 1, physaddr); + dd->ipath_f_put_tid(dd, &tidbase[tid], RCVHQ_RCV_TYPE_EXPECTED, + physaddr); /* * don't check this tid in ipath_portshadow, since we * just filled it in; start with the next one. @@ -422,7 +423,8 @@ static int ipath_tid_update(struct ipath_portdata *pd, struct file *fp, if (dd->ipath_pageshadow[porttid + tid]) { ipath_cdbg(VERBOSE, "Freeing TID %u\n", tid); - dd->ipath_f_put_tid(dd, &tidbase[tid], 1, + dd->ipath_f_put_tid(dd, &tidbase[tid], + RCVHQ_RCV_TYPE_EXPECTED, dd->ipath_tidinvalid); pci_unmap_page(dd->pcidev, dd->ipath_physshadow[porttid + tid], @@ -538,7 +540,8 @@ static int ipath_tid_free(struct ipath_portdata *pd, unsigned subport, if (dd->ipath_pageshadow[porttid + tid]) { ipath_cdbg(VERBOSE, "PID %u freeing TID %u\n", pd->port_pid, tid); - dd->ipath_f_put_tid(dd, &tidbase[tid], 1, + dd->ipath_f_put_tid(dd, &tidbase[tid], + RCVHQ_RCV_TYPE_EXPECTED, dd->ipath_tidinvalid); pci_unmap_page(dd->pcidev, dd->ipath_physshadow[porttid + tid], @@ -921,7 +924,8 @@ static int ipath_create_user_egr(struct ipath_portdata *pd) (u64 __iomem *) ((char __iomem *) dd->ipath_kregbase + - dd->ipath_rcvegrbase), 0, pa); + dd->ipath_rcvegrbase), + RCVHQ_RCV_TYPE_EAGER, pa); pa += 
egrsize; } cond_resched(); /* don't hog the cpu */ diff --git a/drivers/infiniband/hw/ipath/ipath_iba6110.c b/drivers/infiniband/hw/ipath/ipath_iba6110.c index 0479985..d8ac9f1 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba6110.c +++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c @@ -1408,7 +1408,7 @@ static void ipath_ht_quiet_serdes(struct ipath_devdata *dd) * ipath_pe_put_tid - write a TID in chip * @dd: the infinipath device * @tidptr: pointer to the expected TID (in chip) to udpate - * @tidtype: 0 for eager, 1 for expected + * @tidtype: RCVHQ_RCV_TYPE_EAGER (1) for eager, RCVHQ_RCV_TYPE_EXPECTED (0) for expected * @pa: physical address of in memory buffer; ipath_tidinvalid if freeing * * This exists as a separate routine to allow for special locking etc. @@ -1429,7 +1429,7 @@ static void ipath_ht_put_tid(struct ipath_devdata *dd, "40 bits, using only 40!!!\n", pa); pa &= INFINIPATH_RT_ADDR_MASK; } - if (type == 0) + if (type == RCVHQ_RCV_TYPE_EAGER) pa |= dd->ipath_tidtemplate; else { /* in words (fixed, full page). 
*/ @@ -1469,7 +1469,8 @@ static void ipath_ht_clear_tids(struct ipath_devdata *dd, unsigned port) port * dd->ipath_rcvtidcnt * sizeof(*tidbase)); for (i = 0; i < dd->ipath_rcvtidcnt; i++) - ipath_ht_put_tid(dd, &tidbase[i], 1, dd->ipath_tidinvalid); + ipath_ht_put_tid(dd, &tidbase[i], RCVHQ_RCV_TYPE_EXPECTED, + dd->ipath_tidinvalid); tidbase = (u64 __iomem *) ((char __iomem *)(dd->ipath_kregbase) + dd->ipath_rcvegrbase + @@ -1477,7 +1478,8 @@ static void ipath_ht_clear_tids(struct ipath_devdata *dd, unsigned port) sizeof(*tidbase)); for (i = 0; i < dd->ipath_rcvegrcnt; i++) - ipath_ht_put_tid(dd, &tidbase[i], 0, dd->ipath_tidinvalid); + ipath_ht_put_tid(dd, &tidbase[i], RCVHQ_RCV_TYPE_EAGER, + dd->ipath_tidinvalid); } /** diff --git a/drivers/infiniband/hw/ipath/ipath_iba6120.c b/drivers/infiniband/hw/ipath/ipath_iba6120.c index 207323a..b931057 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba6120.c +++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c @@ -1104,7 +1104,7 @@ bail: * ipath_pe_put_tid - write a TID in chip * @dd: the infinipath device * @tidptr: pointer to the expected TID (in chip) to udpate - * @tidtype: 0 for eager, 1 for expected + * @tidtype: RCVHQ_RCV_TYPE_EAGER (1) for eager, RCVHQ_RCV_TYPE_EXPECTED (0) for expected * @pa: physical address of in memory buffer; ipath_tidinvalid if freeing * * This exists as a separate routine to allow for special locking etc. 
@@ -1130,7 +1130,7 @@ static void ipath_pe_put_tid(struct ipath_devdata *dd, u64 __iomem *tidptr, "BUG: Physical page address 0x%lx " "has bits set in 31-29\n", pa); - if (type == 0) + if (type == RCVHQ_RCV_TYPE_EAGER) pa |= dd->ipath_tidtemplate; else /* for now, always full 4KB page */ pa |= 2 << 29; @@ -1154,7 +1154,7 @@ static void ipath_pe_put_tid(struct ipath_devdata *dd, u64 __iomem *tidptr, * ipath_pe_put_tid_2 - write a TID in chip, Revision 2 or higher * @dd: the infinipath device * @tidptr: pointer to the expected TID (in chip) to udpate - * @tidtype: 0 for eager, 1 for expected + * @tidtype: RCVHQ_RCV_TYPE_EAGER (1) for eager, RCVHQ_RCV_TYPE_EXPECTED (0) for expected * @pa: physical address of in memory buffer; ipath_tidinvalid if freeing * * This exists as a separate routine to allow for selection of the @@ -1179,7 +1179,7 @@ static void ipath_pe_put_tid_2(struct ipath_devdata *dd, u64 __iomem *tidptr, "BUG: Physical page address 0x%lx " "has bits set in 31-29\n", pa); - if (type == 0) + if (type == RCVHQ_RCV_TYPE_EAGER) pa |= dd->ipath_tidtemplate; else /* for now, always full 4KB page */ pa |= 2 << 29; @@ -1218,7 +1218,8 @@ static void ipath_pe_clear_tids(struct ipath_devdata *dd, unsigned port) port * dd->ipath_rcvtidcnt * sizeof(*tidbase)); for (i = 0; i < dd->ipath_rcvtidcnt; i++) - ipath_pe_put_tid(dd, &tidbase[i], 0, tidinv); + ipath_pe_put_tid(dd, &tidbase[i], RCVHQ_RCV_TYPE_EXPECTED, + tidinv); tidbase = (u64 __iomem *) ((char __iomem *)(dd->ipath_kregbase) + @@ -1226,7 +1227,8 @@ static void ipath_pe_clear_tids(struct ipath_devdata *dd, unsigned port) port * dd->ipath_rcvegrcnt * sizeof(*tidbase)); for (i = 0; i < dd->ipath_rcvegrcnt; i++) - ipath_pe_put_tid(dd, &tidbase[i], 1, tidinv); + ipath_pe_put_tid(dd, &tidbase[i], RCVHQ_RCV_TYPE_EAGER, + tidinv); } /** diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index bdfda62..9f61155 100644 --- 
a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -133,7 +133,8 @@ static int create_port0_egr(struct ipath_devdata *dd) dd->ipath_ibmaxlen, PCI_DMA_FROMDEVICE); dd->ipath_f_put_tid(dd, e + (u64 __iomem *) ((char __iomem *) dd->ipath_kregbase + - dd->ipath_rcvegrbase), 0, + dd->ipath_rcvegrbase), + RCVHQ_RCV_TYPE_EAGER, dd->ipath_port0_skbinfo[e].phys); } From arthur.jones at qlogic.com Tue Jun 19 16:41:45 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:41:45 -0700 Subject: [ofa-general] [PATCH 13/28] IB/ipath - Fix the mtrr_add args for chips with 2 buffer sizes In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234144.3794.6370.stgit@bauxite.internal.keyresearch.com> From: Dave Olson The values passed have never been right for iba 6120 chips, but just happened to work. We needed to select the right buffer offset in the chip (both are in same register), and the total length was wrong also, but was covered by the rounding up. Signed-off-by: Dave Olson --- drivers/infiniband/hw/ipath/ipath_wc_x86_64.c | 27 ++++++++++++++++++++----- 1 files changed, 22 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c index 04696e6..9f409fd 100644 --- a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c +++ b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c @@ -63,12 +63,29 @@ int ipath_enable_wc(struct ipath_devdata *dd) * of 2 address matching the length (which has to be a power of 2). * For rev1, that means the base address, for rev2, it will be just * the PIO buffers themselves. + * For chips with two sets of buffers, the calculations are + * somewhat more complicated; we need to sum, and the piobufbase + * register has both offsets, 2K in low 32 bits, 4K in high 32 bits. 
+ * The buffers are still packed, so a single range covers both. */ - pioaddr = addr + dd->ipath_piobufbase; - piolen = (dd->ipath_piobcnt2k + - dd->ipath_piobcnt4k) * - ALIGN(dd->ipath_piobcnt2k + - dd->ipath_piobcnt4k, dd->ipath_palign); + if (dd->ipath_piobcnt2k && dd->ipath_piobcnt4k) { /* 2 sizes */ + unsigned long pio2kbase, pio4kbase; + pio2kbase = dd->ipath_piobufbase & 0xffffffffUL; + pio4kbase = (dd->ipath_piobufbase >> 32) & 0xffffffffUL; + if (pio2kbase < pio4kbase) { /* all, for now */ + pioaddr = addr + pio2kbase; + piolen = pio4kbase - pio2kbase + + dd->ipath_piobcnt4k * dd->ipath_4kalign; + } else { + pioaddr = addr + pio4kbase; + piolen = pio2kbase - pio4kbase + + dd->ipath_piobcnt2k * dd->ipath_palign; + } + } else { /* single buffer size (2K, currently) */ + pioaddr = addr + dd->ipath_piobufbase; + piolen = dd->ipath_piobcnt2k * dd->ipath_palign + + dd->ipath_piobcnt4k * dd->ipath_4kalign; + } for (bits = 0; !(piolen & (1ULL << bits)); bits++) /* do nothing */ ; From arthur.jones at qlogic.com Tue Jun 19 16:41:51 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:41:51 -0700 Subject: [ofa-general] [PATCH 14/28] IB/ipath - Use S_ABORT not cancel and abort on exit freeze mode after recovery In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234150.3794.43464.stgit@bauxite.internal.keyresearch.com> From: Dave Olson This patch centralizes the use of the abort functionality, removes the unneeded buffer cancel (abort does the same thing), and arranges for launch errors to be ignored after an abort, just as they were after a cancel. An abort is also needed on exit from freeze mode; otherwise buffers can be left stuck in the busy state if a user process completed a send while the chip was in freeze mode doing the recovery.
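One piece of this change, ignoring arm-launch errors for a short time after an abort, can be sketched as a deadline check. The patch records the deadline in dd->ipath_lastcancel (jiffies+HZ/2 in ipath_cancel_sends()). The struct and function names below are hypothetical, the `now` argument stands in for jiffies, and the comparison uses the signed-difference idiom that the kernel's time_after() macro is built on, so the window still behaves if the tick counter wraps.

```c
#include <stdbool.h>

/* True if tick a is later than tick b, assuming the two are within half
 * the counter range of each other (same idea as the kernel's time_after()). */
static inline bool demo_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical device state: tick until which arm-launch errors are
 * ignored, playing the role of dd->ipath_lastcancel in the patch. */
struct demo_dev {
	unsigned long lastcancel;
};

/* Called after aborting all PIO sends: open a window of 'window' ticks
 * during which launch errors are expected and should not be reported. */
void demo_cancel_sends(struct demo_dev *dd, unsigned long now,
		       unsigned long window)
{
	dd->lastcancel = now + window;	/* may wrap; the comparison copes */
}

/* Report an arm-launch error seen at tick 'now' only once the window
 * has closed, i.e. when now >= lastcancel. */
bool demo_report_armlaunch_err(const struct demo_dev *dd, unsigned long now)
{
	return !demo_time_after(dd->lastcancel, now);
}
```

Keeping the deadline in one field that every cancel path updates matches the patch's goal of funnelling all aborts through a single function, so the error-suppression window cannot be forgotten by one caller.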
Signed-off-by: Dave Olson --- drivers/infiniband/hw/ipath/ipath_driver.c | 57 ++++++++++++++++--------- drivers/infiniband/hw/ipath/ipath_iba6110.c | 13 +++--- drivers/infiniband/hw/ipath/ipath_iba6120.c | 16 ++++++- drivers/infiniband/hw/ipath/ipath_init_chip.c | 6 +++ drivers/infiniband/hw/ipath/ipath_intr.c | 13 ++---- drivers/infiniband/hw/ipath/ipath_kernel.h | 1 6 files changed, 68 insertions(+), 38 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index e963986..8b61179 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -706,9 +706,9 @@ void ipath_disarm_piobufs(struct ipath_devdata *dd, unsigned first, u64 sendctrl, sendorig; ipath_cdbg(PKT, "disarm %u PIObufs first=%u\n", cnt, first); - sendorig = dd->ipath_sendctrl | INFINIPATH_S_DISARM; + sendorig = dd->ipath_sendctrl; for (i = first; i < last; i++) { - sendctrl = sendorig | + sendctrl = sendorig | INFINIPATH_S_DISARM | (i << INFINIPATH_S_DISARMPIOBUF_SHIFT); ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, sendctrl); @@ -719,12 +719,12 @@ void ipath_disarm_piobufs(struct ipath_devdata *dd, unsigned first, * while we were looping; no critical bits that would require * locking. * - * Write a 0, and then the original value, reading scratch in + * disable PIOAVAILUPD, then re-enable, reading scratch in * between. This seems to avoid a chip timing race that causes * pioavail updates to memory to stop. */ ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, - 0); + sendorig & ~IPATH_S_PIOBUFAVAILUPD); sendorig = ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, dd->ipath_sendctrl); @@ -1596,6 +1596,35 @@ int ipath_waitfor_mdio_cmdready(struct ipath_devdata *dd) return ret; } + +/* + * Flush all sends that might be in the ready to send state, as well as any + * that are in the process of being sent. 
Used whenever we need to be + * sure the send side is idle. Cleans up all buffer state by canceling + * all pio buffers, and issuing an abort, which cleans up anything in the + * launch fifo. The cancel is superfluous on some chip versions, but + * it's safer to always do it. + * PIOAvail bits are updated by the chip as if normal send had happened. + */ +void ipath_cancel_sends(struct ipath_devdata *dd) +{ + ipath_dbg("Cancelling all in-progress send buffers\n"); + dd->ipath_lastcancel = jiffies+HZ/2; /* skip armlaunch errs a bit */ + /* + * the abort bit is auto-clearing. We read scratch to be sure + * that cancels and the abort have taken effect in the chip. + */ + ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, + INFINIPATH_S_ABORT); + ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + ipath_disarm_piobufs(dd, 0, + (unsigned)(dd->ipath_piobcnt2k + dd->ipath_piobcnt4k)); + + /* and again, be sure all have hit the chip */ + ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); +} + + static void ipath_set_ib_lstate(struct ipath_devdata *dd, int which) { static const char *what[4] = { @@ -1617,14 +1646,8 @@ static void ipath_set_ib_lstate(struct ipath_devdata *dd, int which) INFINIPATH_IBCS_LINKTRAININGSTATE_MASK]); /* flush all queued sends when going to DOWN or INIT, to be sure that * they don't block MAD packets */ - if (!linkcmd || linkcmd == INFINIPATH_IBCC_LINKCMD_INIT) { - ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, - INFINIPATH_S_ABORT); - ipath_disarm_piobufs(dd, dd->ipath_lastport_piobuf, - (unsigned)(dd->ipath_piobcnt2k + - dd->ipath_piobcnt4k) - - dd->ipath_lastport_piobuf); - } + if (!linkcmd || linkcmd == INFINIPATH_IBCC_LINKCMD_INIT) + ipath_cancel_sends(dd); ipath_write_kreg(dd, dd->ipath_kregs->kr_ibcctrl, dd->ipath_ibcctrl | which); @@ -1967,17 +1990,9 @@ void ipath_shutdown_device(struct ipath_devdata *dd) */ udelay(5); - /* - * abort any armed or launched PIO buffers that didn't go. (self - * clearing). 
Will cause any packet currently being transmitted to - * go out with an EBP, and may also cause a short packet error on - * the receiver. - */ - ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, - INFINIPATH_S_ABORT); - ipath_set_ib_lstate(dd, INFINIPATH_IBCC_LINKINITCMD_DISABLE << INFINIPATH_IBCC_LINKINITCMD_SHIFT); + ipath_cancel_sends(dd); /* disable IBC */ dd->ipath_control &= ~INFINIPATH_C_LINKENABLE; diff --git a/drivers/infiniband/hw/ipath/ipath_iba6110.c b/drivers/infiniband/hw/ipath/ipath_iba6110.c index d8ac9f1..34d159a 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba6110.c +++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c @@ -509,6 +509,13 @@ static void ipath_ht_handle_hwerrors(struct ipath_devdata *dd, char *msg, if (!hwerrs) { ipath_dbg("Clearing freezemode on ignored or " "recovered hardware error\n"); + /* + * clear all sends, becauase they have may been + * completed by usercode while in freeze mode, and + * therefore would not be sent, and eventually + * might cause the process to run out of bufs + */ + ipath_cancel_sends(dd); ctrl &= ~INFINIPATH_C_FREEZEMODE; ipath_write_kreg(dd, dd->ipath_kregs->kr_control, ctrl); @@ -1566,11 +1573,6 @@ static int ipath_ht_early_init(struct ipath_devdata *dd) writel(16, piobuf); piobuf += pioincr; } - /* - * self-clearing - */ - ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, - INFINIPATH_S_ABORT); ipath_get_eeprom_info(dd); if (dd->ipath_boardrev == 5 && dd->ipath_serial[0] == '1' && @@ -1599,7 +1601,6 @@ static int ipath_ht_txe_recover(struct ipath_devdata *dd) } dev_info(&dd->pcidev->dev, "Recovering from TXE PIO parity error\n"); - ipath_disarm_senderrbufs(dd, 1); return 1; } diff --git a/drivers/infiniband/hw/ipath/ipath_iba6120.c b/drivers/infiniband/hw/ipath/ipath_iba6120.c index b931057..0c34555 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba6120.c +++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c @@ -430,8 +430,19 @@ static void ipath_pe_handle_hwerrors(struct ipath_devdata *dd, char *msg, 
*dd->ipath_statusp |= IPATH_STATUS_HWERROR; dd->ipath_flags &= ~IPATH_INITTED; } else { - ipath_dbg("Clearing freezemode on ignored hardware " - "error\n"); + static u32 freeze_cnt; + + freeze_cnt++; + ipath_dbg("Clearing freezemode on ignored or recovered " + "hardware error (%u)\n", freeze_cnt); + /* + * clear all sends, becauase they have may been + * completed by usercode while in freeze mode, and + * therefore would not be sent, and eventually + * might cause the process to run out of bufs + */ + ipath_cancel_sends(dd); + ctrl &= ~INFINIPATH_C_FREEZEMODE; ipath_write_kreg(dd, dd->ipath_kregs->kr_control, dd->ipath_control); } @@ -1371,7 +1382,6 @@ static int ipath_pe_txe_recover(struct ipath_devdata *dd) dev_info(&dd->pcidev->dev, "Recovering from TXE PIO parity error\n"); } - ipath_disarm_senderrbufs(dd, 1); return 1; } diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index 9f61155..5193d69 100644 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -777,6 +777,12 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) piobufs, dd->ipath_pbufsport, uports); dd->ipath_f_early_init(dd); + /* + * cancel any possible active sends from early driver load. + * Follows early_init because some chips have to initialize + * PIO buffers in early_init to avoid false parity errors. 
+ */ + ipath_cancel_sends(dd); /* early_init sets rcvhdrentsize and rcvhdrsize, so this must be * done after early_init */ diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c index d9cdd00..948091f 100644 --- a/drivers/infiniband/hw/ipath/ipath_intr.c +++ b/drivers/infiniband/hw/ipath/ipath_intr.c @@ -93,7 +93,8 @@ void ipath_disarm_senderrbufs(struct ipath_devdata *dd, int rewrite) if (sbuf[0] || sbuf[1] || (piobcnt > 128 && (sbuf[2] || sbuf[3]))) { int i; - if (ipath_debug & (__IPATH_PKTDBG|__IPATH_DBG)) { + if (ipath_debug & (__IPATH_PKTDBG|__IPATH_DBG) && + dd->ipath_lastcancel > jiffies) { __IPATH_DBG_WHICH(__IPATH_PKTDBG|__IPATH_DBG, "SendbufErrs %lx %lx", sbuf[0], sbuf[1]); @@ -108,7 +109,8 @@ void ipath_disarm_senderrbufs(struct ipath_devdata *dd, int rewrite) ipath_clrpiobuf(dd, i); ipath_disarm_piobufs(dd, i, 1); } - dd->ipath_lastcancel = jiffies+3; /* no armlaunch for a bit */ + /* ignore armlaunch errs for a bit */ + dd->ipath_lastcancel = jiffies+3; } } @@ -290,12 +292,7 @@ static void handle_e_ibstatuschanged(struct ipath_devdata *dd, * Flush all queued sends when link went to DOWN or INIT, * to be sure that they don't block SMA and other MAD packets */ - ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, - INFINIPATH_S_ABORT); - ipath_disarm_piobufs(dd, dd->ipath_lastport_piobuf, - (unsigned)(dd->ipath_piobcnt2k + - dd->ipath_piobcnt4k) - - dd->ipath_lastport_piobuf); + ipath_cancel_sends(dd); } else if (lstate == IPATH_IBSTATE_INIT || lstate == IPATH_IBSTATE_ARM || lstate == IPATH_IBSTATE_ACTIVE) { diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index 2a4414b..2e85aec 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -676,6 +676,7 @@ int ipath_unordered_wc(void); void ipath_disarm_piobufs(struct ipath_devdata *, unsigned first, unsigned cnt); +void ipath_cancel_sends(struct ipath_devdata *); int 
ipath_create_rcvhdrq(struct ipath_devdata *, struct ipath_portdata *); void ipath_free_pddata(struct ipath_devdata *, struct ipath_portdata *); From arthur.jones at qlogic.com Tue Jun 19 16:41:57 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:41:57 -0700 Subject: [ofa-general] [PATCH 15/28] IB/ipath - add barrier before updating WC head in shared memory In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234156.3794.26440.stgit@bauxite.internal.keyresearch.com> From: Ralph Campbell Add a barrier to make sure the CPU doesn't reorder writes to memory since user programs can be polling on the head index update and the entry should be written before that. Signed-off-by: Ralph Campbell --- drivers/infiniband/hw/ipath/ipath_cq.c | 3 ++- drivers/infiniband/hw/ipath/ipath_ruc.c | 1 + drivers/infiniband/hw/ipath/ipath_srq.c | 1 + drivers/infiniband/hw/ipath/ipath_ud.c | 1 + drivers/infiniband/hw/ipath/ipath_verbs.c | 1 + 5 files changed, 6 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_cq.c b/drivers/infiniband/hw/ipath/ipath_cq.c index 3e9241b..8a2a774 100644 --- a/drivers/infiniband/hw/ipath/ipath_cq.c +++ b/drivers/infiniband/hw/ipath/ipath_cq.c @@ -90,6 +90,7 @@ void ipath_cq_enter(struct ipath_cq *cq, struct ib_wc *entry, int solicited) wc->queue[head].sl = entry->sl; wc->queue[head].dlid_path_bits = entry->dlid_path_bits; wc->queue[head].port_num = entry->port_num; + wmb(); wc->head = next; if (cq->notify == IB_CQ_NEXT_COMP || @@ -139,7 +140,7 @@ int ipath_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry) if (tail == wc->head) break; - + rmb(); qp = ipath_lookup_qpn(&to_idev(cq->ibcq.device)->qp_table, wc->queue[tail].qp_num); entry->qp = &qp->ibqp; diff --git a/drivers/infiniband/hw/ipath/ipath_ruc.c b/drivers/infiniband/hw/ipath/ipath_ruc.c index 8c5d20a..103dea0 
100644 --- a/drivers/infiniband/hw/ipath/ipath_ruc.c +++ b/drivers/infiniband/hw/ipath/ipath_ruc.c @@ -194,6 +194,7 @@ int ipath_get_rwqe(struct ipath_qp *qp, int wr_id_only) ret = 0; goto bail; } + rmb(); wqe = get_rwqe_ptr(rq, tail); if (++tail >= rq->size) tail = 0; diff --git a/drivers/infiniband/hw/ipath/ipath_srq.c b/drivers/infiniband/hw/ipath/ipath_srq.c index 03acae6..4b4214e 100644 --- a/drivers/infiniband/hw/ipath/ipath_srq.c +++ b/drivers/infiniband/hw/ipath/ipath_srq.c @@ -80,6 +80,7 @@ int ipath_post_srq_receive(struct ib_srq *ibsrq, struct ib_recv_wr *wr, wqe->num_sge = wr->num_sge; for (i = 0; i < wr->num_sge; i++) wqe->sg_list[i] = wr->sg_list[i]; + wmb(); wq->head = next; spin_unlock_irqrestore(&srq->rq.lock, flags); } diff --git a/drivers/infiniband/hw/ipath/ipath_ud.c b/drivers/infiniband/hw/ipath/ipath_ud.c index a518f7c..eee54c7 100644 --- a/drivers/infiniband/hw/ipath/ipath_ud.c +++ b/drivers/infiniband/hw/ipath/ipath_ud.c @@ -176,6 +176,7 @@ static void ipath_ud_loopback(struct ipath_qp *sqp, dev->n_pkt_drops++; goto bail_sge; } + rmb(); wqe = get_rwqe_ptr(rq, tail); if (++tail >= rq->size) tail = 0; diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c index 04294ca..b92006a 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.c +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c @@ -323,6 +323,7 @@ static int ipath_post_receive(struct ib_qp *ibqp, struct ib_recv_wr *wr, wqe->num_sge = wr->num_sge; for (i = 0; i < wr->num_sge; i++) wqe->sg_list[i] = wr->sg_list[i]; + wmb(); wq->head = next; spin_unlock_irqrestore(&qp->r_rq.lock, flags); } From arthur.jones at qlogic.com Tue Jun 19 16:42:03 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:42:03 -0700 Subject: [ofa-general] [PATCH 16/28] IB/ipath - Fix RDMA read retry code In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: 
<20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234202.3794.36576.stgit@bauxite.internal.keyresearch.com> From: Ralph Campbell A RDMA read response or atomic response can ACK earlier sends and RDMA writes. In this case, the wrong work request pointer was being used to store the read first response or atomic result. Also, if a RDMA read request is retried, the code to compute which request to resend was incorrect. Signed-off-by: Ralph Campbell --- drivers/infiniband/hw/ipath/ipath_rc.c | 57 +++++++++++++++++++++----------- 1 files changed, 38 insertions(+), 19 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_rc.c b/drivers/infiniband/hw/ipath/ipath_rc.c index 9ba80d1..014d811 100644 --- a/drivers/infiniband/hw/ipath/ipath_rc.c +++ b/drivers/infiniband/hw/ipath/ipath_rc.c @@ -806,13 +806,15 @@ static inline void update_last_psn(struct ipath_qp *qp, u32 psn) * Called at interrupt level with the QP s_lock held and interrupts disabled. * Returns 1 if OK, 0 if current operation should be aborted (NAK). */ -static int do_rc_ack(struct ipath_qp *qp, u32 aeth, u32 psn, int opcode) +static int do_rc_ack(struct ipath_qp *qp, u32 aeth, u32 psn, int opcode, + u64 val) { struct ipath_ibdev *dev = to_idev(qp->ibqp.device); struct ib_wc wc; struct ipath_swqe *wqe; int ret = 0; u32 ack_psn; + int diff; /* * Remove the QP from the timeout queue (or RNR timeout queue). @@ -840,7 +842,19 @@ static int do_rc_ack(struct ipath_qp *qp, u32 aeth, u32 psn, int opcode) * The MSN might be for a later WQE than the PSN indicates so * only complete WQEs that the PSN finishes. */ - while (ipath_cmp24(ack_psn, wqe->lpsn) >= 0) { + while ((diff = ipath_cmp24(ack_psn, wqe->lpsn)) >= 0) { + /* + * RDMA_READ_RESPONSE_ONLY is a special case since + * we want to generate completion events for everything + * before the RDMA read, copy the data, then generate + * the completion for the read. 
+ */ + if (wqe->wr.opcode == IB_WR_RDMA_READ && + opcode == OP(RDMA_READ_RESPONSE_ONLY) && + diff == 0) { + ret = 1; + goto bail; + } /* * If this request is a RDMA read or atomic, and the ACK is * for a later operation, this ACK NAKs the RDMA read or @@ -851,12 +865,10 @@ static int do_rc_ack(struct ipath_qp *qp, u32 aeth, u32 psn, int opcode) * is sent but before the response is received. */ if ((wqe->wr.opcode == IB_WR_RDMA_READ && - (opcode != OP(RDMA_READ_RESPONSE_LAST) || - ipath_cmp24(ack_psn, wqe->lpsn) != 0)) || + (opcode != OP(RDMA_READ_RESPONSE_LAST) || diff != 0)) || ((wqe->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP || wqe->wr.opcode == IB_WR_ATOMIC_FETCH_AND_ADD) && - (opcode != OP(ATOMIC_ACKNOWLEDGE) || - ipath_cmp24(wqe->psn, psn) != 0))) { + (opcode != OP(ATOMIC_ACKNOWLEDGE) || diff != 0))) { /* * The last valid PSN seen is the previous * request's. @@ -870,6 +882,9 @@ static int do_rc_ack(struct ipath_qp *qp, u32 aeth, u32 psn, int opcode) */ goto bail; } + if (wqe->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP || + wqe->wr.opcode == IB_WR_ATOMIC_FETCH_AND_ADD) + *(u64 *) wqe->sg_list[0].vaddr = val; if (qp->s_num_rd_atomic && (wqe->wr.opcode == IB_WR_RDMA_READ || wqe->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP || @@ -1079,6 +1094,7 @@ static inline void ipath_rc_rcv_resp(struct ipath_ibdev *dev, int diff; u32 pad; u32 aeth; + u64 val; spin_lock_irqsave(&qp->s_lock, flags); @@ -1118,8 +1134,6 @@ static inline void ipath_rc_rcv_resp(struct ipath_ibdev *dev, data += sizeof(__be32); } if (opcode == OP(ATOMIC_ACKNOWLEDGE)) { - u64 val; - if (!header_in_data) { __be32 *p = ohdr->u.at.atomic_ack_eth; @@ -1127,12 +1141,13 @@ static inline void ipath_rc_rcv_resp(struct ipath_ibdev *dev, be32_to_cpu(p[1]); } else val = be64_to_cpu(((__be64 *) data)[0]); - *(u64 *) wqe->sg_list[0].vaddr = val; - } - if (!do_rc_ack(qp, aeth, psn, opcode) || + } else + val = 0; + if (!do_rc_ack(qp, aeth, psn, opcode, val) || opcode != OP(RDMA_READ_RESPONSE_FIRST)) goto ack_done; hdrsize += 4; + 
wqe = get_swqe_ptr(qp, qp->s_last); if (unlikely(wqe->wr.opcode != IB_WR_RDMA_READ)) goto ack_op_err; /* @@ -1176,13 +1191,12 @@ static inline void ipath_rc_rcv_resp(struct ipath_ibdev *dev, goto bail; case OP(RDMA_READ_RESPONSE_ONLY): - if (unlikely(ipath_cmp24(psn, qp->s_last_psn + 1))) { - dev->n_rdma_seq++; - ipath_restart_rc(qp, qp->s_last_psn + 1, &wc); + if (!header_in_data) + aeth = be32_to_cpu(ohdr->u.aeth); + else + aeth = be32_to_cpu(((__be32 *) data)[0]); + if (!do_rc_ack(qp, aeth, psn, opcode, 0)) goto ack_done; - } - if (unlikely(wqe->wr.opcode != IB_WR_RDMA_READ)) - goto ack_op_err; /* Get the number of bytes the message was padded by. */ pad = (be32_to_cpu(ohdr->bth[0]) >> 20) & 3; /* @@ -1197,6 +1211,7 @@ static inline void ipath_rc_rcv_resp(struct ipath_ibdev *dev, * have to be careful to copy the data to the right * location. */ + wqe = get_swqe_ptr(qp, qp->s_last); qp->s_rdma_read_len = restart_sge(&qp->s_rdma_read_sge, wqe, psn, pmtu); goto read_last; @@ -1230,7 +1245,8 @@ static inline void ipath_rc_rcv_resp(struct ipath_ibdev *dev, data += sizeof(__be32); } ipath_copy_sge(&qp->s_rdma_read_sge, data, tlen); - (void) do_rc_ack(qp, aeth, psn, OP(RDMA_READ_RESPONSE_LAST)); + (void) do_rc_ack(qp, aeth, psn, + OP(RDMA_READ_RESPONSE_LAST), 0); goto ack_done; } @@ -1344,8 +1360,11 @@ static inline int ipath_rc_rcv_error(struct ipath_ibdev *dev, e = NULL; break; } - if (ipath_cmp24(psn, e->psn) >= 0) + if (ipath_cmp24(psn, e->psn) >= 0) { + if (prev == qp->s_tail_ack_queue) + old_req = 0; break; + } } switch (opcode) { case OP(RDMA_READ_REQUEST): { From arthur.jones at qlogic.com Tue Jun 19 16:42:09 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:42:09 -0700 Subject: [ofa-general] [PATCH 17/28] IB/ipath - wait for PIO available interrupt In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: 
<20070619234208.3794.75336.stgit@bauxite.internal.keyresearch.com> From: Ralph Campbell The send function is called when posting new send work requests. There is no point in trying to send a packet if the QP is already waiting for a HW send buffer so don't clear the busy bit until the buffer available interrupt happens. Signed-off-by: Ralph Campbell --- drivers/infiniband/hw/ipath/ipath_ruc.c | 6 ++---- drivers/infiniband/hw/ipath/ipath_verbs.c | 1 + 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_ruc.c b/drivers/infiniband/hw/ipath/ipath_ruc.c index 103dea0..7d09f5b 100644 --- a/drivers/infiniband/hw/ipath/ipath_ruc.c +++ b/drivers/infiniband/hw/ipath/ipath_ruc.c @@ -504,11 +504,9 @@ void ipath_no_bufs_available(struct ipath_qp *qp, struct ipath_ibdev *dev) * could be called. If we are still in the tasklet function, * tasklet_hi_schedule() will not call us until the next time * tasklet_hi_schedule() is called. - * We clear the tasklet flag now since we are committing to return - * from the tasklet function. + * We leave the busy flag set so that another post send doesn't + * try to put the same QP on the piowait list again. 
*/ - clear_bit(IPATH_S_BUSY, &qp->s_busy); - tasklet_unlock(&qp->s_task); want_buffer(dev->dd); dev->n_piowait++; } diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c index b92006a..68952be 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.c +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c @@ -949,6 +949,7 @@ int ipath_ib_piobufavail(struct ipath_ibdev *dev) qp = list_entry(dev->piowait.next, struct ipath_qp, piowait); list_del_init(&qp->piowait); + clear_bit(IPATH_S_BUSY, &qp->s_busy); tasklet_hi_schedule(&qp->s_task); } spin_unlock_irqrestore(&dev->pending_lock, flags); From arthur.jones at qlogic.com Tue Jun 19 16:42:15 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:42:15 -0700 Subject: [ofa-general] [PATCH 18/28] IB/ipath - Possible data corruption if multiple SGEs used for receive In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234214.3794.13513.stgit@bauxite.internal.keyresearch.com> From: Ralph Campbell The code to copy data from the receive queue buffers to the IB SGEs doesn't check the SGE length, only the memory region/page length when copying data. This could overwrite parts of the user's memory that were not intended to be written. It can only happen if multiple SGEs are used to describe a receive buffer which almost never happens in practice. 
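The fix described above clamps each copy chunk to the smallest of three limits:

- the bytes still to copy;
- the bytes left in the current memory-region segment (`sge->length`);
- the bytes left in the SGE as the user posted it (`sge->sge_length`).

Below is a minimal user-space sketch of that rule, not the driver's code. `struct sge` and `copy_to_sges()` are hypothetical simplified stand-ins for `struct ipath_sge` and `ipath_copy_sge()`. The model also simplifies one step: it moves to the next SGE when either limit runs out. The real driver instead steps to the next MR segment of the same SGE when only `length` is exhausted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct ipath_sge. */
struct sge {
    uint8_t *vaddr;      /* current write position */
    uint32_t length;     /* bytes left in the backing MR segment/page */
    uint32_t sge_length; /* bytes left in this SGE as posted by the user */
};

/*
 * Copy `length` bytes from `data` into an array of SGEs without writing
 * past either the segment limit or the SGE limit.
 * Returns the number of bytes that did not fit (0 on success).
 */
static uint32_t copy_to_sges(struct sge *sg, size_t nsge,
                             const uint8_t *data, uint32_t length)
{
    size_t i = 0;

    while (length && i < nsge) {
        struct sge *s = &sg[i];
        uint32_t len = s->length;

        if (len > length)
            len = length;
        if (len > s->sge_length)   /* the clamp the patch adds */
            len = s->sge_length;
        assert(len != 0);          /* mirrors the driver's BUG_ON(len == 0) */

        memcpy(s->vaddr, data, len);
        s->vaddr += len;
        s->length -= len;
        s->sge_length -= len;
        data += len;
        length -= len;

        /* Simplified model: advance once either limit is used up. */
        if (s->sge_length == 0 || s->length == 0)
            i++;
    }
    return length;
}
```

Consider two 4-byte SGEs, each sitting in a page with 8 bytes left. With the clamp, an 8-byte copy fills bytes 0-3 and 8-11 and leaves 4-7 alone. Without the `sge_length` check, the first chunk would write 8 bytes and overrun into memory the first SGE never described.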
Signed-off-by: Ralph Campbell --- drivers/infiniband/hw/ipath/ipath_ruc.c | 2 ++ drivers/infiniband/hw/ipath/ipath_ud.c | 2 ++ drivers/infiniband/hw/ipath/ipath_verbs.c | 8 ++++++-- 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_ruc.c b/drivers/infiniband/hw/ipath/ipath_ruc.c index 7d09f5b..1a5afaf 100644 --- a/drivers/infiniband/hw/ipath/ipath_ruc.c +++ b/drivers/infiniband/hw/ipath/ipath_ruc.c @@ -397,6 +397,8 @@ again: if (len > sge->length) len = sge->length; + if (len > sge->sge_length) + len = sge->sge_length; BUG_ON(len == 0); ipath_copy_sge(&qp->r_sge, sge->vaddr, len); sge->vaddr += len; diff --git a/drivers/infiniband/hw/ipath/ipath_ud.c b/drivers/infiniband/hw/ipath/ipath_ud.c index eee54c7..26171e5 100644 --- a/drivers/infiniband/hw/ipath/ipath_ud.c +++ b/drivers/infiniband/hw/ipath/ipath_ud.c @@ -232,6 +232,8 @@ static void ipath_ud_loopback(struct ipath_qp *sqp, if (len > length) len = length; + if (len > sge->sge_length) + len = sge->sge_length; BUG_ON(len == 0); ipath_copy_sge(&rsge, sge->vaddr, len); sge->vaddr += len; diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c index 68952be..6753f7d 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.c +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c @@ -164,9 +164,11 @@ void ipath_copy_sge(struct ipath_sge_state *ss, void *data, u32 length) while (length) { u32 len = sge->length; - BUG_ON(len == 0); if (len > length) len = length; + if (len > sge->sge_length) + len = sge->sge_length; + BUG_ON(len == 0); memcpy(sge->vaddr, data, len); sge->vaddr += len; sge->length -= len; @@ -202,9 +204,11 @@ void ipath_skip_sge(struct ipath_sge_state *ss, u32 length) while (length) { u32 len = sge->length; - BUG_ON(len == 0); if (len > length) len = length; + if (len > sge->sge_length) + len = sge->sge_length; + BUG_ON(len == 0); sge->vaddr += len; sge->length -= len; sge->sge_length -= len; From arthur.jones at qlogic.com 
Tue Jun 19 16:42:21 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:42:21 -0700 Subject: [ofa-general] [PATCH 19/28] IB/ipath - Duplicate RDMA reads can cause responder to NAK inappropriately In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234220.3794.7662.stgit@bauxite.internal.keyresearch.com> From: Ralph Campbell A duplicate RDMA read request can fool the responder into NAKing a new RDMA read request because the responder wasn't keeping track of whether the queue of RDMA read requests had been sent at least once. For example, requester sends 4 2K byte RDMA read requests, times out, and resends the first, then sees the 4 responses, then sends a 5th RDMA read or atomic operation. The responder sees the 4 requests, sends 4 responses, sees the resent 1st request, rewinds the queue, then sees the 5th request but thinks the queue is full and that the requester is invalidly sending a 5th new request. 
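The responder's `s_ack_queue` is a ring of `IPATH_MAX_RDMA_ATOMIC + 1` slots. With this patch, a collision between the next head slot and the tail no longer causes an unconditional NAK. If the tail entry's response has already been sent at least once, the entry is dropped and its slot reused, which is the job of `ipath_update_ack_queue()`.

The sketch below is a hypothetical single-threaded model of just that decision, with no locking. All of these names are illustrative and do not come from the driver:

- `struct responder`
- `queue_request()`
- `send_response()`
- `rewind_to()`
- `MAX_RDMA_ATOMIC`

The model replays the scenario from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_RDMA_ATOMIC 4             /* hypothetical stand-in for IPATH_MAX_RDMA_ATOMIC */
#define QLEN (MAX_RDMA_ATOMIC + 1)    /* one spare slot distinguishes full from empty */

struct ack_entry {
    uint32_t psn;
    bool sent;                        /* response transmitted at least once */
};

struct responder {
    struct ack_entry q[QLEN];
    unsigned head;                    /* models r_head_ack_queue */
    unsigned tail;                    /* models s_tail_ack_queue */
};

/* Same wrap-around arithmetic as the driver's "next > MAX ? 0" idiom. */
static unsigned ring_next(unsigned n)
{
    return n + 1 > MAX_RDMA_ATOMIC ? 0 : n + 1;
}

/* Returns false where the responder would NAK with "invalid request". */
static bool queue_request(struct responder *r, uint32_t psn)
{
    unsigned next = ring_next(r->head);

    if (next == r->tail) {
        if (!r->q[next].sent)
            return false;             /* too many genuinely outstanding requests */
        r->tail = ring_next(next);    /* answered once already: drop it and reuse the slot */
    }
    r->q[r->head].psn = psn;
    r->q[r->head].sent = false;
    r->head = next;
    return true;
}

/* Transmit the response for the oldest queued entry. */
static void send_response(struct responder *r)
{
    r->q[r->tail].sent = true;
    r->tail = ring_next(r->tail);
}

/* A duplicate request makes the responder rewind and resend from slot n. */
static void rewind_to(struct responder *r, unsigned n)
{
    r->tail = n;
}
```

Replaying the example: queue four reads, answer all four, then rewind to slot 0 for the duplicate. The fifth request still collides with the tail. Without the `sent` flag it would be NAKed. With the flag it is accepted, because slot 0 has already been answered.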
Signed-off-by: Ralph Campbell --- drivers/infiniband/hw/ipath/ipath_rc.c | 38 +++++++++++++++++++++++++---- drivers/infiniband/hw/ipath/ipath_verbs.h | 1 + 2 files changed, 34 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_rc.c b/drivers/infiniband/hw/ipath/ipath_rc.c index 014d811..9e71239 100644 --- a/drivers/infiniband/hw/ipath/ipath_rc.c +++ b/drivers/infiniband/hw/ipath/ipath_rc.c @@ -125,8 +125,10 @@ static int ipath_make_rc_ack(struct ipath_qp *qp, if (len > pmtu) { len = pmtu; qp->s_ack_state = OP(RDMA_READ_RESPONSE_FIRST); - } else + } else { qp->s_ack_state = OP(RDMA_READ_RESPONSE_ONLY); + e->sent = 1; + } ohdr->u.aeth = ipath_compute_aeth(qp); hwords++; qp->s_ack_rdma_psn = e->psn; @@ -143,6 +145,7 @@ static int ipath_make_rc_ack(struct ipath_qp *qp, cpu_to_be32(e->atomic_data); hwords += sizeof(ohdr->u.at) / sizeof(u32); bth2 = e->psn; + e->sent = 1; } bth0 = qp->s_ack_state << 24; break; @@ -158,6 +161,7 @@ static int ipath_make_rc_ack(struct ipath_qp *qp, ohdr->u.aeth = ipath_compute_aeth(qp); hwords++; qp->s_ack_state = OP(RDMA_READ_RESPONSE_LAST); + qp->s_ack_queue[qp->s_tail_ack_queue].sent = 1; } bth0 = qp->s_ack_state << 24; bth2 = qp->s_ack_rdma_psn++ & IPATH_PSN_MASK; @@ -1479,6 +1483,22 @@ static void ipath_rc_error(struct ipath_qp *qp, enum ib_wc_status err) spin_unlock_irqrestore(&qp->s_lock, flags); } +static inline void ipath_update_ack_queue(struct ipath_qp *qp, unsigned n) +{ + unsigned long flags; + unsigned next; + + next = n + 1; + if (next > IPATH_MAX_RDMA_ATOMIC) + next = 0; + spin_lock_irqsave(&qp->s_lock, flags); + if (n == qp->s_tail_ack_queue) { + qp->s_tail_ack_queue = next; + qp->s_ack_state = OP(ACKNOWLEDGE); + } + spin_unlock_irqrestore(&qp->s_lock, flags); +} + /** * ipath_rc_rcv - process an incoming RC packet * @dev: the device this packet came in on @@ -1741,8 +1761,11 @@ void ipath_rc_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr, next = qp->r_head_ack_queue + 1; if (next > 
IPATH_MAX_RDMA_ATOMIC) next = 0; - if (unlikely(next == qp->s_tail_ack_queue)) - goto nack_inv; + if (unlikely(next == qp->s_tail_ack_queue)) { + if (!qp->s_ack_queue[next].sent) + goto nack_inv; + ipath_update_ack_queue(qp, next); + } e = &qp->s_ack_queue[qp->r_head_ack_queue]; /* RETH comes after BTH */ if (!header_in_data) @@ -1777,6 +1800,7 @@ void ipath_rc_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr, e->rdma_sge.sge.sge_length = 0; } e->opcode = opcode; + e->sent = 0; e->psn = psn; /* * We need to increment the MSN here instead of when we @@ -1812,8 +1836,11 @@ void ipath_rc_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr, next = qp->r_head_ack_queue + 1; if (next > IPATH_MAX_RDMA_ATOMIC) next = 0; - if (unlikely(next == qp->s_tail_ack_queue)) - goto nack_inv; + if (unlikely(next == qp->s_tail_ack_queue)) { + if (!qp->s_ack_queue[next].sent) + goto nack_inv; + ipath_update_ack_queue(qp, next); + } if (!header_in_data) ateth = &ohdr->u.atomic_eth; else @@ -1838,6 +1865,7 @@ void ipath_rc_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr, be64_to_cpu(ateth->compare_data), sdata); e->opcode = opcode; + e->sent = 0; e->psn = psn & IPATH_PSN_MASK; qp->r_msn++; qp->r_psn++; diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h index 088b837..458f822 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.h +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h @@ -321,6 +321,7 @@ struct ipath_sge_state { */ struct ipath_ack_entry { u8 opcode; + u8 sent; u32 psn; union { struct ipath_sge_state rdma_sge; From arthur.jones at qlogic.com Tue Jun 19 16:42:27 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:42:27 -0700 Subject: [ofa-general] [PATCH 20/28] IB/ipath - Correct checking of swminor version field when using subports In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: 
<20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234226.3794.45007.stgit@bauxite.internal.keyresearch.com> From: Mark Debbage When subports are required to run a program, this patch checks that the driver and the user-space library have compatible subport implementations. This is achieved through checks on the swminor version field built into the driver and user-space library. Bad combinations are reported through syslog and result in an error when opening the port. Signed-off-by: Mark Debbage --- drivers/infiniband/hw/ipath/ipath_file_ops.c | 64 ++++++++++++++++++++++---- 1 files changed, 55 insertions(+), 9 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c index 931802b..fc83f40 100644 --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c @@ -1403,6 +1403,38 @@ bail: return pollflag; } +static int ipath_supports_subports(int user_swmajor, int user_swminor) +{ + /* no subport implementation prior to software version 1.3 */ + return (user_swmajor > 1) || (user_swminor >= 3); +} + +static int ipath_compatible_subports(int user_swmajor, int user_swminor) +{ + /* this code is written long-hand for clarity */ + if (IPATH_USER_SWMAJOR != user_swmajor) { + /* no promise of compatibility if major mismatch */ + return 0; + } + if (IPATH_USER_SWMAJOR == 1) { + switch (IPATH_USER_SWMINOR) { + case 0: + case 1: + case 2: + /* no subport implementation so cannot be compatible */ + return 0; + case 3: + /* 3 is only compatible with itself */ + return user_swminor == 3; + default: + /* >= 4 are compatible (or are expected to be) */ + return user_swminor >= 4; + } + } + /* make no promises yet for future major versions */ + return 0; +} + static int init_subports(struct ipath_devdata *dd, struct ipath_portdata *pd, const struct ipath_user_info *uinfo) @@ -1418,14 +1450,26 @@ static int init_subports(struct ipath_devdata *dd, if 
(uinfo->spu_subport_cnt <= 1) goto bail; - /* Old user binaries don't know about new subport implementation */ - if ((uinfo->spu_userversion & 0xffff) != IPATH_USER_SWMINOR) { + /* Self-consistency check for ipath_compatible_subports() */ + if (ipath_supports_subports(IPATH_USER_SWMAJOR, IPATH_USER_SWMINOR) && + !ipath_compatible_subports(IPATH_USER_SWMAJOR, + IPATH_USER_SWMINOR)) { dev_info(&dd->pcidev->dev, - "Mismatched user minor version (%d) and driver " - "minor version (%d) while port sharing. Ensure " + "Inconsistent ipath_compatible_subports()\n"); + goto bail; + } + + /* Check for subport compatibility */ + if (!ipath_compatible_subports(uinfo->spu_userversion >> 16, + uinfo->spu_userversion & 0xffff)) { + dev_info(&dd->pcidev->dev, + "Mismatched user version (%d.%d) and driver " + "version (%d.%d) while port sharing. Ensure " "that driver and library are from the same " "release.\n", + (int) (uinfo->spu_userversion >> 16), (int) (uinfo->spu_userversion & 0xffff), + IPATH_USER_SWMAJOR, IPATH_USER_SWMINOR); goto bail; } @@ -1729,14 +1773,13 @@ static int ipath_open(struct inode *in, struct file *fp) return fp->private_data ? 
0 : -ENOMEM; } - /* Get port early, so can set affinity prior to memory allocation */ static int ipath_assign_port(struct file *fp, const struct ipath_user_info *uinfo) { int ret; int i_minor; - unsigned swminor; + unsigned swmajor, swminor; /* Check to be sure we haven't already initialized this file */ if (port_fp(fp)) { @@ -1745,7 +1788,8 @@ static int ipath_assign_port(struct file *fp, } /* for now, if major version is different, bail */ - if ((uinfo->spu_userversion >> 16) != IPATH_USER_SWMAJOR) { + swmajor = uinfo->spu_userversion >> 16; + if (swmajor != IPATH_USER_SWMAJOR) { ipath_dbg("User major version %d not same as driver " "major %d\n", uinfo->spu_userversion >> 16, IPATH_USER_SWMAJOR); @@ -1760,7 +1804,8 @@ static int ipath_assign_port(struct file *fp, mutex_lock(&ipath_mutex); - if (swminor == IPATH_USER_SWMINOR && uinfo->spu_subport_cnt && + if (ipath_compatible_subports(swmajor, swminor) && + uinfo->spu_subport_cnt && (ret = find_shared_port(fp, uinfo))) { mutex_unlock(&ipath_mutex); if (ret > 0) @@ -2024,7 +2069,8 @@ static int ipath_port_info(struct ipath_portdata *pd, u16 subport, info.port = pd->port_port; info.subport = subport; /* Don't return new fields if old library opened the port. */ - if ((pd->userversion & 0xffff) == IPATH_USER_SWMINOR) { + if (ipath_supports_subports(pd->userversion >> 16, + pd->userversion & 0xffff)) { /* Number of user ports available for this device. 
*/ info.num_ports = pd->port_dd->ipath_cfgports - 1; info.num_subports = pd->port_subport_cnt; From arthur.jones at qlogic.com Tue Jun 19 16:42:34 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:42:34 -0700 Subject: [ofa-general] [PATCH 21/28] IB/ipath - Consistent handling for one subport In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234232.3794.65280.stgit@bauxite.internal.keyresearch.com> From: Mark Debbage Previously the driver and user-space code handled the case of 1 subport somewhat inconsistently. The new interpretation of this situation is that if one subport is requested, the driver turns on the subport mechanism and arranges for the port to be "shared" by one process. In normal use the user-space library does not use this configuration and instead arranges for the port not to be shared at all. This particular idiom can be useful for testing purposes. Signed-off-by: Mark Debbage --- drivers/infiniband/hw/ipath/ipath_file_ops.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c index fc83f40..a474796 100644 --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c @@ -1444,10 +1444,10 @@ static int init_subports(struct ipath_devdata *dd, size_t size; /* - * If the user is requesting zero or one port, + * If the user is requesting zero subports, * skip the subport allocation. 
*/ - if (uinfo->spu_subport_cnt <= 1) + if (uinfo->spu_subport_cnt <= 0) goto bail; /* Self-consistency check for ipath_compatible_subports() */ From arthur.jones at qlogic.com Tue Jun 19 16:42:41 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:42:41 -0700 Subject: [ofa-general] [PATCH 22/28] IB/ipath - Add capability to modify PBC word In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234240.3794.882.stgit@bauxite.internal.keyresearch.com> From: Michael Albaugh During compliance testing and when debugging some interconnect issues, it is very useful to be able to send malformed packets, without having the device signal them as malformed (drop, or terminate with EBP). The hardware supports this, but the driver "diagnostic packet" interface did not. Extend capability to send specific malformed packets for testing. Signed-off-by: Michael Albaugh --- drivers/infiniband/hw/ipath/ipath_common.h | 19 +++++++++++++- drivers/infiniband/hw/ipath/ipath_diag.c | 39 ++++++++++++++++++++++++---- 2 files changed, 52 insertions(+), 6 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_common.h b/drivers/infiniband/hw/ipath/ipath_common.h index 12e1349..f70788c 100644 --- a/drivers/infiniband/hw/ipath/ipath_common.h +++ b/drivers/infiniband/hw/ipath/ipath_common.h @@ -501,13 +501,30 @@ struct __ipath_sendpkt { struct ipath_iovec sps_iov[4]; }; -/* Passed into diag data special file's ->write method. 
*/ +/* + * diagnostics can send a packet by "writing" one of the following + * two structs to diag data special file + * The first is the legacy version for backward compatibility + */ struct ipath_diag_pkt { __u32 unit; __u64 data; __u32 len; }; +/* The second diag_pkt struct is the expanded version that allows + * more control over the packet, specifically, by allowing a custom + * pbc (+ extra) qword, so that special modes and deliberate + * changes to CRCs can be used. The elements were also re-ordered + * for better alignment and to avoid padding issues. + */ +struct ipath_diag_xpkt { + __u64 data; + __u64 pbc_wd; + __u32 unit; + __u32 len; +}; + /* * Data layout in I2C flash (for GUID, etc.) * All fields are little-endian binary unless otherwise stated diff --git a/drivers/infiniband/hw/ipath/ipath_diag.c b/drivers/infiniband/hw/ipath/ipath_diag.c index 63e8368..aab21c1 100644 --- a/drivers/infiniband/hw/ipath/ipath_diag.c +++ b/drivers/infiniband/hw/ipath/ipath_diag.c @@ -323,13 +323,14 @@ static ssize_t ipath_diagpkt_write(struct file *fp, { u32 __iomem *piobuf; u32 plen, clen, pbufn; - struct ipath_diag_pkt dp; + struct ipath_diag_pkt odp; + struct ipath_diag_xpkt dp; u32 *tmpbuf = NULL; struct ipath_devdata *dd; ssize_t ret = 0; u64 val; - if (count < sizeof(dp)) { + if (count != sizeof(dp)) { ret = -EINVAL; goto bail; } @@ -339,6 +340,29 @@ static ssize_t ipath_diagpkt_write(struct file *fp, goto bail; } + /* + * Due to padding/alignment issues (lessened with new struct) + * the old and new structs are the same length. We need to + * disambiguate them, which we can do because odp.len has never + * been less than the total of LRH+BTH+DETH so far, while + * dp.unit (same offset) unit is unlikely to get that high. + * Similarly, dp.data, the pointer to user at the same offset + * as odp.unit, is almost certainly at least one (512byte)page + * "above" NULL. 
The if-block below can be omitted if compatibility + * between a new driver and older diagnostic code is unimportant. + * compatibility the other direction (new diags, old driver) is + * handled in the diagnostic code, with a warning. + */ + if (dp.unit >= 20 && dp.data < 512) { + /* very probable version mismatch. Fix it up */ + memcpy(&odp, &dp, sizeof(odp)); + /* We got a legacy dp, copy elements to dp */ + dp.unit = odp.unit; + dp.data = odp.data; + dp.len = odp.len; + dp.pbc_wd = 0; /* Indicate we need to compute PBC wd */ + } + /* send count must be an exact number of dwords */ if (dp.len & 3) { ret = -EINVAL; @@ -371,9 +395,10 @@ static ssize_t ipath_diagpkt_write(struct file *fp, ret = -ENODEV; goto bail; } + /* Check link state, but not if we have custom PBC */ val = dd->ipath_lastibcstat & IPATH_IBSTATE_MASK; - if (val != IPATH_IBSTATE_INIT && val != IPATH_IBSTATE_ARM && - val != IPATH_IBSTATE_ACTIVE) { + if (!dp.pbc_wd && val != IPATH_IBSTATE_INIT && + val != IPATH_IBSTATE_ARM && val != IPATH_IBSTATE_ACTIVE) { ipath_cdbg(VERBOSE, "unit %u not ready (state %llx)\n", dd->ipath_unit, (unsigned long long) val); ret = -EINVAL; @@ -419,9 +444,13 @@ static ssize_t ipath_diagpkt_write(struct file *fp, ipath_cdbg(VERBOSE, "unit %u 0x%x+1w pio%d\n", dd->ipath_unit, plen - 1, pbufn); + if (dp.pbc_wd == 0) + /* Legacy operation, use computed pbc_wd */ + dp.pbc_wd = plen; + /* we have to flush after the PBC for correctness on some cpus * or WC buffer can be written out of order */ - writeq(plen, piobuf); + writeq(dp.pbc_wd, piobuf); ipath_flush_wc(); /* copy all by the trigger word, then flush, so it's written * to chip before trigger word, then write trigger word, then From arthur.jones at qlogic.com Tue Jun 19 16:42:47 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:42:47 -0700 Subject: [ofa-general] [PATCH 23/28] IB/ipath - send ACK invalid where appropriate In-Reply-To: 
<20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234246.3794.44838.stgit@bauxite.internal.keyresearch.com> From: Robert Walsh The IB specification ch. 9.9.3 table 58 says that a QP which isn't set up for the operation should return a NAK invalid request. Signed-off-by: Robert Walsh --- drivers/infiniband/hw/ipath/ipath_rc.c | 13 +++++++------ drivers/infiniband/hw/ipath/ipath_ruc.c | 22 ++++++++++++++++++---- 2 files changed, 25 insertions(+), 10 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_rc.c b/drivers/infiniband/hw/ipath/ipath_rc.c index 9e71239..6423d9e 100644 --- a/drivers/infiniband/hw/ipath/ipath_rc.c +++ b/drivers/infiniband/hw/ipath/ipath_rc.c @@ -1711,6 +1711,9 @@ void ipath_rc_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr, case OP(RDMA_WRITE_FIRST): case OP(RDMA_WRITE_ONLY): case OP(RDMA_WRITE_ONLY_WITH_IMMEDIATE): + if (unlikely(!(qp->qp_access_flags & + IB_ACCESS_REMOTE_WRITE))) + goto nack_inv; /* consume RWQE */ /* RETH comes after BTH */ if (!header_in_data) @@ -1740,9 +1743,6 @@ void ipath_rc_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr, qp->r_sge.sge.length = 0; qp->r_sge.sge.sge_length = 0; } - if (unlikely(!(qp->qp_access_flags & - IB_ACCESS_REMOTE_WRITE))) - goto nack_acc; if (opcode == OP(RDMA_WRITE_FIRST)) goto send_middle; else if (opcode == OP(RDMA_WRITE_ONLY)) @@ -1756,8 +1756,9 @@ void ipath_rc_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr, u32 len; u8 next; - if (unlikely(!(qp->qp_access_flags & IB_ACCESS_REMOTE_READ))) - goto nack_acc; + if (unlikely(!(qp->qp_access_flags & + IB_ACCESS_REMOTE_READ))) + goto nack_inv; next = qp->r_head_ack_queue + 1; if (next > IPATH_MAX_RDMA_ATOMIC) next = 0; @@ -1832,7 +1833,7 @@ void ipath_rc_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr, if (unlikely(!(qp->qp_access_flags & IB_ACCESS_REMOTE_ATOMIC))) - goto nack_acc; + goto 
nack_inv; next = qp->r_head_ack_queue + 1; if (next > IPATH_MAX_RDMA_ATOMIC) next = 0; diff --git a/drivers/infiniband/hw/ipath/ipath_ruc.c b/drivers/infiniband/hw/ipath/ipath_ruc.c index 1a5afaf..c44e015 100644 --- a/drivers/infiniband/hw/ipath/ipath_ruc.c +++ b/drivers/infiniband/hw/ipath/ipath_ruc.c @@ -320,12 +320,22 @@ again: break; case IB_WR_RDMA_WRITE_WITH_IMM: + if (unlikely(!(qp->qp_access_flags & + IB_ACCESS_REMOTE_WRITE))) { + wc.status = IB_WC_REM_INV_REQ_ERR; + goto err; + } wc.wc_flags = IB_WC_WITH_IMM; wc.imm_data = wqe->wr.imm_data; if (!ipath_get_rwqe(qp, 1)) goto rnr_nak; /* FALLTHROUGH */ case IB_WR_RDMA_WRITE: + if (unlikely(!(qp->qp_access_flags & + IB_ACCESS_REMOTE_WRITE))) { + wc.status = IB_WC_REM_INV_REQ_ERR; + goto err; + } if (wqe->length == 0) break; if (unlikely(!ipath_rkey_ok(qp, &qp->r_sge, wqe->length, @@ -355,8 +365,10 @@ again: case IB_WR_RDMA_READ: if (unlikely(!(qp->qp_access_flags & - IB_ACCESS_REMOTE_READ))) - goto acc_err; + IB_ACCESS_REMOTE_READ))) { + wc.status = IB_WC_REM_INV_REQ_ERR; + goto err; + } if (unlikely(!ipath_rkey_ok(qp, &sqp->s_sge, wqe->length, wqe->wr.wr.rdma.remote_addr, wqe->wr.wr.rdma.rkey, @@ -370,8 +382,10 @@ again: case IB_WR_ATOMIC_CMP_AND_SWP: case IB_WR_ATOMIC_FETCH_AND_ADD: if (unlikely(!(qp->qp_access_flags & - IB_ACCESS_REMOTE_ATOMIC))) - goto acc_err; + IB_ACCESS_REMOTE_ATOMIC))) { + wc.status = IB_WC_REM_INV_REQ_ERR; + goto err; + } if (unlikely(!ipath_rkey_ok(qp, &qp->r_sge, sizeof(u64), wqe->wr.wr.atomic.remote_addr, wqe->wr.wr.atomic.rkey, From arthur.jones at qlogic.com Tue Jun 19 16:42:52 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:42:52 -0700 Subject: [ofa-general] [PATCH 24/28] IB/ipath - ipath_poll fixups and enhancements In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: 
<20070619234252.3794.18229.stgit@bauxite.internal.keyresearch.com> From: Robert Walsh Fix ipath_poll and enhance it so we can poll for urgent packets or regular packets and receive notifications of when a header queue overflows. Signed-off-by: Robert Walsh --- drivers/infiniband/hw/ipath/ipath_common.h | 11 ++ drivers/infiniband/hw/ipath/ipath_file_ops.c | 125 +++++++++++++++++--------- drivers/infiniband/hw/ipath/ipath_intr.c | 38 ++++++-- drivers/infiniband/hw/ipath/ipath_kernel.h | 8 ++ 4 files changed, 131 insertions(+), 51 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_common.h b/drivers/infiniband/hw/ipath/ipath_common.h index f70788c..b4b786d 100644 --- a/drivers/infiniband/hw/ipath/ipath_common.h +++ b/drivers/infiniband/hw/ipath/ipath_common.h @@ -431,8 +431,15 @@ struct ipath_user_info { #define IPATH_CMD_UNUSED_1 25 #define IPATH_CMD_UNUSED_2 26 #define IPATH_CMD_PIOAVAILUPD 27 /* force an update of PIOAvail reg */ +#define IPATH_CMD_POLL_TYPE 28 /* set the kind of polling we want */ -#define IPATH_CMD_MAX 27 +#define IPATH_CMD_MAX 28 + +/* + * Poll types + */ +#define IPATH_POLL_TYPE_URGENT 0x01 +#define IPATH_POLL_TYPE_OVERFLOW 0x02 struct ipath_port_info { __u32 num_active; /* number of active units */ @@ -473,6 +480,8 @@ struct ipath_cmd { __u16 part_key; /* user address of __u32 bitmask of active slaves */ __u64 slave_mask_addr; + /* type of polling we want */ + __u16 poll_type; } cmd; }; diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c index a474796..33ab0d6 100644 --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c @@ -1341,65 +1341,98 @@ bail: return ret; } -static unsigned int ipath_poll(struct file *fp, - struct poll_table_struct *pt) +static unsigned int ipath_poll_urgent(struct ipath_portdata *pd, + struct file *fp, + struct poll_table_struct *pt) { - struct ipath_portdata *pd; - u32 head, tail; - int bit; unsigned pollflag = 
0; struct ipath_devdata *dd; - pd = port_fp(fp); - if (!pd) - goto bail; dd = pd->port_dd; - bit = pd->port_port + INFINIPATH_R_INTRAVAIL_SHIFT; - set_bit(bit, &dd->ipath_rcvctrl); + if (test_bit(IPATH_PORT_WAITING_OVERFLOW, &pd->int_flag)) { + pollflag |= POLLERR; + clear_bit(IPATH_PORT_WAITING_OVERFLOW, &pd->int_flag); + } - /* - * Before blocking, make sure that head is still == tail, - * reading from the chip, so we can be sure the interrupt - * enable has made it to the chip. If not equal, disable - * interrupt again and return immediately. This avoids races, - * and the overhead of the chip read doesn't matter much at - * this point, since we are waiting for something anyway. - */ + if (test_bit(IPATH_PORT_WAITING_URG, &pd->int_flag)) { + pollflag |= POLLIN | POLLRDNORM; + clear_bit(IPATH_PORT_WAITING_URG, &pd->int_flag); + } - ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, - dd->ipath_rcvctrl); + if (!pollflag) { + set_bit(IPATH_PORT_WAITING_URG, &pd->port_flag); + if (pd->poll_type & IPATH_POLL_TYPE_OVERFLOW) + set_bit(IPATH_PORT_WAITING_OVERFLOW, + &pd->port_flag); + + poll_wait(fp, &pd->port_wait, pt); + } + + return pollflag; +} + +static unsigned int ipath_poll_next(struct ipath_portdata *pd, + struct file *fp, + struct poll_table_struct *pt) +{ + u32 head, tail; + unsigned pollflag = 0; + struct ipath_devdata *dd; + + dd = pd->port_dd; head = ipath_read_ureg32(dd, ur_rcvhdrhead, pd->port_port); - tail = ipath_read_ureg32(dd, ur_rcvhdrtail, pd->port_port); + tail = *(volatile u64 *)pd->port_rcvhdrtail_kvaddr; + + if (test_bit(IPATH_PORT_WAITING_OVERFLOW, &pd->int_flag)) { + pollflag |= POLLERR; + clear_bit(IPATH_PORT_WAITING_OVERFLOW, &pd->int_flag); + } - if (tail == head) { + if (tail != head || + test_bit(IPATH_PORT_WAITING_RCV, &pd->int_flag)) { + pollflag |= POLLIN | POLLRDNORM; + clear_bit(IPATH_PORT_WAITING_RCV, &pd->int_flag); + } + + if (!pollflag) { set_bit(IPATH_PORT_WAITING_RCV, &pd->port_flag); + if (pd->poll_type & 
IPATH_POLL_TYPE_OVERFLOW) + set_bit(IPATH_PORT_WAITING_OVERFLOW, + &pd->port_flag); + + set_bit(pd->port_port + INFINIPATH_R_INTRAVAIL_SHIFT, + &dd->ipath_rcvctrl); + + ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, + dd->ipath_rcvctrl); + if (dd->ipath_rhdrhead_intr_off) /* arm rcv interrupt */ - (void)ipath_write_ureg(dd, ur_rcvhdrhead, - dd->ipath_rhdrhead_intr_off - | head, pd->port_port); - poll_wait(fp, &pd->port_wait, pt); + ipath_write_ureg(dd, ur_rcvhdrhead, + dd->ipath_rhdrhead_intr_off | head, + pd->port_port); - if (test_bit(IPATH_PORT_WAITING_RCV, &pd->port_flag)) { - /* timed out, no packets received */ - clear_bit(IPATH_PORT_WAITING_RCV, &pd->port_flag); - pd->port_rcvwait_to++; - } - else - pollflag = POLLIN | POLLRDNORM; - } - else { - /* it's already happened; don't do wait_event overhead */ - pollflag = POLLIN | POLLRDNORM; - pd->port_rcvnowait++; + poll_wait(fp, &pd->port_wait, pt); } - clear_bit(bit, &dd->ipath_rcvctrl); - ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, - dd->ipath_rcvctrl); + return pollflag; +} + +static unsigned int ipath_poll(struct file *fp, + struct poll_table_struct *pt) +{ + struct ipath_portdata *pd; + unsigned pollflag; + + pd = port_fp(fp); + if (!pd) + pollflag = 0; + else if (pd->poll_type & IPATH_POLL_TYPE_URGENT) + pollflag = ipath_poll_urgent(pd, fp, pt); + else + pollflag = ipath_poll_next(pd, fp, pt); -bail: return pollflag; } @@ -2173,6 +2206,11 @@ static ssize_t ipath_write(struct file *fp, const char __user *data, src = NULL; dest = NULL; break; + case IPATH_CMD_POLL_TYPE: + copy = sizeof(cmd.cmd.poll_type); + dest = &cmd.cmd.poll_type; + src = &ucmd->cmd.poll_type; + break; default: ret = -EINVAL; goto bail; @@ -2245,6 +2283,9 @@ static ssize_t ipath_write(struct file *fp, const char __user *data, case IPATH_CMD_PIOAVAILUPD: ret = ipath_force_pio_avail_update(pd->port_dd); break; + case IPATH_CMD_POLL_TYPE: + pd->poll_type = cmd.cmd.poll_type; + break; } if (ret >= 0) diff --git 
a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c index 948091f..f8aac8e 100644 --- a/drivers/infiniband/hw/ipath/ipath_intr.c +++ b/drivers/infiniband/hw/ipath/ipath_intr.c @@ -680,6 +680,17 @@ static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs) chkerrpkts = 1; dd->ipath_lastrcvhdrqtails[i] = tl; pd->port_hdrqfull++; + if (test_bit(IPATH_PORT_WAITING_OVERFLOW, + &pd->port_flag)) { + clear_bit( + IPATH_PORT_WAITING_OVERFLOW, + &pd->port_flag); + set_bit( + IPATH_PORT_WAITING_OVERFLOW, + &pd->int_flag); + wake_up_interruptible( + &pd->port_wait); + } } } } @@ -877,14 +888,25 @@ static void handle_urcv(struct ipath_devdata *dd, u32 istat) dd->ipath_i_rcvurg_mask); for (i = 1; i < dd->ipath_cfgports; i++) { struct ipath_portdata *pd = dd->ipath_pd[i]; - if (portr & (1 << i) && pd && pd->port_cnt && - test_bit(IPATH_PORT_WAITING_RCV, &pd->port_flag)) { - clear_bit(IPATH_PORT_WAITING_RCV, - &pd->port_flag); - clear_bit(i + INFINIPATH_R_INTRAVAIL_SHIFT, - &dd->ipath_rcvctrl); - wake_up_interruptible(&pd->port_wait); - rcvdint = 1; + if (portr & (1 << i) && pd && pd->port_cnt) { + if (test_bit(IPATH_PORT_WAITING_RCV, + &pd->port_flag)) { + clear_bit(IPATH_PORT_WAITING_RCV, + &pd->port_flag); + set_bit(IPATH_PORT_WAITING_RCV, + &pd->int_flag); + clear_bit(i + INFINIPATH_R_INTRAVAIL_SHIFT, + &dd->ipath_rcvctrl); + wake_up_interruptible(&pd->port_wait); + rcvdint = 1; + } else if (test_bit(IPATH_PORT_WAITING_URG, + &pd->port_flag)) { + clear_bit(IPATH_PORT_WAITING_URG, + &pd->port_flag); + set_bit(IPATH_PORT_WAITING_URG, + &pd->int_flag); + wake_up_interruptible(&pd->port_wait); + } } } if (rcvdint) { diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index 2e85aec..034c283 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -127,6 +127,8 @@ struct ipath_portdata { u32 port_tidcursor; /* next expected TID to check */ 
unsigned long port_flag; + /* what happened */ + unsigned long int_flag; /* WAIT_RCV that timed out, no interrupt */ u32 port_rcvwait_to; /* WAIT_PIO that timed out, no interrupt */ @@ -155,6 +157,8 @@ struct ipath_portdata { u32 userversion; /* Bitmask of active slaves */ u32 active_slaves; + /* Type of packets or conditions we want to poll for */ + u16 poll_type; }; struct sk_buff; @@ -754,6 +758,10 @@ int ipath_set_rx_pol_inv(struct ipath_devdata *dd, u8 new_pol_inv); #define IPATH_PORT_WAITING_PIO 3 /* master has not finished initializing */ #define IPATH_PORT_MASTER_UNINIT 4 + /* waiting for an urgent packet to arrive */ +#define IPATH_PORT_WAITING_URG 5 + /* waiting for a header overflow */ +#define IPATH_PORT_WAITING_OVERFLOW 6 /* free up any allocated data at closes */ void ipath_free_data(struct ipath_portdata *dd); From arthur.jones at qlogic.com Tue Jun 19 16:42:58 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:42:58 -0700 Subject: [ofa-general] [PATCH 25/28] IB/ipath - clean send flags properly on QP reset. 
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234257.3794.52010.stgit@bauxite.internal.keyresearch.com> From: Robert Walsh Signed-off-by: Robert Walsh Signed-off-by: Ralph Campbell --- drivers/infiniband/hw/ipath/ipath_qp.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_qp.c b/drivers/infiniband/hw/ipath/ipath_qp.c index 9e07abb..bfd39c9 100644 --- a/drivers/infiniband/hw/ipath/ipath_qp.c +++ b/drivers/infiniband/hw/ipath/ipath_qp.c @@ -336,7 +336,7 @@ static void ipath_reset_qp(struct ipath_qp *qp) qp->qkey = 0; qp->qp_access_flags = 0; qp->s_busy = 0; - qp->s_flags &= ~IPATH_S_SIGNAL_REQ_WR; + qp->s_flags &= IPATH_S_SIGNAL_REQ_WR; qp->s_hdrwords = 0; qp->s_psn = 0; qp->r_psn = 0; From arthur.jones at qlogic.com Tue Jun 19 16:43:04 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:43:04 -0700 Subject: [ofa-general] [PATCH 26/28] IB/ipath - print warning if LID not acquired and link ACTIVE within one minute In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234303.3794.75856.stgit@bauxite.internal.keyresearch.com> From: Robert Walsh Signed-off-by: Robert Walsh Signed-off-by: Bryan O'Sullivan --- drivers/infiniband/hw/ipath/ipath_driver.c | 45 ++++++++++++++++++++++++++++ drivers/infiniband/hw/ipath/ipath_kernel.h | 3 ++ 2 files changed, 48 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index 8b61179..1d2369b 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -104,6 +104,13 @@ static int __devinit ipath_init_one(struct pci_dev *, #define PCI_DEVICE_ID_INFINIPATH_HT 0xd #define 
PCI_DEVICE_ID_INFINIPATH_PE800 0x10 +/* + * Number of seconds before we complain about not getting a LID + * assignment. + */ + +#define LID_TIMEOUT 60 + static const struct pci_device_id ipath_pci_tbl[] = { { PCI_DEVICE(PCI_VENDOR_ID_PATHSCALE, PCI_DEVICE_ID_INFINIPATH_HT) }, { PCI_DEVICE(PCI_VENDOR_ID_PATHSCALE, PCI_DEVICE_ID_INFINIPATH_PE800) }, @@ -119,6 +126,32 @@ static struct pci_driver ipath_driver = { .id_table = ipath_pci_tbl, }; +static void check_link_status(struct work_struct *work) +{ + struct ipath_devdata *dd = container_of(work, struct ipath_devdata, + link_work); + + /* + * If we're in the NOCABLE state, try again in another minute. + */ + + if (*dd->ipath_statusp & IPATH_STATUS_IB_NOCABLE) { + schedule_delayed_work(&dd->link_work, HZ * LID_TIMEOUT); + return; + } + + /* + * If we don't have a LID, let the user know and don't bother + * checking again. + */ + + if (dd->ipath_lid == 0) + dev_info(&dd->pcidev->dev, + "We don't have a LID yet (no subnet manager?)\n"); + else if (!(*dd->ipath_statusp & IPATH_STATUS_IB_READY)) + dev_info(&dd->pcidev->dev, + "LID assigned, but IB link is not ACTIVE\n"); +} static inline void read_bars(struct ipath_devdata *dd, struct pci_dev *dev, u32 *bar0, u32 *bar1) @@ -187,6 +220,8 @@ static struct ipath_devdata *ipath_alloc_devdata(struct pci_dev *pdev) dd->pcidev = pdev; pci_set_drvdata(pdev, dd); + INIT_DELAYED_WORK(&dd->link_work, check_link_status); + list_add(&dd->ipath_list, &ipath_dev_list); bail_unlock: @@ -511,6 +546,9 @@ static int __devinit ipath_init_one(struct pci_dev *pdev, ipath_diag_add(dd); ipath_register_ib_device(dd); + /* Check that we have a LID in LID_TIMEOUT seconds. 
*/ + schedule_delayed_work(&dd->link_work, HZ * LID_TIMEOUT); + + goto bail; bail_irqsetup: @@ -638,6 +676,9 @@ static void __devexit ipath_remove_one(struct pci_dev *pdev) */ ipath_shutdown_device(dd); + cancel_delayed_work(&dd->link_work); + flush_scheduled_work(); + if (dd->verbs_dev) ipath_unregister_ib_device(dd->verbs_dev); @@ -1840,6 +1881,10 @@ int ipath_set_lid(struct ipath_devdata *dd, u32 arg, u8 lmc) dd->ipath_lid = arg; dd->ipath_lmc = lmc; + ipath_layer_lid_changed(dd); + + dev_info(&dd->pcidev->dev, "We got a lid: 0x%x\n", arg); + return 0; } diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index 034c283..f261af1 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -574,6 +574,9 @@ struct ipath_devdata { u32 ipath_overrun_thresh_errs; u32 ipath_lli_errs; + /* Link status check work */ + struct delayed_work link_work; + /* * Not all devices managed by a driver instance are the same * type, so these fields must be per-device. From arthur.jones at qlogic.com Tue Jun 19 16:43:10 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:43:10 -0700 Subject: [ofa-general] [PATCH 27/28] IB/ipath - when we check for LID availability, check for lack of interrupts too. In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234309.3794.784.stgit@bauxite.internal.keyresearch.com> All too often, interrupts do not get enabled for our card due to BIOS misconfiguration and other issues. This patch checks for that condition at startup, when checking for LID availability, and warns the user.
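The decision logic this patch adds to check_link_status() can be modeled outside the kernel as a small C sketch. The struct `link_state`, the enum, and the three functions below are hypothetical stand-ins for the `ipath_devdata` fields (`ipath_int_counter`, `ipath_lid`, the `IPATH_STATUS_IB_READY` status bit), not driver code, and the NOCABLE reschedule path is left out. It shows the saturating interrupt counter and the order in which a single warning is chosen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of per-device state read by the link check. */
struct link_state {
	uint32_t int_counter;	/* interrupts seen; saturates at UINT32_MAX */
	uint32_t lid;		/* 0 until the subnet manager assigns a LID */
	int ib_ready;		/* nonzero once the IB link is ACTIVE */
};

enum link_verdict {
	LINK_NO_INTERRUPTS,
	LINK_NO_LID,
	LINK_NOT_ACTIVE,
	LINK_OK
};

/* Called once per interrupt: stop at UINT32_MAX instead of wrapping to 0,
 * so a busy device can never be mistaken for one with no interrupts. */
static void count_interrupt(struct link_state *s)
{
	if (s->int_counter != UINT32_MAX)
		s->int_counter++;
}

/* Same priority as the patched check: interrupts first, then the LID,
 * then the link state. Only the first failing condition is reported. */
static enum link_verdict check_link(const struct link_state *s)
{
	assert(s != NULL);
	if (s->int_counter == 0)
		return LINK_NO_INTERRUPTS;
	if (s->lid == 0)
		return LINK_NO_LID;
	if (!s->ib_ready)
		return LINK_NOT_ACTIVE;
	return LINK_OK;
}

/* Message text taken from the patches; NULL means nothing is logged. */
static const char *verdict_message(enum link_verdict v)
{
	switch (v) {
	case LINK_NO_INTERRUPTS:
		return "No interrupts detected.";
	case LINK_NO_LID:
		return "We don't have a LID yet (no subnet manager?)";
	case LINK_NOT_ACTIVE:
		return "LID assigned, but IB link is not ACTIVE";
	default:
		return NULL;
	}
}
```

Saturating at `UINT32_MAX` (the patch's `(u32) -1`) rather than wrapping matters because a counter that wrapped back to 0 would trigger a false "No interrupts detected" warning.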
Signed-off-by: Arthur Jones --- drivers/infiniband/hw/ipath/ipath_driver.c | 8 +++++--- drivers/infiniband/hw/ipath/ipath_intr.c | 3 +++ drivers/infiniband/hw/ipath/ipath_kernel.h | 3 +++ 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index 1d2369b..825ed4d 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -141,11 +141,13 @@ static void check_link_status(struct work_struct *work) } /* - * If we don't have a LID, let the user know and don't bother - * checking again. + * If we don't have a LID or interrupts, let the user know and + * don't bother checking again. */ - if (dd->ipath_lid == 0) + if (dd->ipath_int_counter == 0) + dev_err(&dd->pcidev->dev, "No interrupts detected.\n"); + else if (dd->ipath_lid == 0) dev_info(&dd->pcidev->dev, "We don't have a LID yet (no subnet manager?)\n"); else if (!(*dd->ipath_statusp & IPATH_STATUS_IB_READY)) diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c index f8aac8e..ced591d 100644 --- a/drivers/infiniband/hw/ipath/ipath_intr.c +++ b/drivers/infiniband/hw/ipath/ipath_intr.c @@ -932,6 +932,9 @@ irqreturn_t ipath_intr(int irq, void *data) ipath_stats.sps_ints++; + if (dd->ipath_int_counter != (u32) -1) + dd->ipath_int_counter++; + if (!(dd->ipath_flags & IPATH_PRESENT)) { /* * This return value is not great, but we do not want the diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index f261af1..381c97e 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -297,6 +297,9 @@ struct ipath_devdata { u32 ipath_lastport_piobuf; /* is a stats timer active */ u32 ipath_stats_timer_active; + atomic_t ipath_rewrite_timer_active; + /* number of interrupts for this device -- saturates... 
*/ + u32 ipath_int_counter; /* dwords sent read from counter */ u32 ipath_lastsword; /* dwords received read from counter */ From arthur.jones at qlogic.com Tue Jun 19 16:43:16 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 19 Jun 2007 16:43:16 -0700 Subject: [ofa-general] [PATCH 28/28] IB/ipath - update copyright dates In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070619234315.3794.72264.stgit@bauxite.internal.keyresearch.com> From: John Gregor Now that it's June, it's about time to update the copyright notices of files that have changed. Signed-off-by: John Gregor --- drivers/infiniband/hw/ipath/ipath_cq.c | 2 +- drivers/infiniband/hw/ipath/ipath_debug.h | 2 +- drivers/infiniband/hw/ipath/ipath_diag.c | 2 +- drivers/infiniband/hw/ipath/ipath_driver.c | 2 +- drivers/infiniband/hw/ipath/ipath_eeprom.c | 2 +- drivers/infiniband/hw/ipath/ipath_fs.c | 2 +- drivers/infiniband/hw/ipath/ipath_iba6110.c | 2 +- drivers/infiniband/hw/ipath/ipath_iba6120.c | 2 +- drivers/infiniband/hw/ipath/ipath_init_chip.c | 2 +- drivers/infiniband/hw/ipath/ipath_intr.c | 2 +- drivers/infiniband/hw/ipath/ipath_kernel.h | 2 +- drivers/infiniband/hw/ipath/ipath_keys.c | 2 +- drivers/infiniband/hw/ipath/ipath_layer.c | 2 +- drivers/infiniband/hw/ipath/ipath_layer.h | 2 +- drivers/infiniband/hw/ipath/ipath_mad.c | 2 +- drivers/infiniband/hw/ipath/ipath_mmap.c | 2 +- drivers/infiniband/hw/ipath/ipath_mr.c | 2 +- drivers/infiniband/hw/ipath/ipath_qp.c | 2 +- drivers/infiniband/hw/ipath/ipath_rc.c | 2 +- drivers/infiniband/hw/ipath/ipath_registers.h | 2 +- drivers/infiniband/hw/ipath/ipath_ruc.c | 2 +- drivers/infiniband/hw/ipath/ipath_srq.c | 2 +- drivers/infiniband/hw/ipath/ipath_stats.c | 2 +- drivers/infiniband/hw/ipath/ipath_sysfs.c | 2 +- drivers/infiniband/hw/ipath/ipath_uc.c | 2 +- drivers/infiniband/hw/ipath/ipath_ud.c | 2 +-
drivers/infiniband/hw/ipath/ipath_user_pages.c | 2 +- drivers/infiniband/hw/ipath/ipath_verbs.c | 2 +- drivers/infiniband/hw/ipath/ipath_verbs.h | 2 +- drivers/infiniband/hw/ipath/ipath_verbs_mcast.c | 2 +- drivers/infiniband/hw/ipath/ipath_wc_ppc64.c | 2 +- drivers/infiniband/hw/ipath/ipath_wc_x86_64.c | 2 +- 32 files changed, 32 insertions(+), 32 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_cq.c b/drivers/infiniband/hw/ipath/ipath_cq.c index 8a2a774..8b4673b 100644 --- a/drivers/infiniband/hw/ipath/ipath_cq.c +++ b/drivers/infiniband/hw/ipath/ipath_cq.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_debug.h b/drivers/infiniband/hw/ipath/ipath_debug.h index 42bfbdb..19c56e6 100644 --- a/drivers/infiniband/hw/ipath/ipath_debug.h +++ b/drivers/infiniband/hw/ipath/ipath_debug.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_diag.c b/drivers/infiniband/hw/ipath/ipath_diag.c index aab21c1..a698f19 100644 --- a/drivers/infiniband/hw/ipath/ipath_diag.c +++ b/drivers/infiniband/hw/ipath/ipath_diag.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. 
* * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index 825ed4d..7893a5a 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_eeprom.c b/drivers/infiniband/hw/ipath/ipath_eeprom.c index 9be1b9a..6b91479 100644 --- a/drivers/infiniband/hw/ipath/ipath_eeprom.c +++ b/drivers/infiniband/hw/ipath/ipath_eeprom.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_fs.c b/drivers/infiniband/hw/ipath/ipath_fs.c index 40cf1bc..2e689b9 100644 --- a/drivers/infiniband/hw/ipath/ipath_fs.c +++ b/drivers/infiniband/hw/ipath/ipath_fs.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_iba6110.c b/drivers/infiniband/hw/ipath/ipath_iba6110.c index 34d159a..87b18e9 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba6110.c +++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. 
* Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_iba6120.c b/drivers/infiniband/hw/ipath/ipath_iba6120.c index 0c34555..e67e4a8 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba6120.c +++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index 5193d69..1b1af34 100644 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c index ced591d..ca3e877 100644 --- a/drivers/infiniband/hw/ipath/ipath_intr.c +++ b/drivers/infiniband/hw/ipath/ipath_intr.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. 
* * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index 381c97e..d374c87 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -1,7 +1,7 @@ #ifndef _IPATH_KERNEL_H #define _IPATH_KERNEL_H /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_keys.c b/drivers/infiniband/hw/ipath/ipath_keys.c index dd487c1..85a4aef 100644 --- a/drivers/infiniband/hw/ipath/ipath_keys.c +++ b/drivers/infiniband/hw/ipath/ipath_keys.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_layer.c b/drivers/infiniband/hw/ipath/ipath_layer.c index 05a1d2b..82616b7 100644 --- a/drivers/infiniband/hw/ipath/ipath_layer.c +++ b/drivers/infiniband/hw/ipath/ipath_layer.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_layer.h b/drivers/infiniband/hw/ipath/ipath_layer.h index 3854a4e..415709c 100644 --- a/drivers/infiniband/hw/ipath/ipath_layer.h +++ b/drivers/infiniband/hw/ipath/ipath_layer.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. 
All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_mad.c b/drivers/infiniband/hw/ipath/ipath_mad.c index 2e9e161..2aaa029 100644 --- a/drivers/infiniband/hw/ipath/ipath_mad.c +++ b/drivers/infiniband/hw/ipath/ipath_mad.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_mmap.c b/drivers/infiniband/hw/ipath/ipath_mmap.c index 937bc33..fa830e2 100644 --- a/drivers/infiniband/hw/ipath/ipath_mmap.c +++ b/drivers/infiniband/hw/ipath/ipath_mmap.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU diff --git a/drivers/infiniband/hw/ipath/ipath_mr.c b/drivers/infiniband/hw/ipath/ipath_mr.c index bdeef8d..e442470 100644 --- a/drivers/infiniband/hw/ipath/ipath_mr.c +++ b/drivers/infiniband/hw/ipath/ipath_mr.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_qp.c b/drivers/infiniband/hw/ipath/ipath_qp.c index bfd39c9..d317b81 100644 --- a/drivers/infiniband/hw/ipath/ipath_qp.c +++ b/drivers/infiniband/hw/ipath/ipath_qp.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. 
All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_rc.c b/drivers/infiniband/hw/ipath/ipath_rc.c index 6423d9e..46744ea 100644 --- a/drivers/infiniband/hw/ipath/ipath_rc.c +++ b/drivers/infiniband/hw/ipath/ipath_rc.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_registers.h b/drivers/infiniband/hw/ipath/ipath_registers.h index c182bcd..708eba3 100644 --- a/drivers/infiniband/hw/ipath/ipath_registers.h +++ b/drivers/infiniband/hw/ipath/ipath_registers.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_ruc.c b/drivers/infiniband/hw/ipath/ipath_ruc.c index c44e015..38d1d9b 100644 --- a/drivers/infiniband/hw/ipath/ipath_ruc.c +++ b/drivers/infiniband/hw/ipath/ipath_ruc.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_srq.c b/drivers/infiniband/hw/ipath/ipath_srq.c index 4b4214e..83d2569 100644 --- a/drivers/infiniband/hw/ipath/ipath_srq.c +++ b/drivers/infiniband/hw/ipath/ipath_srq.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. 
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_stats.c b/drivers/infiniband/hw/ipath/ipath_stats.c index 2955f36..73ed17d 100644 --- a/drivers/infiniband/hw/ipath/ipath_stats.c +++ b/drivers/infiniband/hw/ipath/ipath_stats.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_sysfs.c b/drivers/infiniband/hw/ipath/ipath_sysfs.c index ab34d3e..16238cd 100644 --- a/drivers/infiniband/hw/ipath/ipath_sysfs.c +++ b/drivers/infiniband/hw/ipath/ipath_sysfs.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_uc.c b/drivers/infiniband/hw/ipath/ipath_uc.c index 243d7c6..8380fbc 100644 --- a/drivers/infiniband/hw/ipath/ipath_uc.c +++ b/drivers/infiniband/hw/ipath/ipath_uc.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_ud.c b/drivers/infiniband/hw/ipath/ipath_ud.c index 26171e5..c22920b 100644 --- a/drivers/infiniband/hw/ipath/ipath_ud.c +++ b/drivers/infiniband/hw/ipath/ipath_ud.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. 
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_user_pages.c b/drivers/infiniband/hw/ipath/ipath_user_pages.c index 8536aeb..27034d3 100644 --- a/drivers/infiniband/hw/ipath/ipath_user_pages.c +++ b/drivers/infiniband/hw/ipath/ipath_user_pages.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c index 6753f7d..66b8287 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.c +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h index 458f822..f3d1f2c 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.h +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. 
* * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c b/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c index dd691cf..9e5abf9 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c +++ b/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_wc_ppc64.c b/drivers/infiniband/hw/ipath/ipath_wc_ppc64.c index 0095bb7..1d7bd82 100644 --- a/drivers/infiniband/hw/ipath/ipath_wc_ppc64.c +++ b/drivers/infiniband/hw/ipath/ipath_wc_ppc64.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU diff --git a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c index 9f409fd..3428acb 100644 --- a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c +++ b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. 
* * This software is available to you under a choice of one of two From Frank.Leers at Sun.COM Tue Jun 19 17:00:14 2007 From: Frank.Leers at Sun.COM (Frank Leers) Date: Tue, 19 Jun 2007 17:00:14 -0700 Subject: [ofa-general] don't want to rebuild all rpm's from install.sh Message-ID: <1182297614.1774.30.camel@localhost> If I understand the Installation Guide doc correctly I should be able to just install rpm's using the install.sh script without rebuilding the rpm's. I have built the rpm's successfully and installed them on a node in my cluster via an NFS mount. I'd now like to install the rest of my nodes using './install.sh -c <> -net <>' but this results in a rebuild of the rpm's all over again. I'm obviously missing something here, although another section of the doc mentions building once and then installing the resultant rpm's on all other nodes via standard tools in parallel - 'pdsh ...rpm -ivd ...' Can I rerun install.sh to simply install rpm's and configure ipoib etc. on the rest of my nodes somehow without rebuilding? thanks, -frank From sweitzen at cisco.com Tue Jun 19 20:05:34 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Tue, 19 Jun 2007 20:05:34 -0700 Subject: [ofa-general] don't want to rebuild all rpm's from install.sh In-Reply-To: <1182297614.1774.30.camel@localhost> References: <1182297614.1774.30.camel@localhost> Message-ID: Once you build your rpms on one node, you can just install them with "rpm" on the other nodes instead of "install.sh". 
Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: general-bounces at lists.openfabrics.org > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of > Frank Leers > Sent: Tuesday, June 19, 2007 5:00 PM > To: general at lists.openfabrics.org > Subject: [ofa-general] don't want to rebuild all rpm's from install.sh > > If I understand the Installation Guide doc correctly I should > be able to > just install rpm's using the install.sh script without rebuilding the > rpm's. I have built the rpm's successfully and installed > them on a node > in my cluster via an NFS mount. I'd now like to install the > rest of my > nodes using './install.sh -c <> -net <>' but this results in a rebuild > of the rpm's all over again. > > I'm obviously missing something here, although another section of the > doc mentions building once and then installing the resultant rpm's on > all other nodes via standard tools in parallel - 'pdsh ...rpm > -ivd ...' > > Can I rerun install.sh to simply install rpm's and configure > ipoib etc. > on the rest of my nodes somehow without rebuilding? > > thanks, > > -frank > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From vlad at dev.mellanox.co.il Tue Jun 19 23:34:38 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Wed, 20 Jun 2007 09:34:38 +0300 Subject: [ofa-general] don't want to rebuild all rpm's from install.sh In-Reply-To: <1182297614.1774.30.camel@localhost> References: <1182297614.1774.30.camel@localhost> Message-ID: <4678CA7E.9090200@dev.mellanox.co.il> Frank Leers wrote: > If I understand the Installation Guide doc correctly I should be able to > just install rpm's using the install.sh script without rebuilding the > rpm's. 
I have built the rpm's successfully and installed them on a node > in my cluster via an NFS mount. I'd now like to install the rest of my > nodes using './install.sh -c <> -net <>' but this results in a rebuild > of the rpm's all over again. > Yes, it should work this way if all of the nodes have the same Arch/OS/kernel. Can you send me the ofed.conf file (the one you pass after the '-c' parameter), the output of the './install.sh -c <> -net <>' command, and the Arch/OS/kernel of your nodes? Thanks, Vladimir From yangdong at ncic.ac.cn Wed Jun 20 00:14:45 2007 From: yangdong at ncic.ac.cn (ncic) Date: Wed, 20 Jun 2007 15:14:45 +0800 Subject: [ofa-general] why netwoked file system(e.g. nfs, pvfs, etc.) supported IB by using access layer (linux kernel ib ops) Message-ID: <4678D3E5.706@ncic.ac.cn> Why didn't they support IB with SDP? From kliteyn at dev.mellanox.co.il Wed Jun 20 00:42:59 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Wed, 20 Jun 2007 10:42:59 +0300 Subject: [ofa-general] [PATCH] osm: cosmetics in ftree - added get_guid functions for switch and hca Message-ID: <4678DA83.2050700@dev.mellanox.co.il> Hi Hal, Cosmetic code changes in fat-tree: added get_guid_ho and get_guid_no functions for switches and HCAs. -- Yevgeny Signed-off-by: Yevgeny Kliteynik --- opensm/opensm/osm_ucast_ftree.c | 77 +++++++++++++++++++++++++++++---------- 1 files changed, 58 insertions(+), 19 deletions(-) diff --git a/opensm/opensm/osm_ucast_ftree.c b/opensm/opensm/osm_ucast_ftree.c index 1ead199..1ae8b29 100644 --- a/opensm/opensm/osm_ucast_ftree.c +++ b/opensm/opensm/osm_ucast_ftree.c @@ -640,6 +640,26 @@ __osm_ftree_sw_destroy( /***************************************************/ +static uint64_t +__osm_ftree_sw_get_guid_no( + IN ftree_sw_t * p_sw) +{ + if (!p_sw) + return 0; + return osm_node_get_node_guid(p_sw->p_osm_sw->p_node); +} + +/***************************************************/ + +static uint64_t 
+__osm_ftree_sw_get_guid_ho( + IN ftree_sw_t * p_sw) +{ +
return cl_ntoh64(__osm_ftree_sw_get_guid_no(p_sw)); +} + +/***************************************************/ + static void __osm_ftree_sw_dump( IN ftree_fabric_t * p_ftree, @@ -657,7 +677,7 @@ __osm_ftree_sw_dump( "__osm_ftree_sw_dump: " "Switch index: %s, GUID: 0x%016" PRIx64 ", Ports: %u DOWN, %u UP\n", __osm_ftree_tuple_to_str(p_sw->tuple), - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), + __osm_ftree_sw_get_guid_ho(p_sw), p_sw->down_port_groups_num, p_sw->up_port_groups_num); @@ -835,6 +855,26 @@ __osm_ftree_hca_destroy( /***************************************************/ +static uint64_t +__osm_ftree_hca_get_guid_no( + IN ftree_hca_t * p_hca) +{ + if (!p_hca) + return 0; + return osm_node_get_node_guid(p_hca->p_osm_node); +} + +/***************************************************/ + +static uint64_t +__osm_ftree_hca_get_guid_ho( + IN ftree_hca_t * p_hca) +{ + return cl_ntoh64(__osm_ftree_hca_get_guid_no(p_hca)); +} + +/***************************************************/ + static void __osm_ftree_hca_dump( IN ftree_fabric_t * p_ftree, @@ -851,7 +891,7 @@ __osm_ftree_hca_dump( osm_log(&p_ftree->p_osm->log, OSM_LOG_DEBUG, "__osm_ftree_hca_dump: " "CA GUID: 0x%016" PRIx64 ", Ports: %u UP\n", - cl_ntoh64(osm_node_get_node_guid(p_hca->p_osm_node)), + __osm_ftree_hca_get_guid_ho(p_hca), p_hca->up_port_groups_num); for( i = 0; i < p_hca->up_port_groups_num; i++ ) @@ -1214,7 +1254,7 @@ __osm_ftree_fabric_dump_general_info( osm_log(&p_ftree->p_osm->log, OSM_LOG_VERBOSE, "__osm_ftree_fabric_dump_general_info: " " GUID: 0x%016" PRIx64 ", LID: 0x%x, Index %s\n", - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), + __osm_ftree_sw_get_guid_ho(p_sw), cl_ntoh16(p_sw->base_lid), __osm_ftree_tuple_to_str(p_sw->tuple)); } @@ -1227,8 +1267,7 @@ __osm_ftree_fabric_dump_general_info( osm_log(&p_ftree->p_osm->log, OSM_LOG_VERBOSE, "__osm_ftree_fabric_dump_general_info: " " GUID: 0x%016" PRIx64 ", LID: 0x%x, Index %s\n", - cl_ntoh64(osm_node_get_node_guid( 
- p_ftree->leaf_switches[i]->p_osm_sw->p_node)), + __osm_ftree_sw_get_guid_ho(p_ftree->leaf_switches[i]), cl_ntoh16(p_ftree->leaf_switches[i]->base_lid), __osm_ftree_tuple_to_str(p_ftree->leaf_switches[i]->tuple)); } @@ -1442,7 +1481,7 @@ __osm_ftree_fabric_make_indexing( p_sw->rank, __osm_ftree_tuple_to_str(p_sw->tuple), cl_ntoh16(p_sw->base_lid), - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node))); + __osm_ftree_sw_get_guid_ho(p_sw)); /* * Now run BFS and assign indexes to all switches @@ -1617,11 +1656,11 @@ __osm_ftree_fabric_validate_topology( "ERR AB09: Different number of upward port groups on switches:\n" " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u groups\n" " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u groups\n", - cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)), + __osm_ftree_sw_get_guid_ho(reference_sw_arr[p_sw->rank]), cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid), __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple), reference_sw_arr[p_sw->rank]->up_port_groups_num, - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), + __osm_ftree_sw_get_guid_ho(p_sw), cl_ntoh16(p_sw->base_lid), __osm_ftree_tuple_to_str(p_sw->tuple), p_sw->up_port_groups_num); @@ -1638,11 +1677,11 @@ __osm_ftree_fabric_validate_topology( "ERR AB0A: Different number of downward port groups on switches:\n" " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u port groups\n" " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u port groups\n", - cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)), + __osm_ftree_sw_get_guid_ho(reference_sw_arr[p_sw->rank]), cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid), __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple), reference_sw_arr[p_sw->rank]->down_port_groups_num, - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), + __osm_ftree_sw_get_guid_ho(p_sw), cl_ntoh16(p_sw->base_lid), __osm_ftree_tuple_to_str(p_sw->tuple), 
p_sw->down_port_groups_num); @@ -1663,11 +1702,11 @@ __osm_ftree_fabric_validate_topology( "ERR AB0B: Different number of ports in an upward port group on switches:\n" " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n" " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n", - cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)), + __osm_ftree_sw_get_guid_ho(reference_sw_arr[p_sw->rank]), cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid), __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple), cl_ptr_vector_get_size(&p_ref_group->ports), - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), + __osm_ftree_sw_get_guid_ho(p_sw), cl_ntoh16(p_sw->base_lid), __osm_ftree_tuple_to_str(p_sw->tuple), cl_ptr_vector_get_size(&p_group->ports)); @@ -1691,11 +1730,11 @@ __osm_ftree_fabric_validate_topology( "ERR AB0C: Different number of ports in an downward port group on switches:\n" " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n" " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n", - cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)), + __osm_ftree_sw_get_guid_ho(reference_sw_arr[p_sw->rank]), cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid), __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple), cl_ptr_vector_get_size(&p_ref_group->ports), - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), + __osm_ftree_sw_get_guid_ho(p_sw), cl_ntoh16(p_sw->base_lid), __osm_ftree_tuple_to_str(p_sw->tuple), cl_ptr_vector_get_size(&p_group->ports)); @@ -2508,7 +2547,7 @@ __osm_ftree_rank_leaf_switches( "__osm_ftree_rank_leaf_switches: ERR AB0F: " "CA conected directly to another CA: " "0x%016" PRIx64 " <---> 0x%016" PRIx64 "\n", - cl_ntoh64(osm_node_get_node_guid(p_hca->p_osm_node)), + __osm_ftree_hca_get_guid_ho(p_hca), cl_ntoh64(osm_node_get_node_guid(p_remote_osm_node))); res = -1; goto Exit; @@ -2548,8 +2587,8 @@ __osm_ftree_rank_leaf_switches( " - CA guid : 0x%016" PRIx64 
"\n" " - Switch guid: 0x%016" PRIx64 "\n" " - Switch LID : 0x%x\n", - cl_ntoh64(osm_node_get_node_guid(p_hca->p_osm_node)), - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), + __osm_ftree_hca_get_guid_ho(p_hca), + __osm_ftree_sw_get_guid_ho(p_sw), cl_ntoh16(p_sw->base_lid)); cl_list_insert_tail(p_ranking_bfs_list, &__osm_ftree_sw_tbl_element_create(p_sw)->map_item); @@ -2740,10 +2779,10 @@ __osm_ftree_fabric_construct_sw_ports( " GUID 0x%016" PRIx64 ", LID 0x%x, rank %u\n", p_sw->rank, p_remote_sw->rank, - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), + __osm_ftree_sw_get_guid_ho(p_sw), cl_ntoh16(p_sw->base_lid), p_sw->rank, - cl_ntoh64(osm_node_get_node_guid(p_remote_sw->p_osm_sw->p_node)), + __osm_ftree_sw_get_guid_ho(p_remote_sw), cl_ntoh16(p_remote_sw->base_lid), p_remote_sw->rank); res = -1; -- 1.5.1.4 From erezz at voltaire.com Wed Jun 20 02:19:02 2007 From: erezz at voltaire.com (Erez Zilber) Date: Wed, 20 Jun 2007 12:19:02 +0300 Subject: [ofa-general] [PATCH 1/2] IB/iser: add open-iscsi over iSER support for RHAS4 in OFED scripts In-Reply-To: <4641D32D.6030505@voltaire.com> References: <4641D295.5060907@voltaire.com> <4641D32D.6030505@voltaire.com> Message-ID: <4678F106.9090508@voltaire.com> Erez Zilber wrote: > Add support for open-iscsi over iSER in RHAS4 in OFED's scripts. 
> > Signed-off-by: Erez Zilber > --- > build.sh | 2 +- > build_env.sh | 4 ++-- > install.sh | 2 +- > 3 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/build.sh b/build.sh > index d54c55d..be2d1e6 100755 > --- a/build.sh > +++ b/build.sh > @@ -344,7 +344,7 @@ open-iscsi() > SuSE) > ex "$MV -f ${RPM_DIR}/RPMS/$build_arch/${OPEN_ISCSI_SUSE_NAME}-${OPEN_ISCSI_VERSION}.${build_arch}.rpm $RPMS" > ;; > - redhat5) > + redhat|redhat5) > ex "$MV -f ${RPM_DIR}/RPMS/$build_arch/${OPEN_ISCSI_REDHAT_NAME}-${OPEN_ISCSI_VERSION}.${build_arch}.rpm $RPMS" > ;; > *) > diff --git a/build_env.sh b/build_env.sh > index 6e65b21..49821b4 100644 > --- a/build_env.sh > +++ b/build_env.sh > @@ -135,7 +135,7 @@ IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES > # Iser > # Currently iSER is supported only on SLES10 & RHEL5 > case ${K_VER} in > - 2.6.16.*-*-*|2.6.*.el5) > + 2.6.16.*-*-*|2.6.*.el5|2.6.9-*.EL*) > IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES} ib_iser" > ;; > esac > @@ -1998,7 +1998,7 @@ set_package_deps() > ib_iser) > # Currently iSER is supported only on SLES10 & RHEL5 > case ${K_VER} in > - 2.6.16.*-*-*|2.6.*.el5) > + 2.6.16.*-*-*|2.6.*.el5|2.6.9-*.EL*) > OFA_KERNEL_PACKAGES=$(echo "$OFA_KERNEL_PACKAGES ib_verbs ${ll_driver} ib_iser" | tr -s ' ' '\n' | sort -n | uniq) > OFA_PACKAGES=$(echo "$OFA_PACKAGES kernel-ib" | tr -s ' ' '\n' | sort -n | uniq) > EXTRA_PACKAGES=$(echo "$EXTRA_PACKAGES open-iscsi" | tr -s ' ' '\n' | sort -rn | uniq) > diff --git a/install.sh b/install.sh > index f9ed6da..dadc144 100755 > --- a/install.sh > +++ b/install.sh > @@ -990,7 +990,7 @@ # fi > err_echo "${OPEN_ISCSI_SUSE_NAME}-${OPEN_ISCSI_VERSION}.${build_arch}.rpm not found under ${RPMS}." 
> fi > ;; > - redhat5) > + redhat|redhat5) > if [ -f ${RPMS}/${OPEN_ISCSI_REDHAT_NAME}-${OPEN_ISCSI_VERSION}.${build_arch}.rpm ]; then > ex "$RPM -Uhv --oldpackage ${RPMS}/${OPEN_ISCSI_REDHAT_NAME}-${OPEN_ISCSI_VERSION}.${build_arch}.rpm" > else > Vlad, It seems that commit 553e284ffb2f380dc8d1451bfb3ad40165f04112 in ofed_1_2_scripts.git is different from the patch that I submitted. For example: My patch: @@ -135,7 +135,7 @@ IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES # Iser # Currently iSER is supported only on SLES10 & RHEL5 case ${K_VER} in - 2.6.16.*-*-*|2.6.*.el5) + 2.6.16.*-*-*|2.6.*.el5|2.6.9-*.EL*) IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES} ib_iser" ;; esac patch applied in ofed_1_2_scripts.git: @@ -135,7 +135,7 @@ IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES # Iser # Currently iSER is supported only on SLES10 & RHEL5 case ${K_VER} in - 2.6.16.*-*-*|2.6.*.el5) + 2.6.16.*-*-*|2.6.*.el5|2.6.9-[3-5]*.EL*) <-- this line is different IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES} ib_iser" ;; esac Why is that? Erez From vlad at dev.mellanox.co.il Wed Jun 20 02:30:02 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Wed, 20 Jun 2007 12:30:02 +0300 Subject: [ofa-general] [PATCH 1/2] IB/iser: add open-iscsi over iSER support for RHAS4 in OFED scripts In-Reply-To: <4678F106.9090508@voltaire.com> References: <4641D295.5060907@voltaire.com> <4641D32D.6030505@voltaire.com> <4678F106.9090508@voltaire.com> Message-ID: <4678F39A.1030305@dev.mellanox.co.il> > Vlad, > > It seems that commit 553e284ffb2f380dc8d1451bfb3ad40165f04112 in > ofed_1_2_scripts.git is different from the patch that I submitted. 
For > example: > > My patch: > > @@ -135,7 +135,7 @@ IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES > # Iser > # Currently iSER is supported only on SLES10 & RHEL5 > case ${K_VER} in > - 2.6.16.*-*-*|2.6.*.el5) > + 2.6.16.*-*-*|2.6.*.el5|2.6.9-*.EL*) > IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES} ib_iser" > ;; > esac > > > patch applied in ofed_1_2_scripts.git: > @@ -135,7 +135,7 @@ IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES > # Iser > # Currently iSER is supported only on SLES10 & RHEL5 > case ${K_VER} in > - 2.6.16.*-*-*|2.6.*.el5) > + 2.6.16.*-*-*|2.6.*.el5|2.6.9-[3-5]*.EL*) <-- this line is different > IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES} ib_iser" > ;; > esac > > Why is that? > > Erez You have added backport patches for RHEL4.0 U3, U4, U5. 2.6.9-*.EL* matches also U2. So, installation fails on RHEL 4.0 U2 with your patch. Vladimir From vlad at lists.openfabrics.org Wed Jun 20 02:45:00 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Wed, 20 Jun 2007 02:45:00 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070620-0200 daily build status Message-ID: <20070620094501.3CC52E6087B@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.16 Passed on x86_64 with linux-2.6.20 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.16 Passed on ia64 with 
linux-2.6.17 Passed on ppc64 with linux-2.6.12 Passed on x86_64 with linux-2.6.16 Passed on ia64 with linux-2.6.19 Passed on x86_64 with linux-2.6.12 Passed on powerpc with linux-2.6.19 Passed on ppc64 with linux-2.6.17 Passed on powerpc with linux-2.6.18 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.14 Passed on ppc64 with linux-2.6.14 Passed on x86_64 with linux-2.6.13 Passed on ppc64 with linux-2.6.18 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.12 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.21.1 Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.19 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.15 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on ppc64 with linux-2.6.16 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.13 Passed on powerpc with linux-2.6.16 Passed on ia64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.9-34.ELsmp Failed: From erezz at voltaire.com Wed Jun 20 04:33:59 2007 From: erezz at voltaire.com (Erez Zilber) Date: Wed, 20 Jun 2007 14:33:59 +0300 Subject: [ofa-general] [PATCH 1/2] IB/iser: add open-iscsi over iSER support for RHAS4 in OFED scripts In-Reply-To: <4678F39A.1030305@dev.mellanox.co.il> References: <4641D295.5060907@voltaire.com> <4641D32D.6030505@voltaire.com> <4678F106.9090508@voltaire.com> <4678F39A.1030305@dev.mellanox.co.il> Message-ID: <467910A7.70001@voltaire.com> Vladimir Sokolovsky wrote: >> Vlad, >> >> It seems that commit 553e284ffb2f380dc8d1451bfb3ad40165f04112 in >> 
ofed_1_2_scripts.git is different from the patch that I submitted. For >> example: >> >> My patch: >> >> @@ -135,7 +135,7 @@ IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES >> # Iser >> # Currently iSER is supported only on SLES10 & RHEL5 >> case ${K_VER} in >> - 2.6.16.*-*-*|2.6.*.el5) >> + 2.6.16.*-*-*|2.6.*.el5|2.6.9-*.EL*) >> IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES} ib_iser" >> ;; >> esac >> >> >> patch applied in ofed_1_2_scripts.git: >> @@ -135,7 +135,7 @@ IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES >> # Iser >> # Currently iSER is supported only on SLES10 & RHEL5 >> case ${K_VER} in >> - 2.6.16.*-*-*|2.6.*.el5) >> + 2.6.16.*-*-*|2.6.*.el5|2.6.9-[3-5]*.EL*) <-- this line is >> different >> IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES} ib_iser" >> ;; >> esac >> >> Why is that? >> >> Erez > > You have added backport patches for RHEL4.0 U3, U4, U5. > 2.6.9-*.EL* matches also U2. So, installation fails on RHEL 4.0 U2 > with your patch. > > Vladimir You are right and I agree with your fix. Next time, just let me know if you don't apply a patch as is. Thanks, Erez From todd.rimmer at qlogic.com Wed Jun 20 05:54:51 2007 From: todd.rimmer at qlogic.com (Todd Rimmer) Date: Wed, 20 Jun 2007 07:54:51 -0500 Subject: [ofa-general] Patches to complib In-Reply-To: <1182290419.15653.242651.camel@hal.voltaire.com> Message-ID: <4FB1BCCAE6CAED44A1DC005B1DE06119291112@EPEXCH2.qlogic.org> Hal, Attached is a diff with 2 fixes to complib. The first is one I sent you yesterday (reset count in qmap on remove_all). The second corrects the same bug in fleximap. Patches are against main branch, however this code is the same in OFED 1.2 as well. Todd Rimmer Chief Architect QLogic System Interconnect Group Voice: 610-233-4852 Fax: 610-233-4777 Todd.Rimmer at QLogic.com www.QLogic.com -------------- next part -------------- A non-text attachment was scrubbed... 
Name: cl_maps_count.diff Type: application/octet-stream Size: 1006 bytes Desc: cl_maps_count.diff URL: From halr at voltaire.com Wed Jun 20 06:33:13 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Jun 2007 09:33:13 -0400 Subject: [ofa-general] Re: Patches to complib In-Reply-To: <4FB1BCCAE6CAED44A1DC005B1DE06119291112@EPEXCH2.qlogic.org> References: <4FB1BCCAE6CAED44A1DC005B1DE06119291112@EPEXCH2.qlogic.org> Message-ID: <1182346392.15653.305841.camel@hal.voltaire.com> Todd, On Wed, 2007-06-20 at 08:54, Todd Rimmer wrote: > Hal, > > Attached is a diff with 2 fixes to complib. The first is one I sent you > yesterday (reset count in qmap on remove_all). The second corrects the > same bug in fleximap. > > Patches are against main branch, however this code is the same in OFED > 1.2 as well. These patches appear to be against ofed_1_2 but they did apply to master. This may cause an issue in the future but perhaps not for complib changes. Thanks. Applied (to master only). In the future, please also include your S-O-B line: Signed-off-by: Todd Rimmer Also, patches are supposed to be submitted as inline text rather than attachments. -- Hal > Todd Rimmer > Chief Architect > QLogic System Interconnect Group > Voice: 610-233-4852 Fax: 610-233-4777 > Todd.Rimmer at QLogic.com www.QLogic.com From HNGUYEN at de.ibm.com Wed Jun 20 06:38:11 2007 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Wed, 20 Jun 2007 15:38:11 +0200 Subject: [ofa-general] Re: [ewg] Anouncement: OFED 1.2 rc6 is avilable In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9015636A9@mtlexch01.mtl.com> Message-ID: Hello Tziporet! In the attached release notes I see under "1.2 Supported Platforms and Operating Systems" this: - RedHat EL5: 2.6.9-42.ELsmp which should be 2.6.18-8.el5 according to my "uname -r" on a rhel5 system. 
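Kernel release strings such as 2.6.18-8.el5 are what the OFED scripts' "case ${K_VER} in ..." patterns are matched against. This assumes K_VER holds a uname -r style string. The minimal sketch below uses POSIX fnmatch(3), which implements the same pattern notation as a shell case statement. It checks Vladimir's point earlier in the thread: 2.6.9-*.EL* also matches the RHEL4 U2 kernel 2.6.9-22.ELsmp, but 2.6.9-[3-5]*.EL* does not. The kernel strings come from the daily build list above. The helper names kver_matches and iser_supported are hypothetical and exist only for this illustration.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* The two RHEL4 alternatives discussed in the thread. */
static const char *const SUBMITTED_RHEL4 = "2.6.9-*.EL*";
static const char *const APPLIED_RHEL4 = "2.6.9-[3-5]*.EL*";

/* The other alternatives of the ib_iser case in build_env.sh. */
static const char *const SLES10 = "2.6.16.*-*-*";
static const char *const RHEL5 = "2.6.*.el5";

/* Hypothetical helper: true if kver matches pattern under POSIX pattern
 * rules. With flags == 0, '*' also matches '.', as in a shell case. */
static bool kver_matches(const char *pattern, const char *kver)
{
	return fnmatch(pattern, kver, 0) == 0;
}

/* Hypothetical helper mirroring the applied case label:
 * 2.6.16.*-*-*|2.6.*.el5|2.6.9-[3-5]*.EL*) */
static bool iser_supported(const char *kver)
{
	return kver_matches(SLES10, kver) ||
	       kver_matches(RHEL5, kver) ||
	       kver_matches(APPLIED_RHEL4, kver);
}
```

With the applied pattern, 2.6.9-22.ELsmp falls through the iSER case entirely. The 2.6.9-34, -42 and -55 kernels still select ib_iser. In other words, the [3-5] bracket expression limits the RHEL4 branch to the kernels that have backport patches.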
Mit freundlichen Gruessen/Kind Regards Hoang-Nam Nguyen IBM Deutschland Entwicklung GmbH Vorsitzender des Aufsichtsrats: Martin Jetter Geschaeftsfuehrung: Herbert Kircher Sitz der Gesellschaft: Boeblingen Registergericht: Amtsgericht Stuttgart, HRB 243294 ewg-bounces at lists.openfabrics.org wrote on 19.06.2007 16:47:43: > > Hi, > > OFED 1.2-RC6 is available on > http://www.openfabrics.org/builds/ofed-1.2/ > File: OFED-1.2-rc6.tgz > To get BUILD_ID run ofed_info > > Please report any issues in bugzilla https://bugs.openfabrics.org/ > > The GA release is expected this Friday (June 22) > > I attach the OFED RN - please review and send me comments to the final > release > > Thanks, > Tziporet > > ======================================================================== > > Release information: > > OS support: > Novell: > - SLES 9.0 SP3 > - SLES10 > - SLES10 SP1 RC5 > Redhat: > - Redhat EL4 up3, up4 and up5 > - Redhat EL5 > kernel.org: > - 2.6.20 > - 2.6.19 > > Note: Kernel 2.6.21, Fedora C6 and SuSE Pro 10 are not part of the > official list. > We keep the backport patches for these OSes and make sure OFED compile > and loaded properly but will not do full QA cycle. > > Systems: > * x86_64 > * x86 > * ia64 > * ppc64 > > Main changes from OFED-1.1-rc5: > =============================== > 1. Fixed 6 bugs (see attached for fixed issues) > > See bugzilla for all open issues. > > Tasks that should be completed for the GA release: > 1. Complete all documentation (release notes, README, etc.) > 2. Run all QA tests on all platforms > [attachment "rc6_fixed_bugs.csv" deleted by Hoang-Nam > Nguyen/Germany/IBM] [attachment "OFED_release_notes.txt" deleted by > Hoang-Nam Nguyen/Germany/IBM] _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5203 bytes Desc: S/MIME Cryptographic Signature URL: From todd.rimmer at qlogic.com Wed Jun 20 07:25:00 2007 From: todd.rimmer at qlogic.com (Todd Rimmer) Date: Wed, 20 Jun 2007 09:25:00 -0500 Subject: [ofa-general] RE: Patches to complib In-Reply-To: <1182346392.15653.305841.camel@hal.voltaire.com> Message-ID: <4FB1BCCAE6CAED44A1DC005B1DE06119291131@EPEXCH2.qlogic.org> This adds get_next functions to the various maps (flexi, quick and map). get_next searches for the 1st entry whose key is > the key specified. As such get_next provides for searches where an exact key is not known, or the map may be changing between searches (and hence the key of a previously fetched entry may no longer be in the map). This patch was generated against OFED 1.2, however I have diffed the affected files and the files in the master are identical. Signed-off-by: Todd Rimmer diff -r -c orig2/osm/complib/cl_map.c fixed/osm/complib/cl_map.c *** orig2/osm/complib/cl_map.c Wed Jun 20 08:57:45 2007 --- fixed/osm/complib/cl_map.c Wed Jun 20 09:41:55 2007 *************** *** 268,273 **** --- 268,300 ---- return( p_item ); } + cl_map_item_t* + cl_qmap_get_next( + IN const cl_qmap_t* const p_map, + IN const uint64_t key ) + { + cl_map_item_t *p_item; + cl_map_item_t *p_item_found; + + CL_ASSERT( p_map ); + CL_ASSERT( p_map->state == CL_INITIALIZED ); + + p_item = __cl_map_root( p_map ); + p_item_found = (cl_map_item_t*)&p_map->nil; + + while( p_item != &p_map->nil ) + { + if( key < p_item->key ){ + p_item_found = p_item; + p_item = p_item->p_left; + }else{ + p_item = p_item->p_right; + } + } + + return( p_item_found ); + } + void cl_qmap_apply_func( IN const cl_qmap_t* const p_map, *************** *** 832,837 **** --- 859,881 ---- return( cl_qmap_obj( PARENT_STRUCT( p_item, cl_map_obj_t, item ) ) ); } + void* + cl_map_get_next( + IN const cl_map_t* const 
p_map, + IN const uint64_t key ) + { + cl_map_item_t *p_item; + + CL_ASSERT( p_map ); + + p_item = cl_qmap_get_next( &p_map->qmap, key ); + + if( p_item == cl_qmap_end( &p_map->qmap ) ) + return( NULL ); + + return( cl_qmap_obj( PARENT_STRUCT( p_item, cl_map_obj_t, item ) ) ); + } + void cl_map_remove_item( IN cl_map_t* const p_map, *************** *** 1279,1284 **** --- 1323,1358 ---- return( p_item ); } + cl_fmap_item_t* + cl_fmap_get_next( + IN const cl_fmap_t* const p_map, + IN const void* const p_key ) + { + cl_fmap_item_t *p_item; + cl_fmap_item_t *p_item_found; + intn_t cmp; + + CL_ASSERT( p_map ); + CL_ASSERT( p_map->state == CL_INITIALIZED ); + + p_item = __cl_fmap_root( p_map ); + p_item_found = (cl_fmap_item_t*)&p_map->nil; + + while( p_item != &p_map->nil ) + { + cmp = p_map->pfn_compare( p_key, p_item->p_key ); + + if( cmp < 0 ){ + p_item_found = p_item; + p_item = p_item->p_left; /* too small */ + }else{ + p_item = p_item->p_right; /* too big or match */ + } + } + + return( p_item_found ); + } + void cl_fmap_apply_func( IN const cl_fmap_t* const p_map, diff -r -c orig2/osm/include/complib/cl_fleximap.h fixed/osm/include/complib/cl_fleximap.h *** orig2/osm/include/complib/cl_fleximap.h Wed Jun 20 08:57:45 2007 --- fixed/osm/include/complib/cl_fleximap.h Wed Jun 20 09:30:30 2007 *************** *** 100,106 **** * * Manipulation: * cl_fmap_insert, cl_fmap_get, cl_fmap_remove_item, cl_fmap_remove, ! * cl_fmap_remove_all, cl_fmap_merge, cl_fmap_delta * * Search: * cl_fmap_apply_func --- 100,106 ---- * * Manipulation: * cl_fmap_insert, cl_fmap_get, cl_fmap_remove_item, cl_fmap_remove, ! * cl_fmap_remove_all, cl_fmap_merge, cl_fmap_delta, cl_fmap_get_next * * Search: * cl_fmap_apply_func *************** *** 672,678 **** * cl_fmap_get does not remove the item from the flexi map. * * SEE ALSO ! 
* Flexi Map, cl_fmap_remove *********/ /****f* Component Library: Flexi Map/cl_fmap_remove_item --- 672,714 ---- * cl_fmap_get does not remove the item from the flexi map. * * SEE ALSO ! * Flexi Map, cl_fmap_remove, cl_fmap_get_next ! *********/ ! ! /****f* Component Library: Flexi Map/cl_fmap_get_next ! * NAME ! * cl_fmap_get_next ! * ! * DESCRIPTION ! * The cl_fmap_get_next function returns the first map item associated with a ! * key > the key specified. ! * ! * SYNOPSIS ! */ ! cl_fmap_item_t* ! cl_fmap_get_next( ! IN const cl_fmap_t* const p_map, ! IN const void* const p_key ); ! /* ! * PARAMETERS ! * p_map ! * [in] Pointer to a cl_fmap_t structure from which to retrieve the ! * item with the specified key. ! * ! * p_key ! * [in] Pointer to a key value used to search for the desired map item. ! * ! * RETURN VALUES ! * Pointer to the first map item with a key > the desired key value. ! * ! * Pointer to the map end if there was no item with a key > the desired key ! * value stored in the flexi map. ! * ! * NOTES ! * cl_fmap_get_next does not remove the item from the flexi map. ! * ! * SEE ALSO ! * Flexi Map, cl_fmap_remove, cl_fmap_get *********/ /****f* Component Library: Flexi Map/cl_fmap_remove_item diff -r -c orig2/osm/include/complib/cl_map.h fixed/osm/include/complib/cl_map.h *** orig2/osm/include/complib/cl_map.h Wed Jun 20 08:57:45 2007 --- fixed/osm/include/complib/cl_map.h Wed Jun 20 09:30:51 2007 *************** *** 96,102 **** * * Manipulation * cl_map_insert, cl_map_get, cl_map_remove_item, cl_map_remove, ! * cl_map_remove_all, cl_map_merge, cl_map_delta * * Attributes: * cl_map_count, cl_is_map_empty, cl_is_map_inited --- 96,102 ---- * * Manipulation * cl_map_insert, cl_map_get, cl_map_remove_item, cl_map_remove, ! * cl_map_remove_all, cl_map_merge, cl_map_delta, cl_map_get_next * * Attributes: * cl_map_count, cl_is_map_empty, cl_is_map_inited *************** *** 628,634 **** * cl_map_get does not remove the item from the map. * * SEE ALSO ! 
* Map, cl_map_remove *********/ /****f* Component Library: Map/cl_map_remove_item --- 628,670 ---- * cl_map_get does not remove the item from the map. * * SEE ALSO ! * Map, cl_map_remove, cl_map_get_next ! *********/ ! ! /****f* Component Library: Map/cl_map_get_next ! * NAME ! * cl_map_get_next ! * ! * DESCRIPTION ! * The cl_qmap_get_next function returns the first object associated with a ! * key > the key specified. ! * ! * SYNOPSIS ! */ ! void* ! cl_map_get_next( ! IN const cl_map_t* const p_map, ! IN const uint64_t key ); ! /* ! * PARAMETERS ! * p_map ! * [in] Pointer to a map from which to retrieve the object with ! * the specified key. ! * ! * key ! * [in] Key value used to search for the desired object. ! * ! * RETURN VALUES ! * Pointer to the first object with a key > the desired key value. ! * ! * NULL if there was no item with a key > the desired key ! * value stored in the map. ! * ! * NOTES ! * cl_map_get does not remove the item from the map. ! * ! * SEE ALSO ! * Map, cl_map_remove, cl_map_get *********/ /****f* Component Library: Map/cl_map_remove_item diff -r -c orig2/osm/include/complib/cl_qmap.h fixed/osm/include/complib/cl_qmap.h *** orig2/osm/include/complib/cl_qmap.h Wed Jun 20 08:57:45 2007 --- fixed/osm/include/complib/cl_qmap.h Wed Jun 20 09:43:19 2007 *************** *** 98,104 **** * * Manipulation: * cl_qmap_insert, cl_qmap_get, cl_qmap_remove_item, cl_qmap_remove, ! * cl_qmap_remove_all, cl_qmap_merge, cl_qmap_delta * * Search: * cl_qmap_apply_func --- 98,104 ---- * * Manipulation: * cl_qmap_insert, cl_qmap_get, cl_qmap_remove_item, cl_qmap_remove, ! * cl_qmap_remove_all, cl_qmap_merge, cl_qmap_delta, cl_qmap_get_next * * Search: * cl_qmap_apply_func *************** *** 749,755 **** * cl_qmap_get does not remove the item from the quick map. * * SEE ALSO ! * Quick Map, cl_qmap_remove *********/ /****f* Component Library: Quick Map/cl_qmap_remove_item --- 749,791 ---- * cl_qmap_get does not remove the item from the quick map. 
* * SEE ALSO ! * Quick Map, cl_qmap_get_next, cl_qmap_remove ! *********/ ! ! /****f* Component Library: Quick Map/cl_qmap_get_next ! * NAME ! * cl_qmap_get_next ! * ! * DESCRIPTION ! * The cl_qmap_get_next function returns the first map item associated with a ! * key > the key specified. ! * ! * SYNOPSIS ! */ ! cl_map_item_t* ! cl_qmap_get_next( ! IN const cl_qmap_t* const p_map, ! IN const uint64_t key ); ! /* ! * PARAMETERS ! * p_map ! * [in] Pointer to a cl_qmap_t structure from which to retrieve the ! * first item with a key > the specified key. ! * ! * key ! * [in] Key value used to search for the desired map item. ! * ! * RETURN VALUES ! * Pointer to the first map item with a key > the desired key value. ! * ! * Pointer to the map end if there was no item with a key > the desired key ! * value stored in the quick map. ! * ! * NOTES ! * cl_qmap_get_next does not remove the item from the quick map. ! * ! * SEE ALSO ! * Quick Map, cl_qmap_get, cl_qmap_remove *********/ /****f* Component Library: Quick Map/cl_qmap_remove_item Todd Rimmer Chief Architect QLogic System Interconnect Group Voice: 610-233-4852 Fax: 610-233-4777 Todd.Rimmer at QLogic.com www.QLogic.com From halr at voltaire.com Wed Jun 20 07:50:23 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Jun 2007 10:50:23 -0400 Subject: [ofa-general] RE: Patches to complib In-Reply-To: <4FB1BCCAE6CAED44A1DC005B1DE06119291131@EPEXCH2.qlogic.org> References: <4FB1BCCAE6CAED44A1DC005B1DE06119291131@EPEXCH2.qlogic.org> Message-ID: <1182351021.15653.310989.camel@hal.voltaire.com> On Wed, 2007-06-20 at 10:25, Todd Rimmer wrote: > This adds get_next functions to the various maps (flexi, quick and map). > > > get_next searches for the 1st entry whose key is > the key specified. > > As such get_next provides for searches where an exact key is not known, > or the map may be changing between searches (and hence the key of a > previously fetched entry may no longer be in the map). 
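The successor search described above ("the 1st entry whose key is > the key specified") can be sketched outside complib as a plain binary search tree. The node type and the bst_insert, bst_get_next, collect_after and bst_free helpers are hypothetical and written only for illustration. They are not the complib API, and the tree is unbalanced rather than red-black. The loop in bst_get_next mirrors the one in the posted cl_qmap_get_next: a node with a larger key becomes the candidate and the walk goes left; a smaller or equal key sends the walk right.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node type; not complib's cl_map_item_t. */
typedef struct node {
	uint64_t key;
	struct node *p_left;
	struct node *p_right;
} node_t;

/* Plain unbalanced BST insert; duplicate keys are ignored. The get_next
 * walk below depends only on BST ordering, not on balancing. */
static node_t *bst_insert(node_t *p_root, uint64_t key)
{
	if (p_root == NULL) {
		node_t *p_new = calloc(1, sizeof(*p_new));
		if (p_new == NULL)
			abort();
		p_new->key = key;
		return p_new;
	}
	if (key < p_root->key)
		p_root->p_left = bst_insert(p_root->p_left, key);
	else if (key > p_root->key)
		p_root->p_right = bst_insert(p_root->p_right, key);
	return p_root;
}

/* Return the node with the smallest key strictly greater than key, or NULL
 * (standing in for the map's nil/end item) if there is none. */
static node_t *bst_get_next(node_t *p_root, uint64_t key)
{
	node_t *p_found = NULL;

	while (p_root != NULL) {
		if (key < p_root->key) {
			p_found = p_root;	/* candidate; look for a smaller one */
			p_root = p_root->p_left;
		} else {
			p_root = p_root->p_right;	/* smaller or equal key */
		}
	}
	return p_found;
}

/* Copy up to max keys greater than start into out, in ascending order.
 * Only the last key seen is kept between steps, so the walk would still
 * work if that entry were removed between calls. A start of 0 never
 * yields an entry whose key is 0. Returns the number of keys stored. */
static size_t collect_after(node_t *p_root, uint64_t start,
			    uint64_t *out, size_t max)
{
	size_t n = 0;
	uint64_t last = start;
	node_t *p_item;

	while (n < max && (p_item = bst_get_next(p_root, last)) != NULL) {
		out[n++] = p_item->key;
		last = p_item->key;
	}
	return n;
}

static void bst_free(node_t *p_root)
{
	if (p_root == NULL)
		return;
	bst_free(p_root->p_left);
	bst_free(p_root->p_right);
	free(p_root);
}
```

Each step restarts from the root, so one call costs O(tree height) rather than following successor links. That is the trade-off that lets an iteration resume from a key that is no longer in the map.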
Looks like a nice functionality addition. > This patch was generated against OFED 1.2, however I have diffed the > affected files and the files in the master are identical. > > Signed-off-by: Todd Rimmer Your mailer may be munging this patch: |diff -r -c orig2/osm/complib/cl_map.c fixed/osm/complib/cl_map.c |*** orig2/osm/complib/cl_map.c Wed Jun 20 08:57:45 2007 |--- fixed/osm/complib/cl_map.c Wed Jun 20 09:41:55 2007 -------------------------- File to patch: complib/cl_map.c patching file complib/cl_map.c patch: **** malformed patch at line 95: ) ); -- Hal > diff -r -c orig2/osm/complib/cl_map.c fixed/osm/complib/cl_map.c > *** orig2/osm/complib/cl_map.c Wed Jun 20 08:57:45 2007 > --- fixed/osm/complib/cl_map.c Wed Jun 20 09:41:55 2007 > *************** > *** 268,273 **** > --- 268,300 ---- > return( p_item ); > } > > + cl_map_item_t* > + cl_qmap_get_next( > + IN const cl_qmap_t* const p_map, > + IN const uint64_t key ) > + { > + cl_map_item_t *p_item; > + cl_map_item_t *p_item_found; > + > + CL_ASSERT( p_map ); > + CL_ASSERT( p_map->state == CL_INITIALIZED ); > + > + p_item = __cl_map_root( p_map ); > + p_item_found = (cl_map_item_t*)&p_map->nil; > + > + while( p_item != &p_map->nil ) > + { > + if( key < p_item->key ){ > + p_item_found = p_item; > + p_item = p_item->p_left; > + }else{ > + p_item = p_item->p_right; > + } > + } > + > + return( p_item_found ); > + } > + > void > cl_qmap_apply_func( > IN const cl_qmap_t* const p_map, > *************** > *** 832,837 **** > --- 859,881 ---- > return( cl_qmap_obj( PARENT_STRUCT( p_item, cl_map_obj_t, item ) > ) ); > } > > + void* > + cl_map_get_next( > + IN const cl_map_t* const p_map, > + IN const uint64_t key ) > + { > + cl_map_item_t *p_item; > + > + CL_ASSERT( p_map ); > + > + p_item = cl_qmap_get_next( &p_map->qmap, key ); > + > + if( p_item == cl_qmap_end( &p_map->qmap ) ) > + return( NULL ); > + > + return( cl_qmap_obj( PARENT_STRUCT( p_item, cl_map_obj_t, item ) > ) ); > + } > + > void > cl_map_remove_item( > 
IN cl_map_t* const p_map, > *************** > *** 1279,1284 **** > --- 1323,1358 ---- > return( p_item ); > } > > + cl_fmap_item_t* > + cl_fmap_get_next( > + IN const cl_fmap_t* const p_map, > + IN const void* const p_key ) > + { > + cl_fmap_item_t *p_item; > + cl_fmap_item_t *p_item_found; > + intn_t cmp; > + > + CL_ASSERT( p_map ); > + CL_ASSERT( p_map->state == CL_INITIALIZED ); > + > + p_item = __cl_fmap_root( p_map ); > + p_item_found = (cl_fmap_item_t*)&p_map->nil; > + > + while( p_item != &p_map->nil ) > + { > + cmp = p_map->pfn_compare( p_key, p_item->p_key ); > + > + if( cmp < 0 ){ > + p_item_found = p_item; > + p_item = p_item->p_left; /* too small */ > + }else{ > + p_item = p_item->p_right; /* too big or > match */ > + } > + } > + > + return( p_item_found ); > + } > + > void > cl_fmap_apply_func( > IN const cl_fmap_t* const p_map, > diff -r -c orig2/osm/include/complib/cl_fleximap.h > fixed/osm/include/complib/cl_fleximap.h > *** orig2/osm/include/complib/cl_fleximap.h Wed Jun 20 08:57:45 2007 > --- fixed/osm/include/complib/cl_fleximap.h Wed Jun 20 09:30:30 2007 > *************** > *** 100,106 **** > * > * Manipulation: > * cl_fmap_insert, cl_fmap_get, cl_fmap_remove_item, > cl_fmap_remove, > ! * cl_fmap_remove_all, cl_fmap_merge, cl_fmap_delta > * > * Search: > * cl_fmap_apply_func > --- 100,106 ---- > * > * Manipulation: > * cl_fmap_insert, cl_fmap_get, cl_fmap_remove_item, > cl_fmap_remove, > ! * cl_fmap_remove_all, cl_fmap_merge, cl_fmap_delta, > cl_fmap_get_next > * > * Search: > * cl_fmap_apply_func > *************** > *** 672,678 **** > * cl_fmap_get does not remove the item from the flexi map. > * > * SEE ALSO > ! * Flexi Map, cl_fmap_remove > *********/ > > /****f* Component Library: Flexi Map/cl_fmap_remove_item > --- 672,714 ---- > * cl_fmap_get does not remove the item from the flexi map. > * > * SEE ALSO > ! * Flexi Map, cl_fmap_remove, cl_fmap_get_next > ! *********/ > ! > ! /****f* Component Library: Flexi Map/cl_fmap_get_next > ! 
* NAME > ! * cl_fmap_get_next > ! * > ! * DESCRIPTION > ! * The cl_fmap_get_next function returns the first map item > associated with a > ! * key > the key specified. > ! * > ! * SYNOPSIS > ! */ > ! cl_fmap_item_t* > ! cl_fmap_get_next( > ! IN const cl_fmap_t* const p_map, > ! IN const void* const p_key ); > ! /* > ! * PARAMETERS > ! * p_map > ! * [in] Pointer to a cl_fmap_t structure from which to > retrieve the > ! * item with the specified key. > ! * > ! * p_key > ! * [in] Pointer to a key value used to search for the > desired map item. > ! * > ! * RETURN VALUES > ! * Pointer to the first map item with a key > the desired key > value. > ! * > ! * Pointer to the map end if there was no item with a key > the > desired key > ! * value stored in the flexi map. > ! * > ! * NOTES > ! * cl_fmap_get_next does not remove the item from the flexi map. > ! * > ! * SEE ALSO > ! * Flexi Map, cl_fmap_remove, cl_fmap_get > *********/ > > /****f* Component Library: Flexi Map/cl_fmap_remove_item > diff -r -c orig2/osm/include/complib/cl_map.h > fixed/osm/include/complib/cl_map.h > *** orig2/osm/include/complib/cl_map.h Wed Jun 20 08:57:45 2007 > --- fixed/osm/include/complib/cl_map.h Wed Jun 20 09:30:51 2007 > *************** > *** 96,102 **** > * > * Manipulation > * cl_map_insert, cl_map_get, cl_map_remove_item, > cl_map_remove, > ! * cl_map_remove_all, cl_map_merge, cl_map_delta > * > * Attributes: > * cl_map_count, cl_is_map_empty, cl_is_map_inited > --- 96,102 ---- > * > * Manipulation > * cl_map_insert, cl_map_get, cl_map_remove_item, > cl_map_remove, > ! * cl_map_remove_all, cl_map_merge, cl_map_delta, > cl_map_get_next > * > * Attributes: > * cl_map_count, cl_is_map_empty, cl_is_map_inited > *************** > *** 628,634 **** > * cl_map_get does not remove the item from the map. > * > * SEE ALSO > ! * Map, cl_map_remove > *********/ > > /****f* Component Library: Map/cl_map_remove_item > --- 628,670 ---- > * cl_map_get does not remove the item from the map. 
> * > * SEE ALSO > ! * Map, cl_map_remove, cl_map_get_next > ! *********/ > ! > ! /****f* Component Library: Map/cl_map_get_next > ! * NAME > ! * cl_map_get_next > ! * > ! * DESCRIPTION > ! * The cl_qmap_get_next function returns the first object > associated with a > ! * key > the key specified. > ! * > ! * SYNOPSIS > ! */ > ! void* > ! cl_map_get_next( > ! IN const cl_map_t* const p_map, > ! IN const uint64_t key ); > ! /* > ! * PARAMETERS > ! * p_map > ! * [in] Pointer to a map from which to retrieve the object > with > ! * the specified key. > ! * > ! * key > ! * [in] Key value used to search for the desired object. > ! * > ! * RETURN VALUES > ! * Pointer to the first object with a key > the desired key value. > ! * > ! * NULL if there was no item with a key > the desired key > ! * value stored in the map. > ! * > ! * NOTES > ! * cl_map_get does not remove the item from the map. > ! * > ! * SEE ALSO > ! * Map, cl_map_remove, cl_map_get > *********/ > > /****f* Component Library: Map/cl_map_remove_item > diff -r -c orig2/osm/include/complib/cl_qmap.h > fixed/osm/include/complib/cl_qmap.h > *** orig2/osm/include/complib/cl_qmap.h Wed Jun 20 08:57:45 2007 > --- fixed/osm/include/complib/cl_qmap.h Wed Jun 20 09:43:19 2007 > *************** > *** 98,104 **** > * > * Manipulation: > * cl_qmap_insert, cl_qmap_get, cl_qmap_remove_item, > cl_qmap_remove, > ! * cl_qmap_remove_all, cl_qmap_merge, cl_qmap_delta > * > * Search: > * cl_qmap_apply_func > --- 98,104 ---- > * > * Manipulation: > * cl_qmap_insert, cl_qmap_get, cl_qmap_remove_item, > cl_qmap_remove, > ! * cl_qmap_remove_all, cl_qmap_merge, cl_qmap_delta, > cl_qmap_get_next > * > * Search: > * cl_qmap_apply_func > *************** > *** 749,755 **** > * cl_qmap_get does not remove the item from the quick map. > * > * SEE ALSO > ! * Quick Map, cl_qmap_remove > *********/ > > /****f* Component Library: Quick Map/cl_qmap_remove_item > --- 749,791 ---- > * cl_qmap_get does not remove the item from the quick map. 
> * > * SEE ALSO > ! * Quick Map, cl_qmap_get_next, cl_qmap_remove > ! *********/ > ! > ! /****f* Component Library: Quick Map/cl_qmap_get_next > ! * NAME > ! * cl_qmap_get_next > ! * > ! * DESCRIPTION > ! * The cl_qmap_get_next function returns the first map item > associated with a > ! * key > the key specified. > ! * > ! * SYNOPSIS > ! */ > ! cl_map_item_t* > ! cl_qmap_get_next( > ! IN const cl_qmap_t* const p_map, > ! IN const uint64_t key ); > ! /* > ! * PARAMETERS > ! * p_map > ! * [in] Pointer to a cl_qmap_t structure from which to > retrieve the > ! * first item with a key > the specified key. > ! * > ! * key > ! * [in] Key value used to search for the desired map item. > ! * > ! * RETURN VALUES > ! * Pointer to the first map item with a key > the desired key > value. > ! * > ! * Pointer to the map end if there was no item with a key > the > desired key > ! * value stored in the quick map. > ! * > ! * NOTES > ! * cl_qmap_get_next does not remove the item from the quick map. > ! * > ! * SEE ALSO > ! * Quick Map, cl_qmap_get, cl_qmap_remove > *********/ > > /****f* Component Library: Quick Map/cl_qmap_remove_item > > Todd Rimmer > Chief Architect > QLogic System Interconnect Group > Voice: 610-233-4852 Fax: 610-233-4777 > Todd.Rimmer at QLogic.com www.QLogic.com From todd.rimmer at qlogic.com Wed Jun 20 07:55:45 2007 From: todd.rimmer at qlogic.com (Todd Rimmer) Date: Wed, 20 Jun 2007 09:55:45 -0500 Subject: [ofa-general] RE: Patches to complib In-Reply-To: <1182351021.15653.310989.camel@hal.voltaire.com> Message-ID: <4FB1BCCAE6CAED44A1DC005B1DE0611929113F@EPEXCH2.qlogic.org> > From: Hal Rosenstock [mailto:halr at voltaire.com] > Your mailer may be munging this patch: Here it is as an attachment, I'll try to see in the future if other mailer options correct the munging issue. 
Todd Rimmer Chief Architect QLogic System Interconnect Group Voice: 610-233-4852 Fax: 610-233-4777 Todd.Rimmer at QLogic.com www.QLogic.com -------------- next part -------------- A non-text attachment was scrubbed... Name: cl_maps_getnext.diff Type: application/octet-stream Size: 8176 bytes Desc: cl_maps_getnext.diff URL: From isaac at clusterfs.com Wed Jun 20 08:01:59 2007 From: isaac at clusterfs.com (Isaac Huang) Date: Wed, 20 Jun 2007 23:01:59 +0800 Subject: [ofa-general] a possible bug in drivers/infiniband/hw/mthca/mthca_qp.c Message-ID: <20070620150159.GA5628@clusterfs.com> Hi, I don't understand the code but it doesn't look alright: static void to_ib_ah_attr(struct mthca_dev *dev, struct ib_ah_attr *ib_ah_attr, struct mthca_qp_path *path) { memset(ib_ah_attr, 0, sizeof *path); I think it shall be 'sizeof *ib_ah_attr' instead. Please CC me - I'm not on this list. Thanks, Isaac From rdreier at cisco.com Wed Jun 20 08:06:58 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 20 Jun 2007 08:06:58 -0700 Subject: [ofa-general] a possible bug in drivers/infiniband/hw/mthca/mthca_qp.c In-Reply-To: <20070620150159.GA5628@clusterfs.com> (Isaac Huang's message of "Wed, 20 Jun 2007 23:01:59 +0800") References: <20070620150159.GA5628@clusterfs.com> Message-ID: > static void to_ib_ah_attr(struct mthca_dev *dev, struct ib_ah_attr *ib_ah_attr, > struct mthca_qp_path *path) > { > memset(ib_ah_attr, 0, sizeof *path); > > I think it shall be 'sizeof *ib_ah_attr' instead. Please CC me - I'm > not on this list. Yes, you're right, but what source are you looking at? The fix went into the kernel with commit 99d4f22e in 2.6.21-rc1, back in February. - R. From Thomas.Talpey at netapp.com Wed Jun 20 08:21:46 2007 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Wed, 20 Jun 2007 11:21:46 -0400 Subject: [ofa-general] why netwoked file system(e.g. nfs, pvfs, etc.) 
supported IB by using access layer (linux kernel ib ops) In-Reply-To: <4678D3E5.706@ncic.ac.cn> References: <4678D3E5.706@ncic.ac.cn> Message-ID: At 03:14 AM 6/20/2007, ncic wrote: >why didn't they support ib with sdp? There are two main answers. The first is licensing. SDP licensing is wrapped up in a Microsoft intellectual property issue, which has prevented its inclusion in some kernels, including Linux. So, upper layers cannot depend on its presence. The second, speaking for NFS at least, is performance. SDP relies heavily on additional setup exchanges and RDMA Read for transparency, and these negatively impact performance. With minimal additional work, the same unmodified upper-layer NFS filesystem code can use native RDMA exchanges via the RPC layer and achieve truly excellent performance. Check out Helen Chen's presentation from the recent Sonoma workshop. In the NFS case, the protocol is on a standards track and published in the IETF (I'm the primary author); I'm hopeful that the edits I'm currently preparing for publication will be finalized around the July meeting. And we have complete implementations of both client and server in both Linux and OpenSolaris. For transparent mode, don't discount ordinary sockets over a connected-mode IPoIB approach. The performance is very good, and it provides a fully transparent solution to all upper layers. RDMA is better, though, because it (greatly) reduces overhead. Tom.
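Returning to the mthca to_ib_ah_attr report earlier in this digest: the bug is the mismatched-sizeof pattern, where the length passed to memset is taken from a different object than the one being cleared. The sketch below uses two hypothetical stand-in structs (ah_attr_example and qp_path_example, not the real struct ib_ah_attr or struct mthca_qp_path) to show the effect. With these layouts, on common ABIs, the buggy version zeroes only the first four bytes of the destination and leaves the rest untouched.

```c
#include <string.h>

/* Hypothetical stand-ins: the real struct ib_ah_attr and
 * struct mthca_qp_path have different fields and sizes. */
struct ah_attr_example {
    unsigned short dlid;
    unsigned char  sl;
    unsigned char  port_num;
    unsigned char  grh[40];
};

struct qp_path_example {
    unsigned short rlid;
    unsigned char  port_pkey;
};

/* Buggy pattern: clears sizeof(*path) bytes of *attr.  If *path is
 * smaller than *attr, the tail of *attr keeps stale contents; if it
 * were larger, memset would write past the end of *attr. */
static void to_attr_buggy(struct ah_attr_example *attr,
                          const struct qp_path_example *path)
{
    memset(attr, 0, sizeof *path);
    attr->dlid = path->rlid;
}

/* Fixed pattern: the length is tied to the object being cleared. */
static void to_attr_fixed(struct ah_attr_example *attr,
                          const struct qp_path_example *path)
{
    memset(attr, 0, sizeof *attr);
    attr->dlid = path->rlid;
}
```

Writing the length as sizeof *ib_ah_attr, as Isaac suggests, keeps the size correct even if either struct changes later.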
From isaac at clusterfs.com Wed Jun 20 08:40:47 2007 From: isaac at clusterfs.com (Isaac Huang) Date: Wed, 20 Jun 2007 23:40:47 +0800 Subject: [ofa-general] a possible bug in drivers/infiniband/hw/mthca/mthca_qp.c In-Reply-To: References: <20070620150159.GA5628@clusterfs.com> Message-ID: <20070620154047.GB5628@clusterfs.com> On Wed, Jun 20, 2007 at 08:06:58AM -0700, Roland Dreier wrote: > > static void to_ib_ah_attr(struct mthca_dev *dev, struct ib_ah_attr *ib_ah_attr, > > struct mthca_qp_path *path) > > { > > memset(ib_ah_attr, 0, sizeof *path); > > > > I think it shall be 'sizeof *ib_ah_attr' instead. Please CC me - I'm > > not on this list. > > Yes, you're right, but what source are you looking at? The fix went > into the kernel with commit 99d4f22e in 2.6.21-rc1, back in February. > I stumbled upon that in OFED 1.1, then I looked somewhere in the mist of openfabrics git trees, maybe I checked the wrong branch; sorry. Isaac From halr at voltaire.com Wed Jun 20 09:12:10 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Jun 2007 12:12:10 -0400 Subject: [ofa-general] RE: Patches to complib In-Reply-To: <4FB1BCCAE6CAED44A1DC005B1DE0611929113F@EPEXCH2.qlogic.org> References: <4FB1BCCAE6CAED44A1DC005B1DE0611929113F@EPEXCH2.qlogic.org> Message-ID: <1182355925.15653.316439.camel@hal.voltaire.com> On Wed, 2007-06-20 at 10:55, Todd Rimmer wrote: > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > Your mailer may be munging this patch: > > Here it is as an attachment, I'll try to see in the future if other > mailer options correct the munging issue. Yes, that works better so it was your mailer. I also added your new get_next map functions to global symbols in the complib map. Thanks. Applied (to master only). 
-- Hal > Todd Rimmer > Chief Architect > QLogic System Interconnect Group > Voice: 610-233-4852 Fax: 610-233-4777 > Todd.Rimmer at QLogic.com www.QLogic.com From sashak at voltaire.com Wed Jun 20 09:06:15 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 20 Jun 2007 19:06:15 +0300 Subject: [ofa-general] RE: Patches to complib In-Reply-To: <4FB1BCCAE6CAED44A1DC005B1DE06119291131@EPEXCH2.qlogic.org> References: <4FB1BCCAE6CAED44A1DC005B1DE06119291131@EPEXCH2.qlogic.org> Message-ID: <1182355575.30285.18.camel@localhost> Hi Todd, On Wed, 2007-06-20 at 09:25 -0500, Todd Rimmer wrote: > This adds get_next functions to the various maps (flexi, quick and map). > > > get_next searches for the 1st entry whose key is > the key specified. What about cleaner names? Maybe something like get_next_higher() or just get_higher()? > As such get_next provides for searches where an exact key is not known, > or the map may be changing between searches (and hence the key of a > previously fetched entry may no longer be in the map). Just wondering, where those new functions are supposed to be used? Sasha > > This patch was generated against OFED 1.2, however I have diffed the > affected files and the files in the master are identical. 
> > Signed-off-by: Todd Rimmer > > diff -r -c orig2/osm/complib/cl_map.c fixed/osm/complib/cl_map.c > *** orig2/osm/complib/cl_map.c Wed Jun 20 08:57:45 2007 > --- fixed/osm/complib/cl_map.c Wed Jun 20 09:41:55 2007 > *************** > *** 268,273 **** > --- 268,300 ---- > return( p_item ); > } > > + cl_map_item_t* > + cl_qmap_get_next( > + IN const cl_qmap_t* const p_map, > + IN const uint64_t key ) > + { > + cl_map_item_t *p_item; > + cl_map_item_t *p_item_found; > + > + CL_ASSERT( p_map ); > + CL_ASSERT( p_map->state == CL_INITIALIZED ); > + > + p_item = __cl_map_root( p_map ); > + p_item_found = (cl_map_item_t*)&p_map->nil; > + > + while( p_item != &p_map->nil ) > + { > + if( key < p_item->key ){ > + p_item_found = p_item; > + p_item = p_item->p_left; > + }else{ > + p_item = p_item->p_right; > + } > + } > + > + return( p_item_found ); > + } > + > void > cl_qmap_apply_func( > IN const cl_qmap_t* const p_map, > *************** > *** 832,837 **** > --- 859,881 ---- > return( cl_qmap_obj( PARENT_STRUCT( p_item, cl_map_obj_t, item ) > ) ); > } > > + void* > + cl_map_get_next( > + IN const cl_map_t* const p_map, > + IN const uint64_t key ) > + { > + cl_map_item_t *p_item; > + > + CL_ASSERT( p_map ); > + > + p_item = cl_qmap_get_next( &p_map->qmap, key ); > + > + if( p_item == cl_qmap_end( &p_map->qmap ) ) > + return( NULL ); > + > + return( cl_qmap_obj( PARENT_STRUCT( p_item, cl_map_obj_t, item ) > ) ); > + } > + > void > cl_map_remove_item( > IN cl_map_t* const p_map, > *************** > *** 1279,1284 **** > --- 1323,1358 ---- > return( p_item ); > } > > + cl_fmap_item_t* > + cl_fmap_get_next( > + IN const cl_fmap_t* const p_map, > + IN const void* const p_key ) > + { > + cl_fmap_item_t *p_item; > + cl_fmap_item_t *p_item_found; > + intn_t cmp; > + > + CL_ASSERT( p_map ); > + CL_ASSERT( p_map->state == CL_INITIALIZED ); > + > + p_item = __cl_fmap_root( p_map ); > + p_item_found = (cl_fmap_item_t*)&p_map->nil; > + > + while( p_item != &p_map->nil ) > + { > + cmp = 
p_map->pfn_compare( p_key, p_item->p_key ); > + > + if( cmp < 0 ){ > + p_item_found = p_item; > + p_item = p_item->p_left; /* too small */ > + }else{ > + p_item = p_item->p_right; /* too big or > match */ > + } > + } > + > + return( p_item_found ); > + } > + > void > cl_fmap_apply_func( > IN const cl_fmap_t* const p_map, > diff -r -c orig2/osm/include/complib/cl_fleximap.h > fixed/osm/include/complib/cl_fleximap.h > *** orig2/osm/include/complib/cl_fleximap.h Wed Jun 20 08:57:45 2007 > --- fixed/osm/include/complib/cl_fleximap.h Wed Jun 20 09:30:30 2007 > *************** > *** 100,106 **** > * > * Manipulation: > * cl_fmap_insert, cl_fmap_get, cl_fmap_remove_item, > cl_fmap_remove, > ! * cl_fmap_remove_all, cl_fmap_merge, cl_fmap_delta > * > * Search: > * cl_fmap_apply_func > --- 100,106 ---- > * > * Manipulation: > * cl_fmap_insert, cl_fmap_get, cl_fmap_remove_item, > cl_fmap_remove, > ! * cl_fmap_remove_all, cl_fmap_merge, cl_fmap_delta, > cl_fmap_get_next > * > * Search: > * cl_fmap_apply_func > *************** > *** 672,678 **** > * cl_fmap_get does not remove the item from the flexi map. > * > * SEE ALSO > ! * Flexi Map, cl_fmap_remove > *********/ > > /****f* Component Library: Flexi Map/cl_fmap_remove_item > --- 672,714 ---- > * cl_fmap_get does not remove the item from the flexi map. > * > * SEE ALSO > ! * Flexi Map, cl_fmap_remove, cl_fmap_get_next > ! *********/ > ! > ! /****f* Component Library: Flexi Map/cl_fmap_get_next > ! * NAME > ! * cl_fmap_get_next > ! * > ! * DESCRIPTION > ! * The cl_fmap_get_next function returns the first map item > associated with a > ! * key > the key specified. > ! * > ! * SYNOPSIS > ! */ > ! cl_fmap_item_t* > ! cl_fmap_get_next( > ! IN const cl_fmap_t* const p_map, > ! IN const void* const p_key ); > ! /* > ! * PARAMETERS > ! * p_map > ! * [in] Pointer to a cl_fmap_t structure from which to > retrieve the > ! * item with the specified key. > ! * > ! * p_key > ! 
* [in] Pointer to a key value used to search for the > desired map item. > ! * > ! * RETURN VALUES > ! * Pointer to the first map item with a key > the desired key > value. > ! * > ! * Pointer to the map end if there was no item with a key > the > desired key > ! * value stored in the flexi map. > ! * > ! * NOTES > ! * cl_fmap_get_next does not remove the item from the flexi map. > ! * > ! * SEE ALSO > ! * Flexi Map, cl_fmap_remove, cl_fmap_get > *********/ > > /****f* Component Library: Flexi Map/cl_fmap_remove_item > diff -r -c orig2/osm/include/complib/cl_map.h > fixed/osm/include/complib/cl_map.h > *** orig2/osm/include/complib/cl_map.h Wed Jun 20 08:57:45 2007 > --- fixed/osm/include/complib/cl_map.h Wed Jun 20 09:30:51 2007 > *************** > *** 96,102 **** > * > * Manipulation > * cl_map_insert, cl_map_get, cl_map_remove_item, > cl_map_remove, > ! * cl_map_remove_all, cl_map_merge, cl_map_delta > * > * Attributes: > * cl_map_count, cl_is_map_empty, cl_is_map_inited > --- 96,102 ---- > * > * Manipulation > * cl_map_insert, cl_map_get, cl_map_remove_item, > cl_map_remove, > ! * cl_map_remove_all, cl_map_merge, cl_map_delta, > cl_map_get_next > * > * Attributes: > * cl_map_count, cl_is_map_empty, cl_is_map_inited > *************** > *** 628,634 **** > * cl_map_get does not remove the item from the map. > * > * SEE ALSO > ! * Map, cl_map_remove > *********/ > > /****f* Component Library: Map/cl_map_remove_item > --- 628,670 ---- > * cl_map_get does not remove the item from the map. > * > * SEE ALSO > ! * Map, cl_map_remove, cl_map_get_next > ! *********/ > ! > ! /****f* Component Library: Map/cl_map_get_next > ! * NAME > ! * cl_map_get_next > ! * > ! * DESCRIPTION > ! * The cl_qmap_get_next function returns the first object > associated with a > ! * key > the key specified. > ! * > ! * SYNOPSIS > ! */ > ! void* > ! cl_map_get_next( > ! IN const cl_map_t* const p_map, > ! IN const uint64_t key ); > ! /* > ! * PARAMETERS > ! * p_map > ! 
* [in] Pointer to a map from which to retrieve the object > with > ! * the specified key. > ! * > ! * key > ! * [in] Key value used to search for the desired object. > ! * > ! * RETURN VALUES > ! * Pointer to the first object with a key > the desired key value. > ! * > ! * NULL if there was no item with a key > the desired key > ! * value stored in the map. > ! * > ! * NOTES > ! * cl_map_get does not remove the item from the map. > ! * > ! * SEE ALSO > ! * Map, cl_map_remove, cl_map_get > *********/ > > /****f* Component Library: Map/cl_map_remove_item > diff -r -c orig2/osm/include/complib/cl_qmap.h > fixed/osm/include/complib/cl_qmap.h > *** orig2/osm/include/complib/cl_qmap.h Wed Jun 20 08:57:45 2007 > --- fixed/osm/include/complib/cl_qmap.h Wed Jun 20 09:43:19 2007 > *************** > *** 98,104 **** > * > * Manipulation: > * cl_qmap_insert, cl_qmap_get, cl_qmap_remove_item, > cl_qmap_remove, > ! * cl_qmap_remove_all, cl_qmap_merge, cl_qmap_delta > * > * Search: > * cl_qmap_apply_func > --- 98,104 ---- > * > * Manipulation: > * cl_qmap_insert, cl_qmap_get, cl_qmap_remove_item, > cl_qmap_remove, > ! * cl_qmap_remove_all, cl_qmap_merge, cl_qmap_delta, > cl_qmap_get_next > * > * Search: > * cl_qmap_apply_func > *************** > *** 749,755 **** > * cl_qmap_get does not remove the item from the quick map. > * > * SEE ALSO > ! * Quick Map, cl_qmap_remove > *********/ > > /****f* Component Library: Quick Map/cl_qmap_remove_item > --- 749,791 ---- > * cl_qmap_get does not remove the item from the quick map. > * > * SEE ALSO > ! * Quick Map, cl_qmap_get_next, cl_qmap_remove > ! *********/ > ! > ! /****f* Component Library: Quick Map/cl_qmap_get_next > ! * NAME > ! * cl_qmap_get_next > ! * > ! * DESCRIPTION > ! * The cl_qmap_get_next function returns the first map item > associated with a > ! * key > the key specified. > ! * > ! * SYNOPSIS > ! */ > ! cl_map_item_t* > ! cl_qmap_get_next( > ! IN const cl_qmap_t* const p_map, > ! IN const uint64_t key ); > ! /* > ! 
* PARAMETERS > ! * p_map > ! * [in] Pointer to a cl_qmap_t structure from which to > retrieve the > ! * first item with a key > the specified key. > ! * > ! * key > ! * [in] Key value used to search for the desired map item. > ! * > ! * RETURN VALUES > ! * Pointer to the first map item with a key > the desired key > value. > ! * > ! * Pointer to the map end if there was no item with a key > the > desired key > ! * value stored in the quick map. > ! * > ! * NOTES > ! * cl_qmap_get_next does not remove the item from the quick map. > ! * > ! * SEE ALSO > ! * Quick Map, cl_qmap_get, cl_qmap_remove > *********/ > > /****f* Component Library: Quick Map/cl_qmap_remove_item > > Todd Rimmer > Chief Architect > QLogic System Interconnect Group > Voice: 610-233-4852 Fax: 610-233-4777 > Todd.Rimmer at QLogic.com www.QLogic.com > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From mst at dev.mellanox.co.il Wed Jun 20 09:22:15 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Wed, 20 Jun 2007 19:22:15 +0300 Subject: [ofa-general] [PATCH for-2.6.22] ipoib/cm: fix interoperability when mtu don't match Message-ID: <20070620162215.GF6006@mellanox.co.il> IPoIB/CM currently rejects a connection unless the supported mtu is >= device mtu. This breaks interoperability with implementations that might have tweaked IPOIB_CM_MTU, and there's really no longer a reason to do so: this is a leftover from the time when we did not tweak mtu per-connection. Fix this by making the test as permissive as possible. Signed-off-by: Michael S. Tsirkin --- Roland, this is an *obviously* safe fix and has important interoperability implications. I think that, while not a crasher, it's appropriate for 2.6.22. Do you agree?
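A minimal sketch of the acceptance check before and after this patch, written as two hypothetical standalone predicates (the real test is inline in ipoib_cm_rep_handler). IPOIB_ENCAP_LEN is taken as 4, matching the "+ 4" in the old warning message; the mtu values in the note that follows are illustrative only.

```c
#include <stdbool.h>

/* Value implied by the "+ 4" in the old ipoib_warn() text. */
#define IPOIB_ENCAP_LEN 4

/* Before the patch: the peer's advertised mtu had to cover the local
 * device mtu plus the IPoIB encapsulation header, otherwise the
 * connection was rejected. */
static bool cm_mtu_ok_old(unsigned int peer_mtu, unsigned int dev_mtu)
{
    return !(peer_mtu < dev_mtu + IPOIB_ENCAP_LEN);
}

/* After the patch: reject only an mtu too small to carry anything
 * beyond the encapsulation header; the local device mtu no longer
 * matters for acceptance. */
static bool cm_mtu_ok_new(unsigned int peer_mtu)
{
    return peer_mtu > IPOIB_ENCAP_LEN;
}
```

For example, with a local device mtu of 4092 and a peer advertising 2052, cm_mtu_ok_old(2052, 4092) is false because 2052 < 4096, while cm_mtu_ok_new(2052) is true.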
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index c64249f..1fe7f66 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -759,9 +759,8 @@ static int ipoib_cm_rep_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even p->mtu = be32_to_cpu(data->mtu); - if (p->mtu < priv->dev->mtu + IPOIB_ENCAP_LEN) { - ipoib_warn(priv, "Rejecting connection: mtu %d < device mtu %d + 4\n", - p->mtu, priv->dev->mtu); + if (p->mtu <= IPOIB_ENCAP_LEN) { + ipoib_warn(priv, "Rejecting connection: mtu %d <= 4\n", p->mtu); return -EINVAL; } -- MST From Frank.Leers at Sun.COM Wed Jun 20 10:01:24 2007 From: Frank.Leers at Sun.COM (Frank Leers) Date: Wed, 20 Jun 2007 10:01:24 -0700 Subject: [ofa-general] don't want to rebuild all rpm's from install.sh In-Reply-To: <4678CA7E.9090200@dev.mellanox.co.il> References: <1182297614.1774.30.camel@localhost> <4678CA7E.9090200@dev.mellanox.co.il> Message-ID: <1182358884.1273.6.camel@localhost> On Wed, 2007-06-20 at 09:34 +0300, Vladimir Sokolovsky wrote: > Frank Leers wrote: > > If I understand the Installation Guide doc correctly I should be able to > > just install rpm's using the install.sh script without rebuilding the > > rpm's. I have built the rpm's successfully and installed them on a node > > in my cluster via an NFS mount. I'd now like to install the rest of my > > nodes using './install.sh -c <> -net <>' but this results in a rebuild > > of the rpm's all over again. > > > Yes, > It should work this way if all of the nodes have the same Arch/OS/kernel. > Can you send me the ofed.conf file (that you use after '-c' parameter), > the output of the './install.sh -c <> -net <>' command and > Arch/OS/kernel of your nodes. > Arch/OS/kernel > Thanks, > Vladimir Ah, I see where I was misguided then. My build node kernel is different than this particular compute node. I'll need to build seperately for each Arch/OS/kernel. 
Is OS differentiated between RH/CentOS/Fedora or is there a way to build once for all three if Arch and kernel are the otherwise the same? thanks, -frank From vlad at dev.mellanox.co.il Wed Jun 20 10:35:58 2007 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Wed, 20 Jun 2007 20:35:58 +0300 Subject: [ewg] Re: [ofa-general] don't want to rebuild all rpm's from install.sh In-Reply-To: <1182358884.1273.6.camel@localhost> References: <1182297614.1774.30.camel@localhost> <4678CA7E.9090200@dev.mellanox.co.il> <1182358884.1273.6.camel@localhost> Message-ID: <4679657E.10000@dev.mellanox.co.il> Frank Leers wrote: > On Wed, 2007-06-20 at 09:34 +0300, Vladimir Sokolovsky wrote: >> Frank Leers wrote: >>> If I understand the Installation Guide doc correctly I should be able to >>> just install rpm's using the install.sh script without rebuilding the >>> rpm's. I have built the rpm's successfully and installed them on a node >>> in my cluster via an NFS mount. I'd now like to install the rest of my >>> nodes using './install.sh -c <> -net <>' but this results in a rebuild >>> of the rpm's all over again. >>> >> Yes, >> It should work this way if all of the nodes have the same Arch/OS/kernel. >> Can you send me the ofed.conf file (that you use after '-c' parameter), >> the output of the './install.sh -c <> -net <>' command and >> Arch/OS/kernel of your nodes. >> Arch/OS/kernel >> Thanks, >> Vladimir > > Ah, I see where I was misguided then. My build node kernel is different > than this particular compute node. I'll need to build seperately for > each Arch/OS/kernel. > > > Is OS differentiated between RH/CentOS/Fedora or is there a way to build > once for all three if Arch and kernel are the otherwise the same? > OFED stores created RPMs under OFED-1.2-xx/RPMS/$(rpm -qf /etc/issue) If the kernel version and $(rpm -qf /etc/issue) are the same on RH/CentOS/Fedora (which is probably not) then you can build once. 
But if you will install RPMs manually and not with OFED's install.sh script then it should work (for userspace RPMs only). The kernel-ib RPMs you should build separately for each kernel. Regards, Vladimir From becker at nas.nasa.gov Wed Jun 20 10:44:54 2007 From: becker at nas.nasa.gov (Jeff Becker) Date: Wed, 20 Jun 2007 10:44:54 -0700 Subject: [ofa-general] backups Message-ID: <795c49870706201044ha36255amebd94c1b673f58f6@mail.gmail.com> Hi. I've started backing up the git trees and the web content using rsync. John Companies gave us a 10G NFS partition for this. I've done two backups and there's only 800M left. Also, I haven't backed up the daily builds yet. I was told we could get more space for one dollar per GB per month. Depending on the budget, we should increase this backup space. How should we proceed? Thanks. -jeff From rdreier at cisco.com Wed Jun 20 10:58:34 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 20 Jun 2007 10:58:34 -0700 Subject: [ofa-general] backups In-Reply-To: <795c49870706201044ha36255amebd94c1b673f58f6@mail.gmail.com> (Jeff Becker's message of "Wed, 20 Jun 2007 10:44:54 -0700") References: <795c49870706201044ha36255amebd94c1b673f58f6@mail.gmail.com> Message-ID: > Hi. I've started backing up the git trees and the web content using > rsync. John Companies gave us a 10G NFS partition for this. I've done > two backups and there's only 800M left. Also, I haven't backed up the > daily builds yet. I was told we could get more space for one dollar > per GB per month. Depending on the budget, we should increase this > backup space. How should we proceed? Thanks. Where is all the space going? A full kernel git tree (with more than two years of history) takes less than 150 MB of storage for me. How are we using up so much space? Also, FWIW, amazon S3 is $0.15 / GB-month + $0.10 for each GB transferred in. Of course it's probably a lot less convenient to back up to. - R. 
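Using only the prices quoted in this thread, a rough monthly cost for S GB of additional backup storage, with U the number of GB uploaded that month. This ignores S3 request and transfer-out fees and any charge for the existing 10G partition, so it is only a back-of-the-envelope estimate:

```latex
C_{\mathrm{host}}(S) = 1.00\,S \quad \text{USD/month}
C_{\mathrm{S3}}(S, U) = 0.15\,S + 0.10\,U \quad \text{USD/month}
% Example: S = U = 10 GB
C_{\mathrm{host}}(10) = 10.00, \qquad C_{\mathrm{S3}}(10, 10) = 1.50 + 1.00 = 2.50
```

In a later month with no full re-upload (U = 0), the S3 figure drops to 1.50.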
From becker at nas.nasa.gov Wed Jun 20 11:32:03 2007 From: becker at nas.nasa.gov (Jeff Becker) Date: Wed, 20 Jun 2007 11:32:03 -0700 Subject: [ofa-general] backups In-Reply-To: References: <795c49870706201044ha36255amebd94c1b673f58f6@mail.gmail.com> Message-ID: <795c49870706201132r1f7633f8r2cf3cb2a71edc6e0@mail.gmail.com> I'm backing up /data/pub/scm. A quick "du -chL" shows it to be 4.2G. Perhaps I only need to back up a subset of /data/pub/scm? Thanks. -jeff On 6/20/07, Roland Dreier wrote: > > Hi. I've started backing up the git trees and the web content using > > rsync. John Companies gave us a 10G NFS partition for this. I've done > > two backups and there's only 800M left. Also, I haven't backed up the > > daily builds yet. I was told we could get more space for one dollar > > per GB per month. Depending on the budget, we should increase this > > backup space. How should we proceed? Thanks. > > Where is all the space going? A full kernel git tree (with more than > two years of history) takes less than 150 MB of storage for me. How > are we using up so much space? > > Also, FWIW, amazon S3 is $0.15 / GB-month + $0.10 for each GB > transferred in. Of course it's probably a lot less convenient to back > up to. > > - R. > From todd.rimmer at qlogic.com Wed Jun 20 12:02:26 2007 From: todd.rimmer at qlogic.com (Todd Rimmer) Date: Wed, 20 Jun 2007 14:02:26 -0500 Subject: [ofa-general] RE: Patches to complib In-Reply-To: <1182355575.30285.18.camel@localhost> Message-ID: <4FB1BCCAE6CAED44A1DC005B1DE0611929118A@EPEXCH2.qlogic.org> > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > > What about cleaner names? Maybe something like get_next_higher() or just > get_higher()? The name comes from the fact that, for a key already in the list, it is equivalent to: p = cl_qmap_get(map, key); p = cl_qmap_next(p); However, it also handles the case where the key is no longer in the list, or where the starting point is a key that may never have been in the list.
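The equivalence Todd describes (for a key already present, get_next matches cl_qmap_get followed by cl_qmap_next, yet it also works for a key that was removed or never inserted) follows from the descent used in the patch's cl_qmap_get_next. The sketch below reproduces that descent on a hypothetical, unbalanced binary search tree; complib's cl_qmap is a red-black tree and returns the map's end sentinel rather than NULL, so this illustrates the search, not the complib API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type for illustration only. */
struct node {
    uint64_t key;
    struct node *left, *right;
};

/* Return the node with the smallest key strictly greater than `key`,
 * or NULL if there is none.  Same descent as cl_qmap_get_next in the
 * patch: remember every node whose key is larger and go left looking
 * for a smaller candidate; otherwise go right. */
static struct node *get_next(struct node *root, uint64_t key)
{
    struct node *found = NULL;

    while (root != NULL) {
        if (key < root->key) {
            found = root;
            root = root->left;
        } else {
            root = root->right;
        }
    }
    return found;
}

/* Plain (unbalanced) BST insert, just enough to build a test tree. */
static struct node *insert(struct node *root, struct node *n)
{
    struct node **link = &root;

    while (*link != NULL)
        link = (n->key < (*link)->key) ? &(*link)->left : &(*link)->right;
    n->left = n->right = NULL;
    *link = n;
    return root;
}
```

Iterating with p = get_next(root, p->key) visits keys in ascending order, and because each step is a fresh search from the root it tolerates the previous key having been deleted between steps. Starting the walk with get_next(root, 0) would skip an entry whose key is exactly 0, so starting from the head of the map avoids that edge case.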
This makes it very useful for situations like: lock list p = cl_qmap_head() process p k = p's key unlock list do some other stuff lock list p = cl_qmap_get_next(..., k) process p k = p's key unlock list .... Another example use might be a map keyed by GUIDs and a query to find all devices from a given vendor, in which case get_next could be used to start the search. We added this capability to our internal equivalent of complib a few years ago and found a lot of uses for it. So I thought it would be a simple yet powerful capability to add to OFED complib. Todd Rimmer Chief Architect QLogic System Interconnect Group Voice: 610-233-4852 Fax: 610-233-4777 Todd.Rimmer at QLogic.com www.QLogic.com From rdreier at cisco.com Wed Jun 20 13:36:36 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 20 Jun 2007 13:36:36 -0700 Subject: [ofa-general] Re: [PATCH 1 of 2] net-mlx4: Show board_id string in sysfs under the pci device In-Reply-To: <200706191641.52831.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Tue, 19 Jun 2007 16:41:52 +0300") References: <200706191641.52831.jackm@dev.mellanox.co.il> Message-ID: > Show the board_id string in sysfs under the pci device (not under the infiniband > device, as with other HCAs). ConnectX will also have an enet device (which will > not be under the infiniband class) and users of this device must also have > access to the board_id string. > > This requires a small modification in the libibverbs example "ibv_devinfo"; the app > must also look under the pci device for the board_id if it does not find it > directly under the infiniband device. Maybe it would be cleaner to have the IB device create a symlink back to the main board_id file, so we don't have to change userspace? (I haven't looked at how easy this would be to do) - R. 
From rdreier at cisco.com Wed Jun 20 13:39:30 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 20 Jun 2007 13:39:30 -0700 Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) In-Reply-To: <4675EFA4.5050209@linux.vnet.ibm.com> (Pradeep Satyanarayana's message of "Sun, 17 Jun 2007 19:36:20 -0700") References: <466F36C8.5010507@linux.vnet.ibm.com> <20070613163821.GB12277@mellanox.co.il> <20070613174930.GE12277@mellanox.co.il> <46716F3D.7050206@ichips.intel.com> <20070614175030.GB29561@mellanox.co.il> <4671C541.4040503@linux.vnet.ibm.com> <20070615051846.GG2207@mellanox.co.il> <4672C0DC.8060308@linux.vnet.ibm.com> <20070616192702.GM2207@mellanox.co.il> <4675EFA4.5050209@linux.vnet.ibm.com> Message-ID: > This approach would be a regression; no guarantees that anything else > would be better. > > As Bernard King-Smith said changing to a different approach (mid-stream) > is not the right thing to do. Hang on -- the whole reason we're having this discussion is because not everyone agrees with the approach you've taken. Unfortunately, just because you've put a lot of effort into your patch, it's still incumbent on us to do the right thing, even if it means starting over. I've been quite busy lately but I should have some time to look more deeply at this in the next week or so. - R. From rdreier at cisco.com Wed Jun 20 13:40:03 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 20 Jun 2007 13:40:03 -0700 Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) In-Reply-To: (Bernard King-Smith's message of "Fri, 15 Jun 2007 17:04:16 -0400") References: Message-ID: > We are already running with the non-SRQ patch here and the results are > very good. Changing to a different approach is not the right thing to do > at this time. Why not, if a different approach is better? - R. 
From rdreier at cisco.com Wed Jun 20 13:43:13 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 20 Jun 2007 13:43:13 -0700 Subject: [ofa-general] [PATCH] for-2.6.23 ib/umad: add partition support In-Reply-To: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com> (Sean Hefty's message of "Fri, 15 Jun 2007 09:34:55 -0700") References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com> Message-ID: > -#define IB_USER_MAD_ABI_VERSION 5 > +#define IB_USER_MAD_ABI_VERSION 6 Bummer -- we've been able to keep the ABI stable for almost 2 years now. I wonder if there's something clever we can do to avoid breaking existing apps? - R. From rdreier at cisco.com Wed Jun 20 13:47:55 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 20 Jun 2007 13:47:55 -0700 Subject: [ofa-general] Re: [PATCH] IB/ipath -- changes in for-roland for 2.6.23 In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> (Arthur Jones's message of "Tue, 19 Jun 2007 16:40:30 -0700") References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: > which is based on the kernel.org linux-2.6 tree. > i wasn't sure if i should spam the list with all > the patches, as they are avail via the git server > above. how would you like that done in the future? I definitely think all the patches need to be sent out to the list at least once so people get a chance to review, so you did the right thing. But I don't see a MAINTAINERS update (it still lists Bryan O'Sullivan, support at pathscale.com and openib.org for the ipath driver). Also I don't see fixes for the smp_mb__after_clear_bit bug pointed out by BenH or the bug of setting both _PAGE_NO_CACHE and _PAGE_WRITETHRU on powerpc pointed out by paulus. Anyway I'll look over the rest and queue for 2.6.23 if it looks good. - R.
From rdreier at cisco.com Wed Jun 20 13:55:14 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 20 Jun 2007 13:55:14 -0700 Subject: [ofa-general] Re: [PATCH 15/28] IB/ipath - add barrier before updating WC head in shared memory In-Reply-To: <20070619234156.3794.26440.stgit@bauxite.internal.keyresearch.com> (Arthur Jones's message of "Tue, 19 Jun 2007 16:41:57 -0700") References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> <20070619234156.3794.26440.stgit@bauxite.internal.keyresearch.com> Message-ID: > wc->queue[head].port_num = entry->port_num; > + wmb(); > wc->head = next; Please add comments explaining these barriers... maybe something like /* Make sure queue entry contents are visible before head index update */ also I notice that the latest libibipathverbs git tree (which hasn't been touched for 3 months) does not seem to have the analogous read memory barrier when polling CQ contents. - R. From rdreier at cisco.com Wed Jun 20 14:00:27 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 20 Jun 2007 14:00:27 -0700 Subject: [ofa-general] Re: [PATCH 24/28] IB/ipath - ipath_poll fixups and enhancements In-Reply-To: <20070619234252.3794.18229.stgit@bauxite.internal.keyresearch.com> (Arthur Jones's message of "Tue, 19 Jun 2007 16:42:52 -0700") References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> <20070619234252.3794.18229.stgit@bauxite.internal.keyresearch.com> Message-ID: > + tail = *(volatile u64 *)pd->port_rcvhdrtail_kvaddr; Why is there a volatile here? cf http://lwn.net/Articles/234017/ ("volatile considered harmful") - R. 
From halr at voltaire.com Wed Jun 20 14:01:21 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Jun 2007 17:01:21 -0400 Subject: [ofa-general] Re: [PATCH 1/2] libibumad: fix partition support In-Reply-To: <000801c7af6e$7ae0ba80$ff0da8c0@amr.corp.intel.com> References: <000801c7af6e$7ae0ba80$ff0da8c0@amr.corp.intel.com> Message-ID: <1182373280.15653.335513.camel@hal.voltaire.com> On Fri, 2007-06-15 at 12:59, Sean Hefty wrote: > Allow sending MADs on different partitions. This requires kernel support, > so requires an ABI bump. This patch maintains support for the previous > ABI. > > Clarify that umad_set_pkey() takes a pkey index, and not the pkey itself. > (Unfortunately, the call is used both ways in the management tree.) This works well in all the combinatorials I tested (user_mad ABIs, libibumad and libvendor versions). Just two things: 1. It might be better if the ABI version 5 warning message for only pkey_index 0 being supported comes out at umad_init time rather than umad_set_pkey time so that the user is not swamped with these. 2. There is one pathological combination. It would be using 2.6.23 (with the new user_mad ABI version 6), an updated libibumad would be required, but an older libvendor (osm_vendor_ibumad.c without your one line change). That might be the case with someone who swapped back and forth between OFED 1.2 and master in some scenarios. Also, this does not quite work as expected. An error was returned based on the bad pkey index but I do see a send on the IB link (with a bad pkey). I wouldn't have expected the latter part. Maybe this is a driver or firmware issue. Not sure yet. I suppose there should be some pkey_index validation (to make sure it is within the device's valid range) and that should also ultimately get added to libibumad or should such validation go into the user_mad kernel module ? 
-- Hal > Signed-off-by: Sean Hefty > --- > Additional changes are needed to retrieve the PKey and GID tables, so that > the PKeys and GIDs can be converted to the correct index. These will come > in future patches. > > > doc/libibumad.txt | 2 > libibumad/include/infiniband/umad.h | 7 + > libibumad/src/umad.c | 192 +++++++++++++++++++++++++++-------- > 3 files changed, 156 insertions(+), 45 deletions(-) > > diff --git a/doc/libibumad.txt b/doc/libibumad.txt > index 7b2b4f4..4e37e60 100644 > --- a/doc/libibumad.txt > +++ b/doc/libibumad.txt > @@ -336,7 +336,7 @@ the given host ordered fields. Return 0 on success, -1 on errors. > umad_set_pkey: > > Synopsis: > - int umad_set_pkey(void *umad, int pkey); > + int umad_set_pkey(void *umad, int pkey_index); > > Description: Set the pkey within the 'umad' buffer. Return 0 on success, > -1 on errors. > diff --git a/libibumad/include/infiniband/umad.h b/libibumad/include/infiniband/umad.h > old mode 100644 > new mode 100755 > index 9020649..9369d95 > --- a/libibumad/include/infiniband/umad.h > +++ b/libibumad/include/infiniband/umad.h > @@ -60,6 +60,8 @@ typedef struct ib_mad_addr { > uint8_t traffic_class; > uint8_t gid[16]; > uint32_t flow_label; > + uint16_t pkey_index; > + uint8_t reserved[6]; > } ib_mad_addr_t; > > typedef struct ib_user_mad { > @@ -72,7 +74,8 @@ typedef struct ib_user_mad { > uint8_t data[0]; > } ib_user_mad_t; > > -#define IB_UMAD_ABI_VERSION 5 > +#define IB_UMAD_MIN_ABI_VERSION 5 > +#define IB_UMAD_MAX_ABI_VERSION 6 > #define IB_UMAD_ABI_DIR "/sys/class/infiniband_mad" > #define IB_UMAD_ABI_FILE "abi_version" > > @@ -167,7 +170,7 @@ int umad_set_grh_net(void *umad, void *mad_addr); > int umad_set_grh(void *umad, void *mad_addr); > int umad_set_addr_net(void *umad, int dlid, int dqp, int sl, int qkey); > int umad_set_addr(void *umad, int dlid, int dqp, int sl, int qkey); > -int umad_set_pkey(void *umad, int pkey); > +int umad_set_pkey(void *umad, int pkey_index); > > int umad_send(int portid, int 
agentid, void *umad, int length, > int timeout_ms, int retries); > diff --git a/libibumad/src/umad.c b/libibumad/src/umad.c > old mode 100644 > new mode 100755 > index 5f9b36b..c750fe0 > --- a/libibumad/src/umad.c > +++ b/libibumad/src/umad.c > @@ -69,6 +69,7 @@ int umaddebug = 0; > #define UMAD_DEV_NAME_SZ 32 > #define UMAD_DEV_FILE_SZ 256 > > +static uint abi_version; > static char *def_ca_name = "mthca0"; > static int def_ca_port = 1; > > @@ -82,6 +83,31 @@ typedef struct Port { > > static Port ports[UMAD_MAX_PORTS]; > > +typedef struct ib_mad_addr_abi_5 { > + uint32_t qpn; > + uint32_t qkey; > + uint16_t lid; > + uint8_t sl; > + uint8_t path_bits; > + uint8_t grh_present; > + uint8_t gid_index; > + uint8_t hop_limit; > + uint8_t traffic_class; > + uint8_t gid[16]; > + uint32_t flow_label; > +} ib_mad_addr_abi_5_t; > + > +typedef struct ib_user_mad_abi_5 { > + uint32_t agent_id; > + uint32_t status; > + uint32_t timeout_ms; > + uint32_t retries; > + uint32_t length; > + ib_mad_addr_abi_5_t addr; > + uint8_t data[0]; > +} ib_user_mad_abi_5_t; > + > + > /************************************* > * Port > */ > @@ -463,6 +489,101 @@ dev_to_umad_id(char *dev, uint port) > return -1; /* not found */ > } > > +static int > +write_data(int fd, void *data, int size) > +{ > + int n; > + > + n = write(fd, data, size); > + if (n != size) { > + DEBUG("write returned %d != sizeof mad data %d (%m)", n, size); > + if (!errno) > + errno = EIO; > + return -EIO; > + } > + > + return 0; > +} > + > +static int > +write_abi_5(int fd, struct ib_user_mad *mad, int length) > +{ > + struct ib_user_mad_abi_5 *umad_5; > + int n; > + > + n = sizeof *umad_5 + length; > + umad_5 = malloc(n); > + if (!umad_5) { > + errno = ENOMEM; > + return -ENOMEM; > + } > + > + memcpy(umad_5, mad, sizeof *umad_5); > + memcpy(umad_5->data, mad->data, length); > + > + n = write_data(fd, umad_5, n); > + free(umad_5); > + return n; > +} > + > +static int > +read_data(int fd, void *data, int size, int *length) > +{ 
> + struct ib_user_mad *mad = data; > + int n, umad_size; > + > + umad_size = size - *length; > + > + n = read(fd, data, size); > + if ((n >= 0) && (n <= size)) { > + DEBUG("mad received by agent %d length %d", mad->agent_id, n); > + if (n > umad_size) > + *length = n - umad_size; > + else > + *length = 0; > + return mad->agent_id; > + } > + > + if (n == -EWOULDBLOCK) { > + if (!errno) > + errno = EWOULDBLOCK; > + return n; > + } > + > + DEBUG("read returned %zu > sizeof mad %zu (%m)", > + mad->length - umad_size, *length); > + > + *length = mad->length - umad_size; > + if (!errno) > + errno = EIO; > + return -errno; > +} > + > +static int > +read_abi_5(int fd, void *umad, int *length) > +{ > + struct ib_user_mad *mad = umad; > + struct ib_user_mad_abi_5 *umad_5; > + int n; > + > + n = sizeof *umad_5 + *length; > + umad_5 = malloc(n); > + if (!umad_5) { > + errno = EINVAL; > + return -EINVAL; > + } > + > + n = read_data(fd, umad_5, n, length); > + if (n >= 0) { > + memcpy(mad, umad_5, sizeof *umad_5); > + mad->addr.pkey_index = 0; > + memcpy(mad->data, umad_5->data, *length); > + } > + > + free(umad_5); > + return n; > +} > + > /******************************* > * Public interface > */ > @@ -470,17 +591,19 @@ dev_to_umad_id(char *dev, uint port) > int > umad_init(void) > { > - uint abi_version; > - > TRACE("umad_init"); > if (sys_read_uint(IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE, &abi_version) < 0) { > IBWARN("can't read ABI version from %s/%s (%m): is ib_umad module loaded?", > IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE); > return -1; > } > - if (abi_version != IB_UMAD_ABI_VERSION) { > - IBWARN("wrong ABI version: %s/%s is %d but library ABI is %d", > - IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE, abi_version, IB_UMAD_ABI_VERSION); > + > + if (abi_version < IB_UMAD_MIN_ABI_VERSION || > + abi_version > IB_UMAD_MAX_ABI_VERSION) { > + IBWARN("wrong ABI version: %s/%s is %d but library ABI " > + "supports %d through %d", > + IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE, abi_version, > + 
IB_UMAD_MIN_ABI_VERSION, IB_UMAD_MAX_ABI_VERSION); > return -1; > } > return 0; > @@ -699,11 +822,16 @@ umad_set_grh(void *umad, void *mad_addr) > } > > int > -umad_set_pkey(void *umad, int pkey) > +umad_set_pkey(void *umad, int pkey_index) > { > -#if 0 > - mad->addr.pkey = 0; /* FIXME - PKEY support */ > -#endif > + struct ib_user_mad *mad = umad; > + > + if (abi_version == 5 && pkey_index != 0) { > + IBWARN("umad_set_pkey: ABI 5 only supports pkey_index 0\n"); > + return -EINVAL; > + } > + > + mad->addr.pkey_index = pkey_index; > return 0; > } > > @@ -761,15 +889,12 @@ umad_send(int portid, int agentid, void *umad, int length, > if (umaddebug > 1) > umad_dump(mad); > > - n = write(port->dev_fd, mad, length + sizeof *mad); > - if (n == length + sizeof *mad) > - return 0; > + if (abi_version == 5) > + n = write_abi_5(port->dev_fd, mad, length); > + else > + n = write_data(port->dev_fd, mad, sizeof *mad + length); > > - DEBUG("write returned %d != sizeof umad %zu + length %d (%m)", > - n, sizeof *mad, length); > - if (!errno) > - errno = EIO; > - return -EIO; > + return n; > } > > static int > @@ -793,7 +918,6 @@ dev_poll(int fd, int timeout_ms) > int > umad_recv(int portid, void *umad, int *length, int timeout_ms) > { > - struct ib_user_mad *mad = umad; > Port *port; > int n; > > @@ -817,29 +941,13 @@ umad_recv(int portid, void *umad, int *length, int timeout_ms) > return n; > } > > - n = read(port->dev_fd, umad, sizeof *mad + *length); > - if ((n >= 0) && (n <= sizeof *mad + *length)) { > - DEBUG("mad received by agent %d length %d", mad->agent_id, n); > - if (n > sizeof *mad) > - *length = n - sizeof *mad; > - else > - *length = 0; > - return mad->agent_id; > - } > - > - if (n == -EWOULDBLOCK) { > - if (!errno) > - errno = EWOULDBLOCK; > - return n; > - } > - > - DEBUG("read returned %zu > sizeof umad %zu + length %d (%m)", > - mad->length - sizeof *mad, sizeof *mad, *length); > + if (abi_version == 5) > + n = read_abi_5(port->dev_fd, umad, length); > + else > + 
n = read_data(port->dev_fd, umad, > + sizeof(struct ib_user_mad) + *length, length); > > - *length = mad->length - sizeof *mad; > - if (!errno) > - errno = EIO; > - return -errno; > + return n; > } > > int > @@ -996,10 +1104,10 @@ umad_addr_dump(ib_mad_addr_t *addr) > gid_str[i*2] = 0; > IBWARN("qpn %d qkey 0x%x lid 0x%x sl %d\n" > "grh_present %d gid_index %d hop_limit %d traffic_class %d flow_label 0x%x\n" > - "Gid 0x%s", > + "Gid 0x%s pkey_index %d", > ntohl(addr->qpn), ntohl(addr->qkey), ntohs(addr->lid), addr->sl, > addr->grh_present, (int)addr->gid_index, (int)addr->hop_limit, > - (int)addr->traffic_class, addr->flow_label, gid_str); > + (int)addr->traffic_class, addr->flow_label, gid_str, addr->pkey_index); > } > > void > From mshefty at ichips.intel.com Wed Jun 20 14:06:12 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 20 Jun 2007 14:06:12 -0700 Subject: [ofa-general] [PATCH] for-2.6.23 ib/umad: add partition support In-Reply-To: References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com> Message-ID: <467996C4.1060201@ichips.intel.com> Roland Dreier wrote: > > -#define IB_USER_MAD_ABI_VERSION 5 > > +#define IB_USER_MAD_ABI_VERSION 6 > > Bummer -- we've been able to keep the ABI stable for almost 2 years > now. I wonder if there's something clever we can do to avoid breaking > existing apps? Did you have something in mind? (new ioctl? re-using existing fields?) Not all fields are used for both reads and writes. E.g. status is unused on a write, and retries is unused on a read. Storing the pkey_index on a read seems doable. I think if we do anything on a write, we need to make an assumption that the data is currently set to 0 by the app. 
- Sean From halr at voltaire.com Wed Jun 20 14:11:21 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Jun 2007 17:11:21 -0400 Subject: [ofa-general] Re: [PATCH 1/2] libibumad: fix partition support In-Reply-To: <1182373280.15653.335513.camel@hal.voltaire.com> References: <000801c7af6e$7ae0ba80$ff0da8c0@amr.corp.intel.com> <1182373280.15653.335513.camel@hal.voltaire.com> Message-ID: <1182373879.15653.336204.camel@hal.voltaire.com> On Wed, 2007-06-20 at 17:01, Hal Rosenstock wrote: > On Fri, 2007-06-15 at 12:59, Sean Hefty wrote: > > Allow sending MADs on different partitions. This requires kernel support, > > so requires an ABI bump. This patch maintains support for the previous > > ABI. > > > > Clarify that umad_set_pkey() takes a pkey index, and not the pkey itself. > > (Unfortunately, the call is used both ways in the management tree.) > > This works well in all the combinatorials I tested (user_mad ABIs, > libibumad and libvendor versions). > > Just two things: > 1. It might be better if the ABI version 5 warning message for only > pkey_index 0 being supported comes out at umad_init time rather than > umad_set_pkey time so that the user is not swamped with these. > > 2. There is one pathological combination. It would be using 2.6.23 (with > the new user_mad ABI version 6), an updated libibumad would be required, > but an older libvendor (osm_vendor_ibumad.c without your one line > change). That might be the case with someone who swapped back and forth > between OFED 1.2 and master in some scenarios. This begs the question as to whether your one line change to osm_vendor_ibumad.c should be made to the OFED 1.2 version as well. -- Hal > Also, this does not quite work as expected. An error was returned based > on the bad pkey index but I do see a send on the IB link (with a bad > pkey). I wouldn't have expected the latter part. Maybe this is a driver > or firmware issue. Not sure yet. 
I suppose there should be some > pkey_index validation (to make sure it is within the device's valid > range) and that should also ultimately get added to libibumad or should > such validation go into the user_mad kernel module ? > > -- Hal > > > Signed-off-by: Sean Hefty > > --- > > Additional changes are needed to retrieve the PKey and GID tables, so that > > the PKeys and GIDs can be converted to the correct index. These will come > > in future patches. > > > > > > doc/libibumad.txt | 2 > > libibumad/include/infiniband/umad.h | 7 + > > libibumad/src/umad.c | 192 +++++++++++++++++++++++++++-------- > > 3 files changed, 156 insertions(+), 45 deletions(-) > > > > diff --git a/doc/libibumad.txt b/doc/libibumad.txt > > index 7b2b4f4..4e37e60 100644 > > --- a/doc/libibumad.txt > > +++ b/doc/libibumad.txt > > @@ -336,7 +336,7 @@ the given host ordered fields. Return 0 on success, -1 on errors. > > umad_set_pkey: > > > > Synopsis: > > - int umad_set_pkey(void *umad, int pkey); > > + int umad_set_pkey(void *umad, int pkey_index); > > > > Description: Set the pkey within the 'umad' buffer. Return 0 on success, > > -1 on errors. 
> > diff --git a/libibumad/include/infiniband/umad.h b/libibumad/include/infiniband/umad.h > > old mode 100644 > > new mode 100755 > > index 9020649..9369d95 > > --- a/libibumad/include/infiniband/umad.h > > +++ b/libibumad/include/infiniband/umad.h > > @@ -60,6 +60,8 @@ typedef struct ib_mad_addr { > > uint8_t traffic_class; > > uint8_t gid[16]; > > uint32_t flow_label; > > + uint16_t pkey_index; > > + uint8_t reserved[6]; > > } ib_mad_addr_t; > > > > typedef struct ib_user_mad { > > @@ -72,7 +74,8 @@ typedef struct ib_user_mad { > > uint8_t data[0]; > > } ib_user_mad_t; > > > > -#define IB_UMAD_ABI_VERSION 5 > > +#define IB_UMAD_MIN_ABI_VERSION 5 > > +#define IB_UMAD_MAX_ABI_VERSION 6 > > #define IB_UMAD_ABI_DIR "/sys/class/infiniband_mad" > > #define IB_UMAD_ABI_FILE "abi_version" > > > > @@ -167,7 +170,7 @@ int umad_set_grh_net(void *umad, void *mad_addr); > > int umad_set_grh(void *umad, void *mad_addr); > > int umad_set_addr_net(void *umad, int dlid, int dqp, int sl, int qkey); > > int umad_set_addr(void *umad, int dlid, int dqp, int sl, int qkey); > > -int umad_set_pkey(void *umad, int pkey); > > +int umad_set_pkey(void *umad, int pkey_index); > > > > int umad_send(int portid, int agentid, void *umad, int length, > > int timeout_ms, int retries); > > diff --git a/libibumad/src/umad.c b/libibumad/src/umad.c > > old mode 100644 > > new mode 100755 > > index 5f9b36b..c750fe0 > > --- a/libibumad/src/umad.c > > +++ b/libibumad/src/umad.c > > @@ -69,6 +69,7 @@ int umaddebug = 0; > > #define UMAD_DEV_NAME_SZ 32 > > #define UMAD_DEV_FILE_SZ 256 > > > > +static uint abi_version; > > static char *def_ca_name = "mthca0"; > > static int def_ca_port = 1; > > > > @@ -82,6 +83,31 @@ typedef struct Port { > > > > static Port ports[UMAD_MAX_PORTS]; > > > > +typedef struct ib_mad_addr_abi_5 { > > + uint32_t qpn; > > + uint32_t qkey; > > + uint16_t lid; > > + uint8_t sl; > > + uint8_t path_bits; > > + uint8_t grh_present; > > + uint8_t gid_index; > > + uint8_t hop_limit; > > + 
uint8_t traffic_class; > > + uint8_t gid[16]; > > + uint32_t flow_label; > > +} ib_mad_addr_abi_5_t; > > + > > +typedef struct ib_user_mad_abi_5 { > > + uint32_t agent_id; > > + uint32_t status; > > + uint32_t timeout_ms; > > + uint32_t retries; > > + uint32_t length; > > + ib_mad_addr_abi_5_t addr; > > + uint8_t data[0]; > > +} ib_user_mad_abi_5_t; > > + > > + > > /************************************* > > * Port > > */ > > @@ -463,6 +489,101 @@ dev_to_umad_id(char *dev, uint port) > > return -1; /* not found */ > > } > > > > +static int > > +write_data(int fd, void *data, int size) > > +{ > > + int n; > > + > > + n = write(fd, data, size); > > + if (n != size) { > > + DEBUG("write returned %d != sizeof mad data %d (%m)", n, size); > > + if (!errno) > > + errno = EIO; > > + return -EIO; > > + } > > + > > + return 0; > > +} > > + > > +static int > > +write_abi_5(int fd, struct ib_user_mad *mad, int length) > > +{ > > + struct ib_user_mad_abi_5 *umad_5; > > + int n; > > + > > + n = sizeof *umad_5 + length; > > + umad_5 = malloc(n); > > + if (!umad_5) { > > + errno = ENOMEM; > > + return -ENOMEM; > > + } > > + > > + memcpy(umad_5, mad, sizeof *umad_5); > > + memcpy(umad_5->data, mad->data, length); > > + > > + n = write_data(fd, umad_5, n); > > + free(umad_5); > > + return n; > > +} > > + > > +static int > > +read_data(int fd, void *data, int size, int *length) > > +{ > > + struct ib_user_mad *mad = data; > > + int n, umad_size; > > + > > + umad_size = size - *length; > > + > > + n = read(fd, data, size); > > + if ((n >= 0) && (n <= size)) { > > + DEBUG("mad received by agent %d length %d", mad->agent_id, n); > > + if (n > umad_size) > > + *length = n - umad_size; > > + else > > + *length = 0; > > + return mad->agent_id; > > + } > > + > > + if (n == -EWOULDBLOCK) { > > + if (!errno) > > + errno = EWOULDBLOCK; > > + return n; > > + } > > + > > + DEBUG("read returned %zu > sizeof mad %zu (%m)", > > + mad->length - umad_size, *length); > > + > > + *length = mad->length 
- umad_size; > > + if (!errno) > > + errno = EIO; > > + return -errno; > > +} > > + > > +static int > > +read_abi_5(int fd, void *umad, int *length) > > +{ > > + struct ib_user_mad *mad = umad; > > + struct ib_user_mad_abi_5 *umad_5; > > + int n; > > + > > + n = sizeof *umad_5 + *length; > > + umad_5 = malloc(n); > > + if (!umad_5) { > > + errno = EINVAL; > > + return -EINVAL; > > + } > > + > > + n = read_data(fd, umad_5, n, length); > > + if (n >= 0) { > > + memcpy(mad, umad_5, sizeof *umad_5); > > + mad->addr.pkey_index = 0; > > + memcpy(mad->data, umad_5->data, *length); > > + } > > + > > + free(umad_5); > > + return n; > > +} > > + > > /******************************* > > * Public interface > > */ > > @@ -470,17 +591,19 @@ dev_to_umad_id(char *dev, uint port) > > int > > umad_init(void) > > { > > - uint abi_version; > > - > > TRACE("umad_init"); > > if (sys_read_uint(IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE, &abi_version) < 0) { > > IBWARN("can't read ABI version from %s/%s (%m): is ib_umad module loaded?", > > IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE); > > return -1; > > } > > - if (abi_version != IB_UMAD_ABI_VERSION) { > > - IBWARN("wrong ABI version: %s/%s is %d but library ABI is %d", > > - IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE, abi_version, IB_UMAD_ABI_VERSION); > > + > > + if (abi_version < IB_UMAD_MIN_ABI_VERSION || > > + abi_version > IB_UMAD_MAX_ABI_VERSION) { > > + IBWARN("wrong ABI version: %s/%s is %d but library ABI " > > + "supports %d through %d", > > + IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE, abi_version, > > + IB_UMAD_MIN_ABI_VERSION, IB_UMAD_MAX_ABI_VERSION); > > return -1; > > } > > return 0; > > @@ -699,11 +822,16 @@ umad_set_grh(void *umad, void *mad_addr) > > } > > > > int > > -umad_set_pkey(void *umad, int pkey) > > +umad_set_pkey(void *umad, int pkey_index) > > { > > -#if 0 > > - mad->addr.pkey = 0; /* FIXME - PKEY support */ > > -#endif > > + struct ib_user_mad *mad = umad; > > + > > + if (abi_version == 5 && pkey_index != 0) { > > + IBWARN("umad_set_pkey: ABI 
5 only supports pkey_index 0\n"); > > + return -EINVAL; > > + } > > + > > + mad->addr.pkey_index = pkey_index; > > return 0; > > } > > > > @@ -761,15 +889,12 @@ umad_send(int portid, int agentid, void *umad, int length, > > if (umaddebug > 1) > > umad_dump(mad); > > > > - n = write(port->dev_fd, mad, length + sizeof *mad); > > - if (n == length + sizeof *mad) > > - return 0; > > + if (abi_version == 5) > > + n = write_abi_5(port->dev_fd, mad, length); > > + else > > + n = write_data(port->dev_fd, mad, sizeof *mad + length); > > > > - DEBUG("write returned %d != sizeof umad %zu + length %d (%m)", > > - n, sizeof *mad, length); > > - if (!errno) > > - errno = EIO; > > - return -EIO; > > + return n; > > } > > > > static int > > @@ -793,7 +918,6 @@ dev_poll(int fd, int timeout_ms) > > int > > umad_recv(int portid, void *umad, int *length, int timeout_ms) > > { > > - struct ib_user_mad *mad = umad; > > Port *port; > > int n; > > > > @@ -817,29 +941,13 @@ umad_recv(int portid, void *umad, int *length, int timeout_ms) > > return n; > > } > > > > - n = read(port->dev_fd, umad, sizeof *mad + *length); > > - if ((n >= 0) && (n <= sizeof *mad + *length)) { > > - DEBUG("mad received by agent %d length %d", mad->agent_id, n); > > - if (n > sizeof *mad) > > - *length = n - sizeof *mad; > > - else > > - *length = 0; > > - return mad->agent_id; > > - } > > - > > - if (n == -EWOULDBLOCK) { > > - if (!errno) > > - errno = EWOULDBLOCK; > > - return n; > > - } > > - > > - DEBUG("read returned %zu > sizeof umad %zu + length %d (%m)", > > - mad->length - sizeof *mad, sizeof *mad, *length); > > + if (abi_version == 5) > > + n = read_abi_5(port->dev_fd, umad, length); > > + else > > + n = read_data(port->dev_fd, umad, > > + sizeof(struct ib_user_mad) + *length, length); > > > > - *length = mad->length - sizeof *mad; > > - if (!errno) > > - errno = EIO; > > - return -errno; > > + return n; > > } > > > > int > > @@ -996,10 +1104,10 @@ umad_addr_dump(ib_mad_addr_t *addr) > > gid_str[i*2] = 
0; > > IBWARN("qpn %d qkey 0x%x lid 0x%x sl %d\n" > > "grh_present %d gid_index %d hop_limit %d traffic_class %d flow_label 0x%x\n" > > - "Gid 0x%s", > > + "Gid 0x%s pkey_index %d", > > ntohl(addr->qpn), ntohl(addr->qkey), ntohs(addr->lid), addr->sl, > > addr->grh_present, (int)addr->gid_index, (int)addr->hop_limit, > > - (int)addr->traffic_class, addr->flow_label, gid_str); > > + (int)addr->traffic_class, addr->flow_label, gid_str, addr->pkey_index); > > } > > > > void > > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From halr at voltaire.com Wed Jun 20 14:18:38 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Jun 2007 17:18:38 -0400 Subject: [ofa-general] [PATCH] for-2.6.23 ib/umad: add partition support In-Reply-To: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com> References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com> Message-ID: <1182374317.15653.336639.camel@hal.voltaire.com> On Fri, 2007-06-15 at 12:34, Sean Hefty wrote: > In order to support multiple partitions, user_mad needs to handle > > different pkey's. PKeys must be specified by the user when sending > > and receiving MADs. This bumps the ABI. > > Signed-off-by: Sean Hefty > > --- > > If there are no objections, I will queue this patch for 2.6.23, and > request > > a pull when 2.6.23 is closer. 
> > > drivers/infiniband/core/user_mad.c | 5 +++-- > > include/rdma/ib_user_mad.h | 4 +++- > > 2 files changed, 6 insertions(+), 3 deletions(-) > > diff --git a/drivers/infiniband/core/user_mad.c > b/drivers/infiniband/core/user_mad.c > > index d97ded2..b0128fa 100644 > > --- a/drivers/infiniband/core/user_mad.c > > +++ b/drivers/infiniband/core/user_mad.c > > @@ -228,6 +228,7 @@ static void recv_handler(struct ib_mad_agent > *agent, > > packet->mad.hdr.lid = > cpu_to_be16(mad_recv_wc->wc->slid); > > packet->mad.hdr.sl = mad_recv_wc->wc->sl; > > packet->mad.hdr.path_bits = mad_recv_wc->wc->dlid_path_bits; > > + packet->mad.hdr.pkey_index = mad_recv_wc->wc->pkey_index; > > packet->mad.hdr.grh_present = !!(mad_recv_wc->wc->wc_flags & > IB_WC_GRH); > > if (packet->mad.hdr.grh_present) { > > struct ib_ah_attr ah_attr; > > @@ -503,8 +504,8 @@ static ssize_t ib_umad_write(struct file *filp, > const char __user *buf, > > data_len = count - sizeof (struct ib_user_mad) - hdr_len; > > packet->msg = ib_create_send_mad(agent, > > > be32_to_cpu(packet->mad.hdr.qpn), > > - 0, rmpp_active, hdr_len, > > - data_len, GFP_KERNEL); > > + packet->mad.hdr.pkey_index, > rmpp_active, > > + hdr_len, data_len, > GFP_KERNEL); > > if (IS_ERR(packet->msg)) { > > ret = PTR_ERR(packet->msg); > > goto err_ah; > > diff --git a/include/rdma/ib_user_mad.h b/include/rdma/ib_user_mad.h > > index d66b15e..e7bf6fa 100644 > > --- a/include/rdma/ib_user_mad.h > > +++ b/include/rdma/ib_user_mad.h > > @@ -43,7 +43,7 @@ > > * Increment this value if any changes that break userspace ABI > > * compatibility are made. 
> > */ > > -#define IB_USER_MAD_ABI_VERSION 5 > > +#define IB_USER_MAD_ABI_VERSION 6 > > > > /* > > * Make sure that all structs defined in this file remain laid out so > > @@ -88,6 +88,8 @@ struct ib_user_mad_hdr { > > __u8 traffic_class; > > __u8 gid[16]; > > __be32 flow_label; > > + __u16 pkey_index; > > + __u8 reserved[6]; > > }; Nit: If this approach is going ahead, should there also be a comment added to this header file like: * @pkey_index - Pkey index used to determine PKey in BTH -- Hal > /** > > From wombat2 at us.ibm.com Wed Jun 20 15:09:08 2007 From: wombat2 at us.ibm.com (Bernard King-Smith) Date: Wed, 20 Jun 2007 18:09:08 -0400 Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) In-Reply-To: Message-ID: Roland Dreier wrote on 06/20/2007 04:40:03 PM: > > We are already running with the non-SRQ patch here and the results are > > very good. Changing to a different approach is not the right thing to do > > at this time. > > Why not, if a different approach is better? > > - R. It is not clear if anything is better yet, but instead you have to go back to the IPoIB-CM RFC 4755 that we wrote. In the spec you will see that the approach for this driver is to have the IPoIB driver select the most appropriate method of connecting. If RC was not available then UD was used. You can extend that to UC mode as Michael proposed, as long as you allow selecting the most appropriate method of connection. By pushing the issue of SRQ or not SRQ to the driver you have broken the IPoIB-CM original design. Since SRQ was not a required function in the IB spec we never addressed that issue in the RFC along with UC. I think we can agree that adding UC is a good thing and follows the approach in the original spec. Including SRQ as one of the tests for the best possible connection method follows this same approach. 
If you really want to start splitting up which layer has part of the decision on how to connect, then you need to propose a totally different RFC. I prefer the approach where as few places as possible are required to make a connection type decision. When you change the options supported, you potentially have several places where you have to address the changes, opening up the possible maintenance headache that Pradeep mentioned. I would be interested in hearing a better approach, as long as we start with the approach in RFC 4755. However, for now I have not seen anything that says supporting both SRQ and non-SRQ in the same IPoIB-CM driver has a disastrous impact. Regards. Bernie King-Smith IBM Corporation Server Group Cluster System Performance wombat2 at us.ibm.com (845)433-8483 Tie. 293-8483 or wombat2 on NOTES "We are not responsible for the world we are born into, only for the world we leave when we die. So we have to accept what has gone before us and work to change the only thing we can, -- The Future." William Shatner
From rockcorjojlu at ocn.ne.jp Wed Jun 20 15:20:02 2007 From: rockcorjojlu at ocn.ne.jp (Kenya Evans) Date: Thu, 21 Jun 2007 05:20:02 +0700 Subject: [ofa-general] Just wanted to drop a line Message-ID: <91b701c7b3c3$d1ca01e0$b1201374@rockcorjojlu>
From mst at dev.mellanox.co.il Wed Jun 20 20:20:29 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Thu, 21 Jun 2007 06:20:29 +0300 Subject: [ofa-general] Re: Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) In-Reply-To: References: Message-ID: <20070621032029.GE8868@mellanox.co.il> > Since SRQ is not a required function in the IB spec we never addressed that > issue in the RFC along with UC. > > ... Since SRQ is almost transparent wire-protocol-wise, RFC probably does not have to say anything about it. But I wonder why you say this about UC, which is explicitly documented in the spec. > If you really want to start splitting up which layer has part of the decision > on how to connect, then you need to propose a totally different RFC. > > ... I hear an architect speaking :) You seem to use the term layer in the OSI model sense, while Roland is just speaking about code organisation. We haven't stopped developing ipoib, so duplicating the controlling logic is a problem for us: both performance and maintenance wise. Abstracting the SRQ/nonSRQ issue out, by implementing a set of functions that can work on top of either SRQ or a pool of QPs, is the proposed solution. -- MST From mst at dev.mellanox.co.il Wed Jun 20 20:38:54 2007 From: mst at dev.mellanox.co.il (Michael S.
Tsirkin) Date: Thu, 21 Jun 2007 06:38:54 +0300 Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition support In-Reply-To: <467996C4.1060201@ichips.intel.com> References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com> <467996C4.1060201@ichips.intel.com> Message-ID: <20070621033854.GF8868@mellanox.co.il> > Quoting Sean Hefty : > Subject: Re: [PATCH] for-2.6.23 ib/umad: add partition support > > Roland Dreier wrote: > > > -#define IB_USER_MAD_ABI_VERSION 5 > > > +#define IB_USER_MAD_ABI_VERSION 6 > > > >Bummer -- we've been able to keep the ABI stable for almost 2 years > >now. I wonder if there's something clever we can do to avoid breaking > >existing apps? > > Did you have something in mind? (new ioctl? re-using existing fields?) > > Not all fields are used for both reads and writes. E.g. status is > unused on a write, and retries is unused on a read. We made the mistake of not validating the offset field; otherwise we could have used it, too. As it is, I think apps just use "write", so there's a useless byte counter in that field. But if we do one of these things, the app does not get any indication that the pkey is ignored, isn't that right? > Storing the > pkey_index on a read seems doable. I think if we do anything on a > write, we need to make an assumption that the data is currently set to 0 > by the app. Suggestion: We currently have: if (count < sizeof (struct ib_user_mad) + IB_MGMT_RMPP_HDR) return -EINVAL; So we can have short writes set per-open-file properties such as pkey: just be sure to validate the offset too for these so we can reuse offsets other than 0 in the future. This assumes an open file descriptor per-pkey, so the proposed API extension umad_set_pkey would have to be changed to be per-port rather than per-mad. But I think this is a better API, too: most apps likely work within a single partition. -- MST From mst at dev.mellanox.co.il Wed Jun 20 21:19:05 2007 From: mst at dev.mellanox.co.il (Michael S.
Tsirkin) Date: Thu, 21 Jun 2007 07:19:05 +0300 Subject: [ofa-general] Re: [PATCH 1/2] libibumad: fix partition support In-Reply-To: <1182373280.15653.335513.camel@hal.voltaire.com> References: <000801c7af6e$7ae0ba80$ff0da8c0@amr.corp.intel.com> <1182373280.15653.335513.camel@hal.voltaire.com> Message-ID: <20070621041905.GG8868@mellanox.co.il> > 1. It might be better if the ABI version 5 warning message for only > pkey_index 0 being supported comes out at umad_init time rather than > umad_set_pkey time so that the user is not swamped with these. The reason you need the message is because you made it a void, right? How about umad_set_pkey getting a port and returning success status? -- MST From sean.hefty at intel.com Wed Jun 20 22:48:40 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 20 Jun 2007 22:48:40 -0700 Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition support In-Reply-To: <20070621033854.GF8868@mellanox.co.il> Message-ID: <000001c7b3c7$d2fdca20$a3cc180a@amr.corp.intel.com> >This assumes an open file descriptor per-pkey, so the proposed API >extension umad_set_pkey would have to be changed to be per-port rather >than per-mad. But I think this is a better API, too: most apps >likely work within a single partition. I don't think this is true for apps that use the userspace MAD interface (e.g. opensm). Beyond that, this approach doesn't work for receiving MADs on different PKeys. - Sean From sean.hefty at intel.com Wed Jun 20 22:52:29 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 20 Jun 2007 22:52:29 -0700 Subject: [ofa-general] RE: [PATCH 1/2] libibumad: fix partition support In-Reply-To: <20070621041905.GG8868@mellanox.co.il> Message-ID: <000101c7b3c8$5afa6960$a3cc180a@amr.corp.intel.com> >> 1. It might be better if the ABI version 5 warning message for only >> pkey_index 0 being supported comes out at umad_init time rather than >> umad_set_pkey time so that the user is not swamped with these.
> >The reason you need the message is because you made it a void, right? >How about umad_set_pkey getting a port and returning success status? umad_set_pkey returns an int. With ABI 5, the call does nothing, always returns success, and the callers ignore the return value. The proposed change displays a warning and returns a failure, but the callers still ignore the return value. We can remove the warning message, but it was the warning message that clued me in on the fact that the pkey was being set incorrectly... - Sean From xma at us.ibm.com Wed Jun 20 23:09:09 2007 From: xma at us.ibm.com (Shirley Ma) Date: Wed, 20 Jun 2007 23:09:09 -0700 Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) In-Reply-To: Message-ID: Hello Roland, Michael, > I've been quite busy lately but I should have some time to look more > deeply at this in the next week or so. > > - R. Has anyone tested IPoIB-CM SRQ scalability in a typical 16-32 node cluster? It's worth comparing IPoIB-CM SRQ connection scalability against IPoIB-CM without SRQ. I wonder which one would be better. Any ideas? Thanks Shirley Ma From mst at dev.mellanox.co.il Wed Jun 20 23:57:31 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Thu, 21 Jun 2007 09:57:31 +0300 Subject: [ofa-general] Re: Re: [PATCH] for-2.6.23 ib/umad: add partition support In-Reply-To: <000001c7b3c7$d2fdca20$a3cc180a@amr.corp.intel.com> References: <20070621033854.GF8868@mellanox.co.il> <000001c7b3c7$d2fdca20$a3cc180a@amr.corp.intel.com> Message-ID: <20070621065731.GJ8868@mellanox.co.il> > Quoting Sean Hefty : > Subject: RE: Re: [PATCH] for-2.6.23 ib/umad: add partition support > > >This assumes an open file descriptor per-pkey, so the proposed API > >extension umad_set_pkey would have to be changed to be per-port rather > >than per-mad. But I think this is a better API, too: most apps > >likely work within a single partition.
> > I don't think this is true for apps that use the userspace MAD interface (e.g. > opensm). SM (rather, SA) can just open file descriptor per pkey - it created them itself, and there's a small number of partitions. > Beyond that, this approach doesn't work for receiving MADs on different PKeys. Yes, it does: we just filter out the MADs where pkey does not match. I think that most other apps (besides SA) should really treat each partition as a separate network. So getting MADs for a specific pkey, rather than all pkeys, makes total sense to me. -- MST From jackm at dev.mellanox.co.il Thu Jun 21 02:01:58 2007 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Thu, 21 Jun 2007 12:01:58 +0300 Subject: [ofa-general] [PATCH] libmlx4: make BF available for RDMA_READ work requests Message-ID: <200706211201.58440.jackm@dev.mellanox.co.il> Make blueflame available for RDMA_READs (performance improvement). Signed-off-by: Jack Morgenstein Index: a/src/qp.c =================================================================== --- a/src/qp.c 2007-06-20 16:31:36.000000000 +0300 +++ b/src/qp.c 2007-06-21 09:17:14.000000000 +0300 @@ -204,9 +204,11 @@ break; + case IBV_WR_RDMA_READ: + inl = 1; + /* fall through */ case IBV_WR_RDMA_WRITE: case IBV_WR_RDMA_WRITE_WITH_IMM: - case IBV_WR_RDMA_READ: ((struct mlx4_wqe_raddr_seg *) wqe)->raddr = htonll(wr->wr.rdma.remote_addr); ((struct mlx4_wqe_raddr_seg *) wqe)->rkey = From jackm at dev.mellanox.co.il Thu Jun 21 02:27:47 2007 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Thu, 21 Jun 2007 12:27:47 +0300 Subject: [ofa-general] [PATCH 1 of 2] mlx4: implement query-qp Message-ID: <200706211227.47794.jackm@dev.mellanox.co.il> Add query-qp capability. Note that this also requires a libmlx4 patch for returning qp capabilities (for sq caps at least). 
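The query path added by this patch extracts several packed fields from big-endian words in the QP context. For example, it uses be32_to_cpu(context.flags) >> 28 for the QP state and (be32_to_cpu(context.rnr_nextrecvpsn) >> 24) & 0x1f for the RNR timer. Below is a minimal userspace sketch of the same bit extraction. It uses POSIX ntohl() in place of the kernel's be32_to_cpu(). The bit positions are copied from the patch. The struct and function names are invented for illustration.

```c
#include <arpa/inet.h>  /* ntohl(), htonl() */
#include <assert.h>
#include <stdint.h>

/* Illustrative container for fields decoded from the QP context. */
struct qp_ctx_fields {
    unsigned state;      /* bits 31..28 of flags */
    unsigned mig_state;  /* bits 12..11 of flags */
    uint32_t rq_psn;     /* low 24 bits of rnr_nextrecvpsn */
    unsigned rnr_timer;  /* bits 28..24 of rnr_nextrecvpsn */
};

/* flags_be and rnr_psn_be are big-endian words as stored in the mailbox;
 * each is converted to host order once, then masked and shifted. */
static struct qp_ctx_fields decode_qp_ctx(uint32_t flags_be, uint32_t rnr_psn_be)
{
    uint32_t flags = ntohl(flags_be);
    uint32_t rnr_psn = ntohl(rnr_psn_be);
    struct qp_ctx_fields f;

    f.state = flags >> 28;
    f.mig_state = (flags >> 11) & 0x3;
    f.rq_psn = rnr_psn & 0xffffff;
    f.rnr_timer = (rnr_psn >> 24) & 0x1f;
    return f;
}
```

Converting each word exactly once, before masking, keeps the shifts identical on little- and big-endian hosts.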
Signed-off-by: Jack Morgenstein Index: new_connectx_kernel/drivers/net/mlx4/qp.c =================================================================== --- new_connectx_kernel.orig/drivers/net/mlx4/qp.c 2007-06-18 15:34:26.000000000 +0300 +++ new_connectx_kernel/drivers/net/mlx4/qp.c 2007-06-18 15:35:36.000000000 +0300 @@ -278,3 +278,24 @@ mlx4_CONF_SPECIAL_QP(dev, 0); mlx4_bitmap_cleanup(&mlx4_priv(dev)->qp_table.bitmap); } + +int mlx4_qp_query(struct mlx4_dev *dev, struct mlx4_qp *qp, + struct mlx4_qp_context *context) +{ + struct mlx4_cmd_mailbox *mailbox; + int err; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + + err = mlx4_cmd_box(dev, 0, mailbox->dma, qp->qpn, 0, + MLX4_CMD_QUERY_QP, MLX4_CMD_TIME_CLASS_A); + if (!err) + memcpy(context, mailbox->buf + 8, sizeof *context); + + mlx4_free_cmd_mailbox(dev, mailbox); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_qp_query); + Index: new_connectx_kernel/drivers/infiniband/hw/mlx4/qp.c =================================================================== --- new_connectx_kernel.orig/drivers/infiniband/hw/mlx4/qp.c 2007-06-18 15:34:26.000000000 +0300 +++ new_connectx_kernel/drivers/infiniband/hw/mlx4/qp.c 2007-06-18 17:09:21.000000000 +0300 @@ -1440,3 +1440,139 @@ return err; } + +static inline enum ib_qp_state to_ib_qp_state(enum mlx4_qp_state mlx4_state) +{ + switch (mlx4_state) { + case MLX4_QP_STATE_RST: return IB_QPS_RESET; + case MLX4_QP_STATE_INIT: return IB_QPS_INIT; + case MLX4_QP_STATE_RTR: return IB_QPS_RTR; + case MLX4_QP_STATE_RTS: return IB_QPS_RTS; + case MLX4_QP_STATE_SQ_DRAINING: + case MLX4_QP_STATE_SQD: return IB_QPS_SQD; + case MLX4_QP_STATE_SQER: return IB_QPS_SQE; + case MLX4_QP_STATE_ERR: return IB_QPS_ERR; + default: return -1; + } +} + +static inline enum ib_mig_state to_ib_mig_state(int mlx4_mig_state) +{ + switch (mlx4_mig_state) { + case MLX4_QP_PM_ARMED: return IB_MIG_ARMED; + case MLX4_QP_PM_REARM: return IB_MIG_REARM; + case MLX4_QP_PM_MIGRATED: 
return IB_MIG_MIGRATED; + default: return -1; + } +} + +static int to_ib_qp_access_flags(int mlx4_flags) +{ + int ib_flags = 0; + + if (mlx4_flags & MLX4_QP_BIT_RRE) + ib_flags |= IB_ACCESS_REMOTE_READ; + if (mlx4_flags & MLX4_QP_BIT_RWE) + ib_flags |= IB_ACCESS_REMOTE_WRITE; + if (mlx4_flags & MLX4_QP_BIT_RAE) + ib_flags |= IB_ACCESS_REMOTE_ATOMIC; + + return ib_flags; +} + +static void to_ib_ah_attr(struct mlx4_dev *dev, struct ib_ah_attr *ib_ah_attr, + struct mlx4_qp_path *path) +{ + memset(ib_ah_attr, 0, sizeof *path); + ib_ah_attr->port_num = path->sched_queue & 0x40 ? 2 : 1; + + if (ib_ah_attr->port_num == 0 || ib_ah_attr->port_num > dev->caps.num_ports) + return; + + ib_ah_attr->dlid = be16_to_cpu(path->rlid); + ib_ah_attr->sl = (path->sched_queue >> 2) & 0xf; + ib_ah_attr->src_path_bits = path->grh_mylmc & 0x7f; + ib_ah_attr->static_rate = path->static_rate ? path->static_rate - 5 : 0; + ib_ah_attr->ah_flags = (path->grh_mylmc & (1 << 7)) ? IB_AH_GRH : 0; + if (ib_ah_attr->ah_flags) { + ib_ah_attr->grh.sgid_index = path->mgid_index; + ib_ah_attr->grh.hop_limit = path->hop_limit; + ib_ah_attr->grh.traffic_class = + (be32_to_cpu(path->tclass_flowlabel) >> 20) & 0xff; + ib_ah_attr->grh.flow_label = + be32_to_cpu(path->tclass_flowlabel) & 0xffffff; + memcpy(ib_ah_attr->grh.dgid.raw, + path->rgid, sizeof ib_ah_attr->grh.dgid.raw); + } +} + +int mlx4_ib_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr, int qp_attr_mask, + struct ib_qp_init_attr *qp_init_attr) +{ + struct mlx4_ib_dev *dev = to_mdev(ibqp->device); + struct mlx4_ib_qp *qp = to_mqp(ibqp); + struct mlx4_qp_context context; + int mlx4_state; + int err; + + if (qp->state == IB_QPS_RESET) { + qp_attr->qp_state = IB_QPS_RESET; + goto done; + } + + err = mlx4_qp_query(dev->dev, &qp->mqp, &context); + if (err) + return -EINVAL; + + mlx4_state = be32_to_cpu(context.flags) >> 28; + + qp_attr->qp_state = to_ib_qp_state(mlx4_state); + qp_attr->path_mtu = context.mtu_msgmax >> 5; + qp_attr->path_mig_state 
= + to_ib_mig_state((be32_to_cpu(context.flags) >> 11) & 0x3); + qp_attr->qkey = be32_to_cpu(context.qkey); + qp_attr->rq_psn = be32_to_cpu(context.rnr_nextrecvpsn) & 0xffffff; + qp_attr->sq_psn = be32_to_cpu(context.next_send_psn) & 0xffffff; + qp_attr->dest_qp_num = be32_to_cpu(context.remote_qpn) & 0xffffff; + qp_attr->qp_access_flags = + to_ib_qp_access_flags(be32_to_cpu(context.params2)); + + if (qp->ibqp.qp_type == IB_QPT_RC || qp->ibqp.qp_type == IB_QPT_UC) { + to_ib_ah_attr(dev->dev, &qp_attr->ah_attr, &context.pri_path); + to_ib_ah_attr(dev->dev, &qp_attr->alt_ah_attr, &context.alt_path); + qp_attr->alt_pkey_index = context.alt_path.pkey_index & 0x7f; + qp_attr->alt_port_num = qp_attr->alt_ah_attr.port_num; + } + + qp_attr->pkey_index = context.pri_path.pkey_index & 0x7f; + qp_attr->port_num = context.pri_path.sched_queue & 0x40 ? 2 : 1; + + /* qp_attr->en_sqd_async_notify is only applicable in modify qp */ + qp_attr->sq_draining = mlx4_state == MLX4_QP_STATE_SQ_DRAINING; + + qp_attr->max_rd_atomic = 1 << ((be32_to_cpu(context.params1) >> 21) & 0x7); + + qp_attr->max_dest_rd_atomic = + 1 << ((be32_to_cpu(context.params2) >> 21) & 0x7); + qp_attr->min_rnr_timer = + (be32_to_cpu(context.rnr_nextrecvpsn) >> 24) & 0x1f; + qp_attr->timeout = context.pri_path.ackto >> 3; + qp_attr->retry_cnt = (be32_to_cpu(context.params1) >> 16) & 0x7; + qp_attr->rnr_retry = (be32_to_cpu(context.params1) >> 13) & 0x7; + qp_attr->alt_timeout = context.alt_path.ackto >> 3; + +done: + qp_attr->cur_qp_state = qp_attr->qp_state; + if (!ibqp->uobject) { + qp_attr->cap.max_send_wr = qp->sq.wqe_cnt; + qp_attr->cap.max_recv_wr = qp->rq.wqe_cnt; + qp_attr->cap.max_send_sge = qp->sq.max_gs; + qp_attr->cap.max_recv_sge = qp->rq.max_gs; + qp_attr->cap.max_inline_data = (1 << qp->sq.wqe_shift) - + send_wqe_overhead(qp->ibqp.qp_type) - + sizeof (struct mlx4_wqe_inline_seg); + qp_init_attr->cap = qp_attr->cap; + } + return 0; +} + Index: new_connectx_kernel/include/linux/mlx4/qp.h 
=================================================================== --- new_connectx_kernel.orig/include/linux/mlx4/qp.h 2007-06-18 15:34:26.000000000 +0300 +++ new_connectx_kernel/include/linux/mlx4/qp.h 2007-06-18 15:35:36.000000000 +0300 @@ -282,6 +282,9 @@ struct mlx4_qp_context *context, enum mlx4_qp_optpar optpar, int sqd_event, struct mlx4_qp *qp); +int mlx4_qp_query(struct mlx4_dev *dev, struct mlx4_qp *qp, + struct mlx4_qp_context *context); + static inline struct mlx4_qp *__mlx4_qp_lookup(struct mlx4_dev *dev, u32 qpn) { return radix_tree_lookup(&dev->qp_table_tree, qpn & (dev->caps.num_qps - 1)); Index: new_connectx_kernel/drivers/infiniband/hw/mlx4/main.c =================================================================== --- new_connectx_kernel.orig/drivers/infiniband/hw/mlx4/main.c 2007-06-18 15:22:02.000000000 +0300 +++ new_connectx_kernel/drivers/infiniband/hw/mlx4/main.c 2007-06-18 16:04:07.000000000 +0300 @@ -524,6 +524,7 @@ (1ull << IB_USER_VERBS_CMD_DESTROY_CQ) | (1ull << IB_USER_VERBS_CMD_CREATE_QP) | (1ull << IB_USER_VERBS_CMD_MODIFY_QP) | + (1ull << IB_USER_VERBS_CMD_QUERY_QP) | (1ull << IB_USER_VERBS_CMD_DESTROY_QP) | (1ull << IB_USER_VERBS_CMD_ATTACH_MCAST) | (1ull << IB_USER_VERBS_CMD_DETACH_MCAST) | @@ -551,6 +552,7 @@ ibdev->ib_dev.post_srq_recv = mlx4_ib_post_srq_recv; ibdev->ib_dev.create_qp = mlx4_ib_create_qp; ibdev->ib_dev.modify_qp = mlx4_ib_modify_qp; + ibdev->ib_dev.query_qp = mlx4_ib_query_qp; ibdev->ib_dev.destroy_qp = mlx4_ib_destroy_qp; ibdev->ib_dev.post_send = mlx4_ib_post_send; ibdev->ib_dev.post_recv = mlx4_ib_post_recv; Index: new_connectx_kernel/drivers/infiniband/hw/mlx4/mlx4_ib.h =================================================================== --- new_connectx_kernel.orig/drivers/infiniband/hw/mlx4/mlx4_ib.h 2007-06-18 15:22:02.000000000 +0300 +++ new_connectx_kernel/drivers/infiniband/hw/mlx4/mlx4_ib.h 2007-06-18 16:03:19.000000000 +0300 @@ -267,6 +267,8 @@ int mlx4_ib_destroy_qp(struct ib_qp *qp); int 
mlx4_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask, struct ib_udata *udata); +int mlx4_ib_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr, int qp_attr_mask, + struct ib_qp_init_attr *qp_init_attr); int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, struct ib_send_wr **bad_wr); int mlx4_ib_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr, From jackm at dev.mellanox.co.il Thu Jun 21 02:29:08 2007 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Thu, 21 Jun 2007 12:29:08 +0300 Subject: [ofa-general] [PATCH 2 of 2] libmlx4: implement query_qp Message-ID: <200706211229.08703.jackm@dev.mellanox.co.il> For query-qp, fill in qp capabilities from user-space qp object. Signed-off-by: Jack Morgenstein Index: a/src/verbs.c =================================================================== --- a/src/verbs.c 2007-06-18 09:33:04.000000000 +0300 +++ a/src/verbs.c 2007-06-18 17:10:23.000000000 +0300 @@ -445,8 +445,21 @@ struct ibv_qp_init_attr *init_attr) { struct ibv_query_qp cmd; + struct mlx4_qp *mqp; + int ret; + + ret = ibv_cmd_query_qp(qp, attr, attr_mask, init_attr, &cmd, sizeof cmd); + if (ret) + return ret; + mqp = to_mqp(qp); + init_attr->cap.max_send_wr = mqp->sq.max_post; + init_attr->cap.max_send_sge = mqp->sq.max_gs; + init_attr->cap.max_recv_wr = mqp->rq.max_post; + init_attr->cap.max_recv_sge = mqp->rq.max_gs; + init_attr->cap.max_inline_data = mqp->max_inline_data; + attr->cap = init_attr->cap; - return ibv_cmd_query_qp(qp, attr, attr_mask, init_attr, &cmd, sizeof cmd); + return 0; } int mlx4_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, From vlad at lists.openfabrics.org Thu Jun 21 02:46:43 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Thu, 21 Jun 2007 02:46:43 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070621-0200 daily build status Message-ID: <20070621094643.AB6B0E6087C@openfabrics.org> This email was generated automatically, please do not reply Common build 
parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.12 Passed on x86_64 with linux-2.6.20 Passed on powerpc with linux-2.6.17 Passed on ia64 with linux-2.6.13 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.17 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.18 Passed on ia64 with linux-2.6.12 Passed on powerpc with linux-2.6.13 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.15 Passed on ia64 with linux-2.6.14 Passed on ppc64 with linux-2.6.12 Passed on x86_64 with linux-2.6.19 Passed on powerpc with linux-2.6.14 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.15 Passed on ppc64 with linux-2.6.16 Passed on x86_64 with linux-2.6.14 Passed on ppc64 with linux-2.6.15 Passed on ia64 with linux-2.6.18 Passed on powerpc with linux-2.6.12 Passed on ia64 with linux-2.6.16 Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on ia64 with linux-2.6.19 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.13 Passed on ppc64 with linux-2.6.14 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ia64 with linux-2.6.21.1 Passed on ppc64 with linux-2.6.18 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with 
linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.9-34.ELsmp Passed on x86_64 with linux-2.6.18-8.el5 Failed: From jackm at dev.mellanox.co.il Thu Jun 21 03:03:11 2007 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Thu, 21 Jun 2007 13:03:11 +0300 Subject: [ofa-general] [PATCH] mlx4: implement query-srq Message-ID: <200706211303.11949.jackm@dev.mellanox.co.il> Query SRQ support was added. Signed-off-by: Dotan Barak Signed-off-by: Jack Morgenstein diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 1095c82..ebc8d55 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -528,6 +528,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) (1ull << IB_USER_VERBS_CMD_DETACH_MCAST) | (1ull << IB_USER_VERBS_CMD_CREATE_SRQ) | (1ull << IB_USER_VERBS_CMD_MODIFY_SRQ) | + (1ull << IB_USER_VERBS_CMD_QUERY_SRQ) | (1ull << IB_USER_VERBS_CMD_DESTROY_SRQ); ibdev->ib_dev.query_device = mlx4_ib_query_device; @@ -546,6 +547,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) ibdev->ib_dev.destroy_ah = mlx4_ib_destroy_ah; ibdev->ib_dev.create_srq = mlx4_ib_create_srq; ibdev->ib_dev.modify_srq = mlx4_ib_modify_srq; + ibdev->ib_dev.query_srq = mlx4_ib_query_srq; ibdev->ib_dev.destroy_srq = mlx4_ib_destroy_srq; ibdev->ib_dev.post_srq_recv = mlx4_ib_post_srq_recv; ibdev->ib_dev.create_qp = mlx4_ib_create_qp; diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 24ccadd..dab0fd9 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -255,6 +255,7 @@ struct ib_srq *mlx4_ib_create_srq(struct ib_pd *pd, struct ib_udata *udata); int mlx4_ib_modify_srq(struct ib_srq *ibsrq, struct ib_srq_attr *attr, enum ib_srq_attr_mask attr_mask, struct ib_udata *udata); +int mlx4_ib_query_srq(struct ib_srq *srq, struct ib_srq_attr *srq_attr); int 
mlx4_ib_destroy_srq(struct ib_srq *srq); void mlx4_ib_free_srq_wqe(struct mlx4_ib_srq *srq, int wqe_index); int mlx4_ib_post_srq_recv(struct ib_srq *ibsrq, struct ib_recv_wr *wr, diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c diff --git a/drivers/infiniband/hw/mlx4/srq.c b/drivers/infiniband/hw/mlx4/srq.c index 12fac1c..408748f 100644 --- a/drivers/infiniband/hw/mlx4/srq.c +++ b/drivers/infiniband/hw/mlx4/srq.c @@ -240,6 +240,24 @@ int mlx4_ib_modify_srq(struct ib_srq *ibsrq, struct ib_srq_attr *attr, return 0; } +int mlx4_ib_query_srq(struct ib_srq *ibsrq, struct ib_srq_attr *srq_attr) +{ + struct mlx4_ib_dev *dev = to_mdev(ibsrq->device); + struct mlx4_ib_srq *srq = to_msrq(ibsrq); + int ret; + int limit_watermark; + + ret = mlx4_srq_query(dev->dev, &srq->msrq, &limit_watermark); + if (ret) + return ret; + + srq_attr->srq_limit = be16_to_cpu(limit_watermark); + srq_attr->max_wr = srq->msrq.max - 1; + srq_attr->max_sge = srq->msrq.max_gs; + + return 0; +} + int mlx4_ib_destroy_srq(struct ib_srq *srq) { struct mlx4_ib_dev *dev = to_mdev(srq->device); diff --git a/drivers/net/mlx4/qp.c b/drivers/net/mlx4/qp.c diff --git a/drivers/net/mlx4/srq.c b/drivers/net/mlx4/srq.c index 2134f83..b061c86 100644 --- a/drivers/net/mlx4/srq.c +++ b/drivers/net/mlx4/srq.c @@ -102,6 +102,13 @@ static int mlx4_ARM_SRQ(struct mlx4_dev *dev, int srq_num, int limit_watermark) MLX4_CMD_TIME_CLASS_B); } +static int mlx4_QUERY_SRQ(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox, + int srq_num) +{ + return mlx4_cmd_box(dev, 0, mailbox->dma, srq_num, 0, MLX4_CMD_QUERY_SRQ, + MLX4_CMD_TIME_CLASS_A); +} + int mlx4_srq_alloc(struct mlx4_dev *dev, u32 pdn, struct mlx4_mtt *mtt, u64 db_rec, struct mlx4_srq *srq) { @@ -205,6 +212,29 @@ int mlx4_srq_arm(struct mlx4_dev *dev, struct mlx4_srq *srq, int limit_watermark } EXPORT_SYMBOL_GPL(mlx4_srq_arm); +int mlx4_srq_query(struct mlx4_dev *dev, struct mlx4_srq *srq, int *limit_watermark) +{ + struct 
mlx4_cmd_mailbox *mailbox; + struct mlx4_srq_context *srq_context; + int err; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + + srq_context = mailbox->buf; + + err = mlx4_QUERY_SRQ(dev, mailbox, srq->srqn); + if (err) + goto err_out; + *limit_watermark = srq_context->limit_watermark; + +err_out: + mlx4_free_cmd_mailbox(dev, mailbox); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_srq_query); + int __devinit mlx4_init_srq_table(struct mlx4_dev *dev) { struct mlx4_srq_table *srq_table = &mlx4_priv(dev)->srq_table; diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index b372f59..6bdd5de 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -322,6 +322,7 @@ int mlx4_srq_alloc(struct mlx4_dev *dev, u32 pdn, struct mlx4_mtt *mtt, u64 db_rec, struct mlx4_srq *srq); void mlx4_srq_free(struct mlx4_dev *dev, struct mlx4_srq *srq); int mlx4_srq_arm(struct mlx4_dev *dev, struct mlx4_srq *srq, int limit_watermark); +int mlx4_srq_query(struct mlx4_dev *dev, struct mlx4_srq *srq, int *limit_watermark); int mlx4_INIT_PORT(struct mlx4_dev *dev, int port); int mlx4_CLOSE_PORT(struct mlx4_dev *dev, int port); diff --git a/include/linux/mlx4/qp.h b/include/linux/mlx4/qp.h From ogerlitz at voltaire.com Thu Jun 21 03:41:03 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 21 Jun 2007 13:41:03 +0300 (IDT) Subject: [ofa-general] [PATCH] remove confusing code from udaddy Message-ID: As the man page of rdma_connect states, the qp_num and retry_count params are relevant only to RDMA_PS_TCP calls.
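Below is a hypothetical C sketch of the rule described above: the connected-mode fields are populated only for the TCP port space. The enum and struct are stand-ins invented for illustration. The real rdma_conn_param in librdmacm's rdma/rdma_cma.h has additional members and is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for RDMA_PS_TCP / RDMA_PS_UDP. */
enum ps_sketch { PS_TCP_SKETCH, PS_UDP_SKETCH };

/* Hypothetical subset of the connection parameters discussed above. */
struct conn_param_sketch {
    uint32_t qp_num;
    uint8_t retry_count;
};

/* Zero the parameters, then fill the connected-mode fields only when the
 * port space actually uses them. */
static void fill_conn_param(struct conn_param_sketch *p, enum ps_sketch ps,
                            uint32_t qp_num)
{
    assert(p != NULL);
    memset(p, 0, sizeof *p);
    if (ps == PS_TCP_SKETCH) {
        p->qp_num = qp_num;
        p->retry_count = 5;  /* the value udaddy set before this patch */
    }
}
```

For a UDP-port-space id such as udaddy's, the sketch leaves both fields zero. This matches the memset() that remains in the patched code.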
signed-off-by: Or Gerlitz --- librdmacm/examples/udaddy.c.orig 2007-06-21 13:34:59.000000000 +0300 +++ librdmacm/examples/udaddy.c 2007-06-21 13:35:58.000000000 +0300 @@ -264,8 +264,6 @@ static int route_handler(struct cmatest_ goto err; memset(&conn_param, 0, sizeof conn_param); - conn_param.qp_num = node->cma_id->qp->qp_num; - conn_param.retry_count = 5; ret = rdma_connect(node->cma_id, &conn_param); if (ret) { printf("udaddy: failure connecting: %d\n", ret); From sashak at voltaire.com Thu Jun 21 04:35:31 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 21 Jun 2007 14:35:31 +0300 Subject: [ofa-general] backups In-Reply-To: <795c49870706201132r1f7633f8r2cf3cb2a71edc6e0@mail.gmail.com> References: <795c49870706201044ha36255amebd94c1b673f58f6@mail.gmail.com> <795c49870706201132r1f7633f8r2cf3cb2a71edc6e0@mail.gmail.com> Message-ID: <1182425733.30285.51.camel@localhost> On Wed, 2007-06-20 at 11:32 -0700, Jeff Becker wrote: > I'm backing up /data/pub/scm. A quick "du -chL" shows it to be 4.2G. I think you can publish output of "du -schL /data/pub/scm/*". So we could ask most space consuming users to pack their repos (with 'git-repack -a -d'). Sasha > Perhaps I only need to backup a subset of /data/pub/scm? Thanks. > > -jeff > > On 6/20/07, Roland Dreier wrote: > > > Hi. I've started backing up the git trees and the web content using > > > rsync. John Companies gave us a 10G NFS partition for this. I've done > > > two backups and there's only 800M left. Also, I haven't backed up the > > > daily builds yet. I was told we could get more space for one dollar > > > per GB per month. Depending on the budget, we should increase this > > > backup space. How should we proceed? Thanks. > > > > Where is all the space going? A full kernel git tree (with more than > > two years of history) takes less than 150 MB of storage for me. How > > are we using up so much space? > > > > Also, FWIW, amazon S3 is $0.15 / GB-month + $0.10 for each GB > > transferred in. 
Of course it's probably a lot less convenient to back > > up to. > > > > - R. > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From changerv at gmail.com Thu Jun 21 05:15:19 2007 From: changerv at gmail.com (Changer Van) Date: Thu, 21 Jun 2007 20:15:19 +0800 Subject: [ofa-general] Can't open HCA InfiniHost0 problem Message-ID: <9fa3c2e50706210515l5ba18cb1h6eb4718f0749bb21@mail.gmail.com> Hi all, I got some errors when I ran the lctl network up command. Here are some of the log messages: … kernel: LustreError: 12355:0:(viblnd.c:1800:kibnal_startup()) Can't open HCA InfiniHost0: -256 But my IB card's hca_id is InfiniHost_III_Ex0. How do I configure it to look for an hca_id like InfiniHost_III_Ex0? Any help would be greatly appreciated. -- Regards, Changer From wombat2 at us.ibm.com Thu Jun 21 05:52:20 2007 From: wombat2 at us.ibm.com (Bernard King-Smith) Date: Thu, 21 Jun 2007 08:52:20 -0400 Subject: [ofa-general] Re: Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) In-Reply-To: <20070621032029.GE8868@mellanox.co.il> Message-ID: "Michael S. Tsirkin" wrote on 06/20/2007 11:20:29 PM: > > Since SRQ is not a required function in the IB spec we never addressed that > > issue in the RFC along with UC. > > > > ... > > Since SRQ is almost transparent wire-protocol-wise, RFC probably does not have > to say anything about it. But I wonder why you say this about UC, which is > explicitly documented in the spec. > Looks like that was added in the last set of revisions. > > If you really want to start splitting up which layer has part of > the decision > > on how to connect, then you need to propose a totally different RFC. > > > > ...
> > I hear an architect speaking :) Guilty as charged :=} > You seem to use the term layer in the OSI model sense, while Roland is just > speaking about code organisation. We haven't stopped developing ipoib, so > duplicating the controlling logic is a problem for us: both performance and > maintenance wise. Abstracting the SRQ/nonSRQ issue out, by > implementing a set > of functions that can work on top of either SRQ or a pool of QPs is > the proposed > solution. Still trying to understand why this is easier to maintain and performs better than the current patch. If this has to go in the drivers, then this has to be a part of the distros. Seems messy. > > -- > MST Bernie King-Smith IBM Corporation Server Group Cluster System Performance wombat2 at us.ibm.com (845)433-8483 Tie. 293-8483 or wombat2 on NOTES "We are not responsible for the world we are born into, only for the world we leave when we die. So we have to accept what has gone before us and work to change the only thing we can, -- The Future." William Shatner From mst at dev.mellanox.co.il Thu Jun 21 06:07:12 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Thu, 21 Jun 2007 16:07:12 +0300 Subject: [ofa-general] Re: backups In-Reply-To: References: <795c49870706201044ha36255amebd94c1b673f58f6@mail.gmail.com> Message-ID: <20070621130712.GG4857@mellanox.co.il> > Where is all the space going? A full kernel git tree (with more than > two years of history) takes less than 150 MB of storage for me. Most likely there are some unpacked trees.
-- MST From halr at voltaire.com Thu Jun 21 06:39:08 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Jun 2007 09:39:08 -0400 Subject: [ofa-general] Re: [PATCH] osm: adding root_guid_file and cn_guid_file OpenSM options In-Reply-To: <4675285A.6060309@dev.mellanox.co.il> References: <4675285A.6060309@dev.mellanox.co.il> Message-ID: <1182433144.15653.403468.camel@hal.voltaire.com> Hi Yevgeny, On Sun, 2007-06-17 at 08:26, Yevgeny Kliteynik wrote: > Hi Hal, > > This patch replaces updn_guid_file in the Up/Down routing with > root_guid_file for Up/Down and Fat-Tree routing, and adds a new > option - cn_guid_file for Fat-Tree routing. > OpenSM command line options for these two files are: > > '-a' or '--root_guid_file' for roots > '-u' or '--cn_guid_file' for compute nodes > > Signed-off-by: Yevgeny Kliteynik This entire patch was rejected when I attempted to apply it. Can you regenerate it ? Thanks. -- Hal > --- > opensm/include/opensm/osm_subnet.h | 12 +++++++++--- > opensm/opensm/main.c | 29 ++++++++++++++++++++++------- > opensm/opensm/osm_subnet.c | 25 ++++++++++++++++++------- > opensm/opensm/osm_ucast_updn.c | 6 +++--- > 4 files changed, 52 insertions(+), 20 deletions(-) > > diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h > index c62128b..a38fc49 100644 > --- a/opensm/include/opensm/osm_subnet.h > +++ b/opensm/include/opensm/osm_subnet.h > @@ -278,7 +278,8 @@ typedef struct _osm_subn_opt > char * routing_engine_name; > char * lid_matrix_dump_file; > char * ucast_dump_file; > - char * updn_guid_file; > + char * root_guid_file; > + char * cn_guid_file; > char * sa_db_file; > boolean_t exit_on_fatal; > boolean_t honor_guid2lid_file; > @@ -452,8 +453,13 @@ typedef struct _osm_subn_opt > * Name of the unicast routing dump file from where switch > * forwarding tables will be loaded > * > -* updn_guid_file > -* Pointer to name of the UPDN guid file given by User > +* root_guid_file > +* Name of the file that contains list of 
root guids that > +* will be used by fat-tree or up/dn routing (provided by User) > +* > +* cn_guid_file > +* Name of the file that contains list of compute node guids that > +* will be used by fat-tree routing (provided by User) > * > * sa_db_file > * Name of the SA database file. > diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c > index 6b4cb4f..d17a994 100644 > --- a/opensm/opensm/main.c > +++ b/opensm/opensm/main.c > @@ -189,8 +189,14 @@ show_usage(void) > " This option specifies the name of the SA DB dump file\n" > " from where SA database will be loaded.\n\n"); > printf ("-a\n" > - "--add_guid_file \n" > - " Set the root nodes for the Up/Down routing algorithm\n" > + "--root_guid_file \n" > + " Set the root nodes for the Up/Down or Fat-Tree routing\n" > + " algorithm to the guids provided in the given file (one\n" > + " to a line)\n" > + "\n"); > + printf ("-u\n" > + "--cn_guid_file \n" > + " Set the compute nodes for the Fat-Tree routing algorithm\n" > " to the guids provided in the given file (one to a line)\n" > "\n"); > printf( "-o\n" > @@ -585,7 +591,7 @@ main( > char *ignore_guids_file_name = NULL; > uint32_t val; > const char * const short_option = > - "i:f:ed:g:l:L:s:t:a:R:M:U:S:P:NBIQvVhorcyxp:n:q:k:C:"; > + "i:f:ed:g:l:L:s:t:a:u:R:M:U:S:P:NBIQvVhorcyxp:n:q:k:C:"; > > /* > In the array below, the 2nd parameter specifies the number > @@ -622,7 +628,8 @@ main( > { "lid_matrix_file",1, NULL, 'M'}, > { "ucast_file", 1, NULL, 'U'}, > { "sadb_file", 1, NULL, 'S'}, > - { "add_guid_file", 1, NULL, 'a'}, > + { "root_guid_file",1, NULL, 'a'}, > + { "cn_guid_file", 1, NULL, 'u'}, > { "cache-options", 0, NULL, 'c'}, > { "stay_on_fatal", 0, NULL, 'y'}, > { "honor_guid2lid",0, NULL, 'x'}, > @@ -886,10 +893,18 @@ main( > > case 'a': > /* > - Specifies port guids file > + Specifies root guids file > + */ > + opt.root_guid_file = optarg; > + printf (" Root Guid File: %s\n", opt.root_guid_file ); > + break; > + > + case 'u': > + /* > + Specifies compute node 
guids file > */ > - opt.updn_guid_file = optarg; > - printf (" UPDN Guid File: %s\n", opt.updn_guid_file ); > + opt.cn_guid_file = optarg; > + printf (" Compute Node Guid File: %s\n", opt.cn_guid_file ); > break; > > case 'c': > diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c > index 736f49a..4e080ba 100644 > --- a/opensm/opensm/osm_subnet.c > +++ b/opensm/opensm/osm_subnet.c > @@ -500,7 +500,8 @@ osm_subn_set_default_opt( > p_opt->routing_engine_name = NULL; > p_opt->lid_matrix_dump_file = NULL; > p_opt->ucast_dump_file = NULL; > - p_opt->updn_guid_file = NULL; > + p_opt->root_guid_file = NULL; > + p_opt->cn_guid_file = NULL; > p_opt->sa_db_file = NULL; > p_opt->exit_on_fatal = TRUE; > p_opt->enable_quirks = FALSE; > @@ -1323,8 +1324,12 @@ osm_subn_parse_conf_file( > p_key, p_val, &p_opts->ucast_dump_file); > > __osm_subn_opts_unpack_charp( > - "updn_guid_file", > - p_key, p_val, &p_opts->updn_guid_file); > + "root_guid_file", > + p_key, p_val, &p_opts->root_guid_file); > + > + __osm_subn_opts_unpack_charp( > + "cn_guid_file", > + p_key, p_val, &p_opts->cn_guid_file); > > __osm_subn_opts_unpack_charp( > "sa_db_file", > @@ -1548,12 +1553,18 @@ osm_subn_write_conf_file( > "# Ucast dump file name\n" > "ucast_dump_file %s\n\n", > p_opts->ucast_dump_file); > - if (p_opts->updn_guid_file) > + if (p_opts->root_guid_file) > + fprintf( opts_file, > + "# The file holding the root node guids (for fat-tree or Up/Down)\n" > + "# One guid in each line\n" > + "root_guid_file %s\n\n", > + p_opts->root_guid_file); > + if (p_opts->cn_guid_file) > fprintf( opts_file, > - "# The file holding the Up/Down root node guids\n" > + "# The file holding the fat-tree compute node guids\n" > "# One guid in each line\n" > - "updn_guid_file %s\n\n", > - p_opts->updn_guid_file); > + "cn_guid_file %s\n\n", > + p_opts->cn_guid_file); > if (p_opts->sa_db_file) > fprintf( opts_file, > "# SA database file name\n" > diff --git a/opensm/opensm/osm_ucast_updn.c 
b/opensm/opensm/osm_ucast_updn.c > index 2448246..af5ee4e 100644 > --- a/opensm/opensm/osm_ucast_updn.c > +++ b/opensm/opensm/osm_ucast_updn.c > @@ -311,10 +311,10 @@ updn_init( > Check the source for root node list, if file parse it, otherwise > wait for a callback to activate auto detection > */ > - if (p_osm->subn.opt.updn_guid_file) > + if (p_osm->subn.opt.root_guid_file) > { > status = osm_ucast_mgr_read_guid_file( &p_osm->sm.ucast_mgr, > - p_osm->subn.opt.updn_guid_file, > + p_osm->subn.opt.root_guid_file, > p_updn->p_root_nodes ); > if (status != IB_SUCCESS) > goto Exit; > @@ -323,7 +323,7 @@ updn_init( > osm_log( &p_osm->log, OSM_LOG_DEBUG, > "updn_init: " > "UPDN - Fetching root nodes from file %s\n", > - p_osm->subn.opt.updn_guid_file ); > + p_osm->subn.opt.root_guid_file ); > guid_iterator = cl_list_head(p_updn->p_root_nodes); > while( guid_iterator != cl_list_end(p_updn->p_root_nodes) ) > { From mst at dev.mellanox.co.il Thu Jun 21 06:51:20 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Thu, 21 Jun 2007 16:51:20 +0300 Subject: [ofa-general] Re: Re: [PATCH] for-2.6.23 ib/umad: add partition support In-Reply-To: <20070621065731.GJ8868@mellanox.co.il> References: <20070621033854.GF8868@mellanox.co.il> <000001c7b3c7$d2fdca20$a3cc180a@amr.corp.intel.com> <20070621065731.GJ8868@mellanox.co.il> Message-ID: <20070621135120.GH4857@mellanox.co.il> > Quoting Michael S. Tsirkin : > Subject: Re: Re: [PATCH] for-2.6.23 ib/umad: add partition support > > > Quoting Sean Hefty : > > Subject: RE: Re: [PATCH] for-2.6.23 ib/umad: add partition support > > > > >This assumes an open file descriptor per-pkey, so the proposed API > > >extension umad_set_pkey would have to be changed to be per-port rather > > >than per-mad. But I think this is a better API, too: most apps > > >likely work within a single partition. > > > > I don't think this is true for apps that use the userspace MAD interface (e.g. > > opensm).
> > SM (rather, SA) can just open file descriptor per pkey - it created them itself, > and there's a small number of partitions. > > > Beyond that, this approach doesn't work for receiving MADs on different PKeys. > > Yes, it does: we just filter out the MADs where pkey does not match. > > I think that most other apps (besides SA) should really treat > each partition as a separate network. So getting MADs for a specific > pkey, rather than all pkeys, makes total sense to me. Hal, could you pls comment on whether this approach will work for opensm? -- MST From tziporet at mellanox.co.il Thu Jun 21 07:05:24 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Thu, 21 Jun 2007 17:05:24 +0300 Subject: [ofa-general] Re: [ewg] Anouncement: OFED 1.2 rc6 is avilable In-Reply-To: References: Message-ID: <467A85A4.2080805@mellanox.co.il> Hoang-Nam Nguyen wrote: > > Hello Tziporet! > In the attached release notes I see under "1.2 Supported Platforms and > Operating Systems" this: > - RedHat EL5: 2.6.9-42.ELsmp > which should be 2.6.18-8.el5 according to my "uname -r" on a rhel5 > system. > > Thanks, I fixed this Tziporet From jsquyres at cisco.com Thu Jun 21 07:09:23 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Thu, 21 Jun 2007 10:09:23 -0400 Subject: [ofa-general] Stringify ibv_event_type Message-ID: <2C245DF3-77A7-4A3C-BF3A-13FEC2F7E0DA@cisco.com> Could a function to stringify the ibv_event_type enum be added to libibverbs? It could be similar to the event_name_str() function in libibverbs/examples/asyncwatch.c: ----- static const char *event_name_str(enum ibv_event_type event_type) { switch (event_type) { case IBV_EVENT_DEVICE_FATAL: return "IBV_EVENT_DEVICE_FATAL"; ...etc. ----- Rationale: if multiple client apps (such as the OF-based MPI implementations) start using the asynchronous events and there is no central function for string-ifying the event enum, they'll all end up doing the translation themselves when printing out error messages.
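A central, table-driven helper of the kind requested here might look like the sketch below. It is illustrative only: `enum demo_event_type`, its `DEMO_EVENT_*` values and `demo_event_type_str()` are hypothetical stand-ins (not libibverbs API) so that the code compiles without the verbs headers. The sentinel plus `static_assert` catches a table that falls behind a growing enum at compile time, and the fallback string covers any value the table does not name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for enum ibv_event_type: only a few values are
 * mirrored here so the sketch compiles without libibverbs. */
enum demo_event_type {
	DEMO_EVENT_CQ_ERR,
	DEMO_EVENT_QP_FATAL,
	DEMO_EVENT_DEVICE_FATAL,
	DEMO_EVENT_PORT_ACTIVE,
	DEMO_EVENT_PORT_ERR,
	DEMO_EVENT_NUM		/* sentinel: number of known values */
};

/* Designated initializers keep each name next to its enum value. */
static const char *const demo_event_names[] = {
	[DEMO_EVENT_CQ_ERR]       = "DEMO_EVENT_CQ_ERR",
	[DEMO_EVENT_QP_FATAL]     = "DEMO_EVENT_QP_FATAL",
	[DEMO_EVENT_DEVICE_FATAL] = "DEMO_EVENT_DEVICE_FATAL",
	[DEMO_EVENT_PORT_ACTIVE]  = "DEMO_EVENT_PORT_ACTIVE",
	[DEMO_EVENT_PORT_ERR]     = "DEMO_EVENT_PORT_ERR",
};

/* Fails to compile if a value is added to the enum without extending the table. */
static_assert(sizeof(demo_event_names) / sizeof(demo_event_names[0]) ==
	      (size_t)DEMO_EVENT_NUM,
	      "demo_event_names is out of sync with enum demo_event_type");

static const char demo_unknown_event[] = "unknown event";

/* Every caller gets the same string; values the table does not name
 * map to a fixed fallback instead of NULL. */
const char *demo_event_type_str(enum demo_event_type event)
{
	size_t idx = (size_t)event;

	if (idx < (size_t)DEMO_EVENT_NUM && demo_event_names[idx] != NULL)
		return demo_event_names[idx];
	return demo_unknown_event;
}
```

A caller would simply print `demo_event_type_str(event.event_type)`-style output in its error message, without keeping its own copy of the switch statement.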
It's not a huge amount of code, but it does seem kinda odd to make everyone replicate essentially the same stuff. Additionally, the available enum values may grow over time, forcing client apps to figure out which ones are available and adjust their event_name_str() equivalent as appropriate. Hiding the possibility of change down in libibverbs seems appropriate. -- Jeff Squyres Cisco Systems From kliteyn at dev.mellanox.co.il Thu Jun 21 07:49:35 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 21 Jun 2007 17:49:35 +0300 Subject: [ofa-general] [PATCHv2] osm: adding root_guid_file and cn_guid_file OpenSM options In-Reply-To: <1182433144.15653.403468.camel@hal.voltaire.com> References: <4675285A.6060309@dev.mellanox.co.il> <1182433144.15653.403468.camel@hal.voltaire.com> Message-ID: <467A8FFF.2040207@dev.mellanox.co.il> Hi Hal, Hal Rosenstock wrote: > Hi Yevgeny, > > On Sun, 2007-06-17 at 08:26, Yevgeny Kliteynik wrote: >> Hi Hal, >> >> This patch replaces updn_guid_file in the Up/Down routing with >> root_guid_file for Up/Down and Fat-Tree routing, and adds a new >> option - cn_guid_file for Fat-Tree routing. >> OpenSM command line options for these two files are: >> >> '-a' or '--root_guid_file' for roots >> '-u' or '--cn_guid_file' for compute nodes >> >> Signed-off-by: Yevgeny Kliteynik > > This entire patch was rejected when I attempted to apply it. Can you > regenerate it? Thanks. Indeed, there have been changes in osm_subnet.{c,h} since I issued this patch.
Here's the new one: Signed-off-by: Yevgeny Kliteynik --- opensm/include/opensm/osm_subnet.h | 12 +++++++++--- opensm/opensm/main.c | 29 ++++++++++++++++++++++------- opensm/opensm/osm_subnet.c | 25 ++++++++++++++++++------- opensm/opensm/osm_ucast_updn.c | 6 +++--- 4 files changed, 52 insertions(+), 20 deletions(-) diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h index b296caf..2ee5689 100644 --- a/opensm/include/opensm/osm_subnet.h +++ b/opensm/include/opensm/osm_subnet.h @@ -278,7 +278,8 @@ typedef struct _osm_subn_opt char * routing_engine_name; char * lid_matrix_dump_file; char * ucast_dump_file; - char * updn_guid_file; + char * root_guid_file; + char * cn_guid_file; char * sa_db_file; boolean_t exit_on_fatal; boolean_t honor_guid2lid_file; @@ -452,8 +453,13 @@ typedef struct _osm_subn_opt * Name of the unicast routing dump file from where switch * forwarding tables will be loaded * -* updn_guid_file -* Pointer to name of the UPDN guid file given by User +* root_guid_file +* Name of the file that contains list of root guids that +* will be used by fat-tree or up/dn routing (provided by User) +* +* cn_guid_file +* Name of the file that contains list of compute node guids that +* will be used by fat-tree routing (provided by User) * * sa_db_file * Name of the SA database file. 
diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c index 6b4cb4f..d17a994 100644 --- a/opensm/opensm/main.c +++ b/opensm/opensm/main.c @@ -189,8 +189,14 @@ show_usage(void) " This option specifies the name of the SA DB dump file\n" " from where SA database will be loaded.\n\n"); printf ("-a\n" - "--add_guid_file \n" - " Set the root nodes for the Up/Down routing algorithm\n" + "--root_guid_file \n" + " Set the root nodes for the Up/Down or Fat-Tree routing\n" + " algorithm to the guids provided in the given file (one\n" + " to a line)\n" + "\n"); + printf ("-u\n" + "--cn_guid_file \n" + " Set the compute nodes for the Fat-Tree routing algorithm\n" " to the guids provided in the given file (one to a line)\n" "\n"); printf( "-o\n" @@ -585,7 +591,7 @@ main( char *ignore_guids_file_name = NULL; uint32_t val; const char * const short_option = - "i:f:ed:g:l:L:s:t:a:R:M:U:S:P:NBIQvVhorcyxp:n:q:k:C:"; + "i:f:ed:g:l:L:s:t:a:u:R:M:U:S:P:NBIQvVhorcyxp:n:q:k:C:"; /* In the array below, the 2nd parameter specifies the number @@ -622,7 +628,8 @@ main( { "lid_matrix_file",1, NULL, 'M'}, { "ucast_file", 1, NULL, 'U'}, { "sadb_file", 1, NULL, 'S'}, - { "add_guid_file", 1, NULL, 'a'}, + { "root_guid_file",1, NULL, 'a'}, + { "cn_guid_file", 1, NULL, 'u'}, { "cache-options", 0, NULL, 'c'}, { "stay_on_fatal", 0, NULL, 'y'}, { "honor_guid2lid",0, NULL, 'x'}, @@ -886,10 +893,18 @@ main( case 'a': /* - Specifies port guids file + Specifies root guids file + */ + opt.root_guid_file = optarg; + printf (" Root Guid File: %s\n", opt.root_guid_file ); + break; + + case 'u': + /* + Specifies compute node guids file */ - opt.updn_guid_file = optarg; - printf (" UPDN Guid File: %s\n", opt.updn_guid_file ); + opt.cn_guid_file = optarg; + printf (" Compute Node Guid File: %s\n", opt.cn_guid_file ); break; case 'c': diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index 5a79149..7a223e3 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -502,7 
+502,8 @@ osm_subn_set_default_opt( p_opt->routing_engine_name = NULL; p_opt->lid_matrix_dump_file = NULL; p_opt->ucast_dump_file = NULL; - p_opt->updn_guid_file = NULL; + p_opt->root_guid_file = NULL; + p_opt->cn_guid_file = NULL; p_opt->sa_db_file = NULL; p_opt->exit_on_fatal = TRUE; p_opt->enable_quirks = FALSE; @@ -1325,8 +1326,12 @@ osm_subn_parse_conf_file( p_key, p_val, &p_opts->ucast_dump_file); __osm_subn_opts_unpack_charp( - "updn_guid_file", - p_key, p_val, &p_opts->updn_guid_file); + "root_guid_file", + p_key, p_val, &p_opts->root_guid_file); + + __osm_subn_opts_unpack_charp( + "cn_guid_file", + p_key, p_val, &p_opts->cn_guid_file); __osm_subn_opts_unpack_charp( "sa_db_file", @@ -1550,12 +1555,18 @@ osm_subn_write_conf_file( "# Ucast dump file name\n" "ucast_dump_file %s\n\n", p_opts->ucast_dump_file); - if (p_opts->updn_guid_file) + if (p_opts->root_guid_file) + fprintf( opts_file, + "# The file holding the root node guids (for fat-tree or Up/Down)\n" + "# One guid in each line\n" + "root_guid_file %s\n\n", + p_opts->root_guid_file); + if (p_opts->cn_guid_file) fprintf( opts_file, - "# The file holding the Up/Down root node guids\n" + "# The file holding the fat-tree compute node guids\n" "# One guid in each line\n" - "updn_guid_file %s\n\n", - p_opts->updn_guid_file); + "cn_guid_file %s\n\n", + p_opts->cn_guid_file); if (p_opts->sa_db_file) fprintf( opts_file, "# SA database file name\n" diff --git a/opensm/opensm/osm_ucast_updn.c b/opensm/opensm/osm_ucast_updn.c index 2448246..af5ee4e 100644 --- a/opensm/opensm/osm_ucast_updn.c +++ b/opensm/opensm/osm_ucast_updn.c @@ -311,10 +311,10 @@ updn_init( Check the source for root node list, if file parse it, otherwise wait for a callback to activate auto detection */ - if (p_osm->subn.opt.updn_guid_file) + if (p_osm->subn.opt.root_guid_file) { status = osm_ucast_mgr_read_guid_file( &p_osm->sm.ucast_mgr, - p_osm->subn.opt.updn_guid_file, + p_osm->subn.opt.root_guid_file, p_updn->p_root_nodes ); if (status != 
IB_SUCCESS) goto Exit; @@ -323,7 +323,7 @@ updn_init( osm_log( &p_osm->log, OSM_LOG_DEBUG, "updn_init: " "UPDN - Fetching root nodes from file %s\n", - p_osm->subn.opt.updn_guid_file ); + p_osm->subn.opt.root_guid_file ); guid_iterator = cl_list_head(p_updn->p_root_nodes); while( guid_iterator != cl_list_end(p_updn->p_root_nodes) ) { -- 1.5.1.4 From minich at ornl.gov Thu Jun 21 07:52:57 2007 From: minich at ornl.gov (Makia Minich) Date: Thu, 21 Jun 2007 10:52:57 -0400 Subject: [ofa-general] Can't open HCA InfiniHost0 problem In-Reply-To: <9fa3c2e50706210515l5ba18cb1h6eb4718f0749bb21@mail.gmail.com> References: <9fa3c2e50706210515l5ba18cb1h6eb4718f0749bb21@mail.gmail.com> Message-ID: <200706211052.57585.minich@ornl.gov> If you are using the OFED stack (as I'm expecting from the list you used) you need to use the o2ib lnd and not the vib lnd. On Thursday 21 June 2007 8:15:19 am Changer Van wrote: > Hi all, > I got some errors when I performed lctl network up command, > here are some log messages: > > … kernel: LustreError: 12355:0:(viblnd.c:1800:kibnal_startup()) Can't open > HCA InfiniHost0: -256 > but my ib card's hca_id is InfiniHost_III_Ex0, > how to config to look for the hca_id like InfiniHost_III_Ex0? > > Any help would be greatly appreciated. -- Makia Minich National Center for Computation Science Oak Ridge National Laboratory --*-- Imagine no possessions I wonder if you can - John Lennon From yann.kalemkarian at bull.net Thu Jun 21 07:53:14 2007 From: yann.kalemkarian at bull.net (Yann K.) Date: Thu, 21 Jun 2007 16:53:14 +0200 Subject: [ofa-general] [Fwd: [Error] Asynchronous Thread] Message-ID: <467A90DA.1000107@bull.net> -- Yann Kalemkarian HPC Software Engineer Open Software R&D Bull, Architect of an Open World TM Phone: +33 4 7629 7393 www.bull.com -------------- next part -------------- An embedded message was scrubbed... From: "Yann K." 
Subject: [Error] Asynchronous Thread Date: Thu, 21 Jun 2007 16:50:59 +0200 Size: 1042 URL: From arthur.jones at qlogic.com Thu Jun 21 08:23:12 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Thu, 21 Jun 2007 08:23:12 -0700 Subject: [ofa-general] Re: [PATCH 24/28] IB/ipath - ipath_poll fixups and enhancements In-Reply-To: References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> <20070619234252.3794.18229.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070621152312.GA14817@bauxite.pathscale.com> hi roland, ... On Wed, Jun 20, 2007 at 02:00:27PM -0700, Roland Dreier wrote: > > + tail = *(volatile u64 *)pd->port_rcvhdrtail_kvaddr; > > Why is there a volatile here? cf http://lwn.net/Articles/234017/ > ("volatile considered harmful") from that article: - Pointers to data structures in coherent memory which might be modified by I/O devices can, sometimes, legitimately be volatile. A ring buffer used by a network adapter, where that adapter changes pointers to indicate which descriptors have been processed, is an example of this type of situation. the port_rcvhdrtail_kvaddr is the kernel virtual address allocated in coherent memory where the header queue is updated by the chip. we use volatile to make sure the compiler does not use stale data... arthur From pw at osc.edu Thu Jun 21 08:25:44 2007 From: pw at osc.edu (Pete Wyckoff) Date: Thu, 21 Jun 2007 11:25:44 -0400 Subject: [ofa-general] hang on close in umem_release Message-ID: <20070621152544.GA32474@osc.edu> With 2.6.22-rc5, I get a repeatable D state hang of a user space process upon termination (ctrl-C). x86_64 SMP, no preempt. 
Here's the sysrq-T trace: app D ffff81003ec17220 0 2841 2780 (NOTLB) ffff81003cec7d78 0000000000000082 ffffffff80227aa0 ffff81003cec7d78 ffff81003ec17220 ffffffff804d8380 000000000002a161 ffff81003ec173d0 0000000000000001 0000000100085088 0000000000000001 ffff81003ff2bb40 Call Trace: [] default_wake_function+0x0/0x10 [] unlock_page+0x2d/0x40 [] __down_write_nested+0x85/0xc0 [] __down_write+0xb/0x10 [] down_write+0x9/0x10 [] :ib_core:ib_umem_release+0x75/0x110 [] :ib_mthca:mthca_free_mr+0x6e/0xe0 [] :ib_mthca:mthca_dereg_mr+0x25/0x40 [] :ib_core:ib_dereg_mr+0x2d/0x40 [] :ib_uverbs:ib_uverbs_close+0x2ac/0x380 [] __fput+0xb3/0x1a0 [] fput+0x16/0x20 [] filp_close+0x4b/0x80 [] sys_close+0x9c/0x100 [] system_call+0x7e/0x83 It should have an open fd for the rdmacm event channel and another for the CQ event channel, but it does not have any connected QPs at this point (although it did in the past) and no registered memory regions, although maybe the app forgot to free one? Apparently it is here: /* * We may be called with the mm's mmap_sem already held. This * can happen when a userspace munmap() is the call that drops * the last reference to our file and calls our release * method. If there are memory regions to destroy, we'll end * up here and not be able to take the mmap_sem. In that case * we defer the vm_locked accounting to the system workqueue. */ if (context->closing && !down_write_trylock(&mm->mmap_sem)) { INIT_WORK(&umem->work, ib_umem_account); umem->mm = mm; umem->diff = diff; schedule_work(&umem->work); return; } else down_write(&mm->mmap_sem); It is stuck in the down_write on mmap_sem. Thus context->closing must not be true.
Is this a known problem? Is there some more information I can give you? -- Pete From halr at voltaire.com Thu Jun 21 08:28:11 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Jun 2007 11:28:11 -0400 Subject: [ofa-general] Re: Re: [PATCH] for-2.6.23 ib/umad: add partition support In-Reply-To: <20070621135120.GH4857@mellanox.co.il> References: <20070621033854.GF8868@mellanox.co.il> <000001c7b3c7$d2fdca20$a3cc180a@amr.corp.intel.com> <20070621065731.GJ8868@mellanox.co.il> <20070621135120.GH4857@mellanox.co.il> Message-ID: <1182439686.15653.410799.camel@hal.voltaire.com> On Thu, 2007-06-21 at 09:51, Michael S. Tsirkin wrote: > > Quoting Michael S. Tsirkin : > > Subject: Re: Re: [PATCH] for-2.6.23 ib/umad: add partition support > > > > > Quoting Sean Hefty : > > > Subject: RE: Re: [PATCH] for-2.6.23 ib/umad: add partition support > > > > > > >This assumes an open file descriptor per-pkey, so the proposed API > > > >extension umad_set_pkey would have to be changed to be per-port rather > > > >than per-mad. But I think this is a better API, too: most apps > > > >likely work within a single partition. > > > > > > I don't think this is true for apps that use the userspace MAD interface (e.g. > > > opensm). > > > > SM (rather, SA) can just open file descriptor per pkey - it created them itself, > > and there's a small number of partitions. > > > > > Beyond that, this approach doesn't work for receiving MADs on different PKeys. > > > > Yes, it does: we just filter out the MADs where pkey does not match. > > > > I think that most other apps (besides SA) should really treat > > each partition as a separate network. So getting MADs for a specific > > pkey, rather than all pkeys, makes total sense to me. > > Hal, could you pls comment on whether this approach will work for opensm? I will answer at the "high" level rather than some of the details discussed in previous postings which we may get back to later.
As far as SA is concerned, as all nodes are required to at least support the limited default partition, the SA uses the full default partition for communication. As to other current (and potential) management applications: PerfMgr will want PMA access on all ports on all nodes. It may also be constrained to a similar environment as SA (running on a node which supports the full default partition). If it is not constrained in such a manner, it needs to be on all partitions in the subnet or it will only be able to access a portion of the ports in the subnet. That actually might be a model some might ultimately want. Diagnostics may be happy with a single partition (or likely the set of partitions the end node they are running from reside on). Bottom line is that it can likely work with either model but there are tradeoffs underneath this "high" level which may not have been sufficiently explored/discussed as yet. I'm not sure I like having a different fd per pkey: It's a different model than currently being used and that would cause more changes to consumers (as opposed to the other approach) which aren't a clear win to me (and uses more fds). -- Hal From rdreier at cisco.com Thu Jun 21 08:40:54 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 21 Jun 2007 08:40:54 -0700 Subject: [ofa-general] hang on close in umem_release In-Reply-To: <20070621152544.GA32474@osc.edu> (Pete Wyckoff's message of "Thu, 21 Jun 2007 11:25:44 -0400") References: <20070621152544.GA32474@osc.edu> Message-ID: hmm, I see what seems to be an "i can't believe it ever worked" type bug -- if the context is closing but then we do manage to get the mm's rwsem, it seems like we immediately try to lock it again, which obviously deadlocks. Does this patch fix your problem and look correct? 
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c index b4aec51..d40652a 100644 --- a/drivers/infiniband/core/umem.c +++ b/drivers/infiniband/core/umem.c @@ -225,13 +225,15 @@ void ib_umem_release(struct ib_umem *umem) * up here and not be able to take the mmap_sem. In that case * we defer the vm_locked accounting to the system workqueue. */ - if (context->closing && !down_write_trylock(&mm->mmap_sem)) { - INIT_WORK(&umem->work, ib_umem_account); - umem->mm = mm; - umem->diff = diff; - - schedule_work(&umem->work); - return; + if (context->closing) { + if (!down_write_trylock(&mm->mmap_sem)) { + INIT_WORK(&umem->work, ib_umem_account); + umem->mm = mm; + umem->diff = diff; + + schedule_work(&umem->work); + return; + } } else down_write(&mm->mmap_sem); From arthur.jones at qlogic.com Thu Jun 21 08:50:05 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Thu, 21 Jun 2007 08:50:05 -0700 Subject: [ofa-general] Re: [PATCH] IB/ipath -- changes in for-roland for 2.6.23 In-Reply-To: References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070621155005.GB14817@bauxite.pathscale.com> hi roland, ... On Wed, Jun 20, 2007 at 01:47:55PM -0700, Roland Dreier wrote: > [...] > But I don't see a MAINTAINERS update (it still lists Bryan O'Sullivan, > support at pathscale.com and openib.org for the ipath driver). Also I > don't see fixes for the smp_mb__after_clear_bit bug pointed out by > BenH or the bug of setting both _PAGE_NO_CACHE and _PAGE_WRITETHRU on > powerpc pointed out by paulus. ok, thanks for the reminder, i've opened an internal bug for the first issue (MAINTAINERS), we should have a fix for that soon. the second issue (smp_mb__after_clear_bit) has an internal bug open. we don't have a fix yet, but we're working on it (we may be able to remove all that code). the final issue (powerpc) has an internal bug open, but hasn't seen any attention for awhile. 
i'll see if i can prod the right people into looking at it... arthur From rdreier at cisco.com Thu Jun 21 09:50:35 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 21 Jun 2007 09:50:35 -0700 Subject: [ofa-general] backups In-Reply-To: <795c49870706201132r1f7633f8r2cf3cb2a71edc6e0@mail.gmail.com> (Jeff Becker's message of "Wed, 20 Jun 2007 11:32:03 -0700") References: <795c49870706201044ha36255amebd94c1b673f58f6@mail.gmail.com> <795c49870706201132r1f7633f8r2cf3cb2a71edc6e0@mail.gmail.com> Message-ID: > I'm backing up /data/pub/scm. A quick "du -chL" shows it to be 4.2G. > Perhaps I only need to back up a subset of /data/pub/scm? Thanks. Looks like there is plenty of excess stuff there... e.g. /data/pub/scm/~mst/linux-2.6 seems to be a partially unpacked non-naked Linux kernel repository (just picking on mst because /data/pub/scm/~mst is 880M). We could probably save a lot of space just by keeping one packed copy of Linus's repository and having all other kernel trees use alternates to point to the objects there. OTOH it's not worth making people spend a lot of effort to clean up too much, given how cheap disk space is. - R.
From sean.hefty at intel.com Thu Jun 21 10:01:33 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 21 Jun 2007 10:01:33 -0700 Subject: [ofa-general] RE: [PATCH] remove confusing code from udaddy In-Reply-To: Message-ID: <000001c7b425$d2796830$ff0da8c0@amr.corp.intel.com> thanks - applied From pw at osc.edu Thu Jun 21 10:34:17 2007 From: pw at osc.edu (Pete Wyckoff) Date: Thu, 21 Jun 2007 13:34:17 -0400 Subject: [ofa-general] hang on close in umem_release In-Reply-To: References: <20070621152544.GA32474@osc.edu> Message-ID: <20070621173417.GA32573@osc.edu> rdreier at cisco.com wrote on Thu, 21 Jun 2007 08:40 -0700: > hmm, I see what seems to be an "i can't believe it ever worked" type > bug -- if the context is closing but then we do manage to get the mm's > rwsem, it seems like we immediately try to lock it again, which > obviously deadlocks. > > Does this patch fix your problem and look correct? Looks obviously correct and tests okay. Ctrl-c in any situation does the right thing now. Before your refactoring of ib_umem, the older version of ib_umem_release_on_close() did not have this trylock optimization. This new buggy code appears not to have shown up in any releases yet, fortunately. Thanks for the quick fix. 
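The deadlock described above comes from the shape of the original condition: when `context->closing` is true and `down_write_trylock()` succeeds, control falls into the `else` branch and calls `down_write()` on a semaphore the thread already holds. A minimal userspace sketch of the corrected control flow follows; it is not the kernel code. It assumes a hypothetical spin-based writer lock built on C11 atomics in place of `mmap_sem`, and the names `demo_lock`, `demo_release_accounting`, and the `locked_vm` counter are illustrative only.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for mm->mmap_sem: 0 = free, 1 = write-locked. */
struct demo_lock {
    atomic_int state;
};

static void demo_lock_init(struct demo_lock *l)
{
    atomic_init(&l->state, 0);
}

static bool demo_write_trylock(struct demo_lock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&l->state, &expected, 1);
}

/* Blocking acquire. Calling it while already holding the lock spins forever,
 * the userspace analogue of the hang reported in this thread. */
static void demo_write_lock(struct demo_lock *l)
{
    while (!demo_write_trylock(l))
        ;
}

static void demo_write_unlock(struct demo_lock *l)
{
    atomic_store(&l->state, 0);
}

/* Corrected control flow. Returns true if the accounting was deferred
 * (the schedule_work() path in the patch), false if it was done inline. */
static bool demo_release_accounting(struct demo_lock *l, bool closing,
                                    long *locked_vm, long diff)
{
    if (closing) {
        if (!demo_write_trylock(l))
            return true;   /* lock busy: hand the update to a worker */
        /* trylock succeeded: the lock is already held, so do not take it again */
    } else {
        demo_write_lock(l);
    }

    *locked_vm -= diff;
    demo_write_unlock(l);
    return false;
}
```

With the original single condition `closing && !trylock`, a call with `closing` set and the lock free would have fallen through to the blocking acquire while already holding the lock and spun forever; nesting the trylock inside the `closing` branch removes that path.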
-- Pete From halr at voltaire.com Thu Jun 21 10:40:02 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Jun 2007 13:40:02 -0400 Subject: [ofa-general] Re: [PATCHv2] osm: adding root_guid_file and cn_guid_file OpenSM options In-Reply-To: <467A8FFF.2040207@dev.mellanox.co.il> References: <4675285A.6060309@dev.mellanox.co.il> <1182433144.15653.403468.camel@hal.voltaire.com> <467A8FFF.2040207@dev.mellanox.co.il> Message-ID: <1182447578.15653.419478.camel@hal.voltaire.com> Hi Yevgeny, On Thu, 2007-06-21 at 10:49, Yevgeny Kliteynik wrote: > Hi Hal, > > Hal Rosenstock wrote: > > Hi Yevgeny, > > > > On Sun, 2007-06-17 at 08:26, Yevgeny Kliteynik wrote: > >> Hi Hal, > >> > >> This patch replaces updn_guid_file in the Up/Down routing with > >> root_guid_file for Up/Down and Fat-Tree routing, and adds a new > >> option - cn_guid_file for Fat-Tree routing. > >> OpenSM command line options for these two files are: > >> > >> '-a' or '--root_guid_file' for roots > >> '-u' or '--cn_guid_file' for compute nodes > >> > >> Signed-off-by: Yevgeny Kliteynik > > > > This entire patch was rejected when I attempted to apply it. Can you > > regenerate it ? Thanks. > > Indeed, there were changes in osm_subnet.{c,h} since I've issued this patch. That wasn't the problem. > Here's the new one: This one was rejected too. I hand applied it so please double check it. Also, I updated the opensm man page for these options. Thanks. 
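Both new options are documented as naming a file with "one guid in each line". As a rough illustration of that format only, here is a hypothetical parser over an in-memory string; it is not opensm's `osm_ucast_mgr_read_guid_file()`, whose exact rules for comments and whitespace are not shown in this thread. The name `parse_guid_lines` and the choices to skip blank lines and accept hex with or without a `0x` prefix are assumptions.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical parser: reads one hex GUID per line from `text`
 * (e.g. "0x0002c9020023a1d8"), skipping blank lines.
 * Returns the number of GUIDs stored in `out`, or -1 on a malformed
 * line or if more than `max` GUIDs are present.
 */
static int parse_guid_lines(const char *text, uint64_t *out, size_t max)
{
    size_t n = 0;
    const char *p = text;

    while (*p != '\0') {
        /* Skip leading whitespace, including empty lines. */
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;
        if (*p == '-')
            return -1; /* strtoull would silently negate */

        char *end;
        errno = 0;
        /* Base 16 accepts an optional 0x/0X prefix. */
        unsigned long long v = strtoull(p, &end, 16);
        if (end == p || errno == ERANGE)
            return -1;

        /* The rest of the line must be blank. */
        while (*end == ' ' || *end == '\t' || *end == '\r')
            end++;
        if (*end != '\n' && *end != '\0')
            return -1;
        if (n == max)
            return -1;

        out[n++] = (uint64_t)v;
        p = end;
    }
    return (int)n;
}
```

Parsing from a string keeps the sketch easy to test; a file-based reader would load the file into a buffer (or process it line by line) before calling something like this.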
-- Hal > Signed-off-by: Yevgeny Kliteynik > --- > opensm/include/opensm/osm_subnet.h | 12 +++++++++--- > opensm/opensm/main.c | 29 ++++++++++++++++++++++------- > opensm/opensm/osm_subnet.c | 25 ++++++++++++++++++------- > opensm/opensm/osm_ucast_updn.c | 6 +++--- > 4 files changed, 52 insertions(+), 20 deletions(-) > > diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h > index b296caf..2ee5689 100644 > --- a/opensm/include/opensm/osm_subnet.h > +++ b/opensm/include/opensm/osm_subnet.h > @@ -278,7 +278,8 @@ typedef struct _osm_subn_opt > char * routing_engine_name; > char * lid_matrix_dump_file; > char * ucast_dump_file; > - char * updn_guid_file; > + char * root_guid_file; > + char * cn_guid_file; > char * sa_db_file; > boolean_t exit_on_fatal; > boolean_t honor_guid2lid_file; > @@ -452,8 +453,13 @@ typedef struct _osm_subn_opt > * Name of the unicast routing dump file from where switch > * forwarding tables will be loaded > * > -* updn_guid_file > -* Pointer to name of the UPDN guid file given by User > +* root_guid_file > +* Name of the file that contains list of root guids that > +* will be used by fat-tree or up/dn routing (provided by User) > +* > +* cn_guid_file > +* Name of the file that contains list of compute node guids that > +* will be used by fat-tree routing (provided by User) > * > * sa_db_file > * Name of the SA database file. 
> diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c > index 6b4cb4f..d17a994 100644 > --- a/opensm/opensm/main.c > +++ b/opensm/opensm/main.c > @@ -189,8 +189,14 @@ show_usage(void) > " This option specifies the name of the SA DB dump file\n" > " from where SA database will be loaded.\n\n"); > printf ("-a\n" > - "--add_guid_file \n" > - " Set the root nodes for the Up/Down routing algorithm\n" > + "--root_guid_file \n" > + " Set the root nodes for the Up/Down or Fat-Tree routing\n" > + " algorithm to the guids provided in the given file (one\n" > + " to a line)\n" > + "\n"); > + printf ("-u\n" > + "--cn_guid_file \n" > + " Set the compute nodes for the Fat-Tree routing algorithm\n" > " to the guids provided in the given file (one to a line)\n" > "\n"); > printf( "-o\n" > @@ -585,7 +591,7 @@ main( > char *ignore_guids_file_name = NULL; > uint32_t val; > const char * const short_option = > - "i:f:ed:g:l:L:s:t:a:R:M:U:S:P:NBIQvVhorcyxp:n:q:k:C:"; > + "i:f:ed:g:l:L:s:t:a:u:R:M:U:S:P:NBIQvVhorcyxp:n:q:k:C:"; > > /* > In the array below, the 2nd parameter specifies the number > @@ -622,7 +628,8 @@ main( > { "lid_matrix_file",1, NULL, 'M'}, > { "ucast_file", 1, NULL, 'U'}, > { "sadb_file", 1, NULL, 'S'}, > - { "add_guid_file", 1, NULL, 'a'}, > + { "root_guid_file",1, NULL, 'a'}, > + { "cn_guid_file", 1, NULL, 'u'}, > { "cache-options", 0, NULL, 'c'}, > { "stay_on_fatal", 0, NULL, 'y'}, > { "honor_guid2lid",0, NULL, 'x'}, > @@ -886,10 +893,18 @@ main( > > case 'a': > /* > - Specifies port guids file > + Specifies root guids file > + */ > + opt.root_guid_file = optarg; > + printf (" Root Guid File: %s\n", opt.root_guid_file ); > + break; > + > + case 'u': > + /* > + Specifies compute node guids file > */ > - opt.updn_guid_file = optarg; > - printf (" UPDN Guid File: %s\n", opt.updn_guid_file ); > + opt.cn_guid_file = optarg; > + printf (" Compute Node Guid File: %s\n", opt.cn_guid_file ); > break; > > case 'c': > diff --git a/opensm/opensm/osm_subnet.c 
b/opensm/opensm/osm_subnet.c > index 5a79149..7a223e3 100644 > --- a/opensm/opensm/osm_subnet.c > +++ b/opensm/opensm/osm_subnet.c > @@ -502,7 +502,8 @@ osm_subn_set_default_opt( > p_opt->routing_engine_name = NULL; > p_opt->lid_matrix_dump_file = NULL; > p_opt->ucast_dump_file = NULL; > - p_opt->updn_guid_file = NULL; > + p_opt->root_guid_file = NULL; > + p_opt->cn_guid_file = NULL; > p_opt->sa_db_file = NULL; > p_opt->exit_on_fatal = TRUE; > p_opt->enable_quirks = FALSE; > @@ -1325,8 +1326,12 @@ osm_subn_parse_conf_file( > p_key, p_val, &p_opts->ucast_dump_file); > > __osm_subn_opts_unpack_charp( > - "updn_guid_file", > - p_key, p_val, &p_opts->updn_guid_file); > + "root_guid_file", > + p_key, p_val, &p_opts->root_guid_file); > + > + __osm_subn_opts_unpack_charp( > + "cn_guid_file", > + p_key, p_val, &p_opts->cn_guid_file); > > __osm_subn_opts_unpack_charp( > "sa_db_file", > @@ -1550,12 +1555,18 @@ osm_subn_write_conf_file( > "# Ucast dump file name\n" > "ucast_dump_file %s\n\n", > p_opts->ucast_dump_file); > - if (p_opts->updn_guid_file) > + if (p_opts->root_guid_file) > + fprintf( opts_file, > + "# The file holding the root node guids (for fat-tree or Up/Down)\n" > + "# One guid in each line\n" > + "root_guid_file %s\n\n", > + p_opts->root_guid_file); > + if (p_opts->cn_guid_file) > fprintf( opts_file, > - "# The file holding the Up/Down root node guids\n" > + "# The file holding the fat-tree compute node guids\n" > "# One guid in each line\n" > - "updn_guid_file %s\n\n", > - p_opts->updn_guid_file); > + "cn_guid_file %s\n\n", > + p_opts->cn_guid_file); > if (p_opts->sa_db_file) > fprintf( opts_file, > "# SA database file name\n" > diff --git a/opensm/opensm/osm_ucast_updn.c b/opensm/opensm/osm_ucast_updn.c > index 2448246..af5ee4e 100644 > --- a/opensm/opensm/osm_ucast_updn.c > +++ b/opensm/opensm/osm_ucast_updn.c > @@ -311,10 +311,10 @@ updn_init( > Check the source for root node list, if file parse it, otherwise > wait for a callback to activate auto 
detection > */ > - if (p_osm->subn.opt.updn_guid_file) > + if (p_osm->subn.opt.root_guid_file) > { > status = osm_ucast_mgr_read_guid_file( &p_osm->sm.ucast_mgr, > - p_osm->subn.opt.updn_guid_file, > + p_osm->subn.opt.root_guid_file, > p_updn->p_root_nodes ); > if (status != IB_SUCCESS) > goto Exit; > @@ -323,7 +323,7 @@ updn_init( > osm_log( &p_osm->log, OSM_LOG_DEBUG, > "updn_init: " > "UPDN - Fetching root nodes from file %s\n", > - p_osm->subn.opt.updn_guid_file ); > + p_osm->subn.opt.root_guid_file ); > guid_iterator = cl_list_head(p_updn->p_root_nodes); > while( guid_iterator != cl_list_end(p_updn->p_root_nodes) ) > { From halr at voltaire.com Thu Jun 21 10:43:32 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Jun 2007 13:43:32 -0400 Subject: [ofa-general] Re: [PATCH] osm: cosmetics in ftree - added get_guid functions for switch and hca In-Reply-To: <4678DA83.2050700@dev.mellanox.co.il> References: <4678DA83.2050700@dev.mellanox.co.il> Message-ID: <1182447627.15653.419564.camel@hal.voltaire.com> Hi again Yevgeny, On Wed, 2007-06-20 at 03:42, Yevgeny Kliteynik wrote: > Hi Hal, > > Cosmetic code changes in fat-tree: > added get_guid_ho and get_guid_no functions for switches and hca's > > -- Yevgeny > > Signed-off-by: Yevgeny Kliteynik This patch won't apply either. I'm not sure I want to hand edit these changes in. Can you try it and see if it works for you ? Thanks. 
-- Hal > --- > opensm/opensm/osm_ucast_ftree.c | 77 +++++++++++++++++++++++++++++---------- > 1 files changed, 58 insertions(+), 19 deletions(-) > > diff --git a/opensm/opensm/osm_ucast_ftree.c b/opensm/opensm/osm_ucast_ftree.c > index 1ead199..1ae8b29 100644 > --- a/opensm/opensm/osm_ucast_ftree.c > +++ b/opensm/opensm/osm_ucast_ftree.c > @@ -640,6 +640,26 @@ __osm_ftree_sw_destroy( > > /***************************************************/ > > +static uint64_t > +__osm_ftree_sw_get_guid_no( > + IN ftree_sw_t * p_sw) > +{ > + if (!p_sw) > + return 0; > + return osm_node_get_node_guid(p_sw->p_osm_sw->p_node); > +} > + > +/***************************************************/ > + > +static uint64_t > +__osm_ftree_sw_get_guid_ho( > + IN ftree_sw_t * p_sw) > +{ > + return cl_ntoh64(__osm_ftree_sw_get_guid_no(p_sw)); > +} > + > +/***************************************************/ > + > static void > __osm_ftree_sw_dump( > IN ftree_fabric_t * p_ftree, > @@ -657,7 +677,7 @@ __osm_ftree_sw_dump( > "__osm_ftree_sw_dump: " > "Switch index: %s, GUID: 0x%016" PRIx64 ", Ports: %u DOWN, %u UP\n", > __osm_ftree_tuple_to_str(p_sw->tuple), > - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), > + __osm_ftree_sw_get_guid_ho(p_sw), > p_sw->down_port_groups_num, > p_sw->up_port_groups_num); > > @@ -835,6 +855,26 @@ __osm_ftree_hca_destroy( > > /***************************************************/ > > +static uint64_t > +__osm_ftree_hca_get_guid_no( > + IN ftree_hca_t * p_hca) > +{ > + if (!p_hca) > + return 0; > + return osm_node_get_node_guid(p_hca->p_osm_node); > +} > + > +/***************************************************/ > + > +static uint64_t > +__osm_ftree_hca_get_guid_ho( > + IN ftree_hca_t * p_hca) > +{ > + return cl_ntoh64(__osm_ftree_hca_get_guid_no(p_hca)); > +} > + > +/***************************************************/ > + > static void > __osm_ftree_hca_dump( > IN ftree_fabric_t * p_ftree, > @@ -851,7 +891,7 @@ __osm_ftree_hca_dump( > 
osm_log(&p_ftree->p_osm->log, OSM_LOG_DEBUG, > "__osm_ftree_hca_dump: " > "CA GUID: 0x%016" PRIx64 ", Ports: %u UP\n", > - cl_ntoh64(osm_node_get_node_guid(p_hca->p_osm_node)), > + __osm_ftree_hca_get_guid_ho(p_hca), > p_hca->up_port_groups_num); > > for( i = 0; i < p_hca->up_port_groups_num; i++ ) > @@ -1214,7 +1254,7 @@ __osm_ftree_fabric_dump_general_info( > osm_log(&p_ftree->p_osm->log, OSM_LOG_VERBOSE, > "__osm_ftree_fabric_dump_general_info: " > " GUID: 0x%016" PRIx64 ", LID: 0x%x, Index %s\n", > - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), > + __osm_ftree_sw_get_guid_ho(p_sw), > cl_ntoh16(p_sw->base_lid), > __osm_ftree_tuple_to_str(p_sw->tuple)); > } > @@ -1227,8 +1267,7 @@ __osm_ftree_fabric_dump_general_info( > osm_log(&p_ftree->p_osm->log, OSM_LOG_VERBOSE, > "__osm_ftree_fabric_dump_general_info: " > " GUID: 0x%016" PRIx64 ", LID: 0x%x, Index %s\n", > - cl_ntoh64(osm_node_get_node_guid( > - p_ftree->leaf_switches[i]->p_osm_sw->p_node)), > + __osm_ftree_sw_get_guid_ho(p_ftree->leaf_switches[i]), > cl_ntoh16(p_ftree->leaf_switches[i]->base_lid), > __osm_ftree_tuple_to_str(p_ftree->leaf_switches[i]->tuple)); > } > @@ -1442,7 +1481,7 @@ __osm_ftree_fabric_make_indexing( > p_sw->rank, > __osm_ftree_tuple_to_str(p_sw->tuple), > cl_ntoh16(p_sw->base_lid), > - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node))); > + __osm_ftree_sw_get_guid_ho(p_sw)); > > /* > * Now run BFS and assign indexes to all switches > @@ -1617,11 +1656,11 @@ __osm_ftree_fabric_validate_topology( > "ERR AB09: Different number of upward port groups on switches:\n" > " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u groups\n" > " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u groups\n", > - cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)), > + __osm_ftree_sw_get_guid_ho(reference_sw_arr[p_sw->rank]), > cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid), > __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple), > 
reference_sw_arr[p_sw->rank]->up_port_groups_num, > - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), > + __osm_ftree_sw_get_guid_ho(p_sw), > cl_ntoh16(p_sw->base_lid), > __osm_ftree_tuple_to_str(p_sw->tuple), > p_sw->up_port_groups_num); > @@ -1638,11 +1677,11 @@ __osm_ftree_fabric_validate_topology( > "ERR AB0A: Different number of downward port groups on switches:\n" > " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u port groups\n" > " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u port groups\n", > - cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)), > + __osm_ftree_sw_get_guid_ho(reference_sw_arr[p_sw->rank]), > cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid), > __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple), > reference_sw_arr[p_sw->rank]->down_port_groups_num, > - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), > + __osm_ftree_sw_get_guid_ho(p_sw), > cl_ntoh16(p_sw->base_lid), > __osm_ftree_tuple_to_str(p_sw->tuple), > p_sw->down_port_groups_num); > @@ -1663,11 +1702,11 @@ __osm_ftree_fabric_validate_topology( > "ERR AB0B: Different number of ports in an upward port group on switches:\n" > " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n" > " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n", > - cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)), > + __osm_ftree_sw_get_guid_ho(reference_sw_arr[p_sw->rank]), > cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid), > __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple), > cl_ptr_vector_get_size(&p_ref_group->ports), > - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), > + __osm_ftree_sw_get_guid_ho(p_sw), > cl_ntoh16(p_sw->base_lid), > __osm_ftree_tuple_to_str(p_sw->tuple), > cl_ptr_vector_get_size(&p_group->ports)); > @@ -1691,11 +1730,11 @@ __osm_ftree_fabric_validate_topology( > "ERR AB0C: Different number of ports in an downward port group on switches:\n" > " GUID 0x%016" 
PRIx64 ", LID 0x%x, Index %s - %u ports\n" > " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n", > - cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)), > + __osm_ftree_sw_get_guid_ho(reference_sw_arr[p_sw->rank]), > cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid), > __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple), > cl_ptr_vector_get_size(&p_ref_group->ports), > - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), > + __osm_ftree_sw_get_guid_ho(p_sw), > cl_ntoh16(p_sw->base_lid), > __osm_ftree_tuple_to_str(p_sw->tuple), > cl_ptr_vector_get_size(&p_group->ports)); > @@ -2508,7 +2547,7 @@ __osm_ftree_rank_leaf_switches( > "__osm_ftree_rank_leaf_switches: ERR AB0F: " > "CA conected directly to another CA: " > "0x%016" PRIx64 " <---> 0x%016" PRIx64 "\n", > - cl_ntoh64(osm_node_get_node_guid(p_hca->p_osm_node)), > + __osm_ftree_hca_get_guid_ho(p_hca), > cl_ntoh64(osm_node_get_node_guid(p_remote_osm_node))); > res = -1; > goto Exit; > @@ -2548,8 +2587,8 @@ __osm_ftree_rank_leaf_switches( > " - CA guid : 0x%016" PRIx64 "\n" > " - Switch guid: 0x%016" PRIx64 "\n" > " - Switch LID : 0x%x\n", > - cl_ntoh64(osm_node_get_node_guid(p_hca->p_osm_node)), > - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), > + __osm_ftree_hca_get_guid_ho(p_hca), > + __osm_ftree_sw_get_guid_ho(p_sw), > cl_ntoh16(p_sw->base_lid)); > cl_list_insert_tail(p_ranking_bfs_list, > &__osm_ftree_sw_tbl_element_create(p_sw)->map_item); > @@ -2740,10 +2779,10 @@ __osm_ftree_fabric_construct_sw_ports( > " GUID 0x%016" PRIx64 ", LID 0x%x, rank %u\n", > p_sw->rank, > p_remote_sw->rank, > - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), > + __osm_ftree_sw_get_guid_ho(p_sw), > cl_ntoh16(p_sw->base_lid), > p_sw->rank, > - cl_ntoh64(osm_node_get_node_guid(p_remote_sw->p_osm_sw->p_node)), > + __osm_ftree_sw_get_guid_ho(p_remote_sw), > cl_ntoh16(p_remote_sw->base_lid), > p_remote_sw->rank); > res = -1; From rdreier at cisco.com Thu 
Jun 21 11:01:46 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 21 Jun 2007 11:01:46 -0700 Subject: [ofa-general] hang on close in umem_release In-Reply-To: <20070621173417.GA32573@osc.edu> (Pete Wyckoff's message of "Thu, 21 Jun 2007 13:34:17 -0400") References: <20070621152544.GA32474@osc.edu> <20070621173417.GA32573@osc.edu> Message-ID: > Looks obviously correct and tests okay. Ctrl-c in any situation > does the right thing now. Before your refactoring of ib_umem, the > older version of ib_umem_release_on_close() did not have this > trylock optimization. This new buggy code appears not to have shown > up in any releases yet, fortunately. Thanks, I will add it to my queue of things to get Linus to pull soon. It is true that this was introduced by my recent refactoring -- a silly careless mistake. - R. From rdreier at cisco.com Thu Jun 21 11:14:23 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 21 Jun 2007 11:14:23 -0700 Subject: [ofa-general] Re: [PATCH 24/28] IB/ipath - ipath_poll fixups and enhancements In-Reply-To: <20070621152312.GA14817@bauxite.pathscale.com> (Arthur Jones's message of "Thu, 21 Jun 2007 08:23:12 -0700") References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> <20070619234252.3794.18229.stgit@bauxite.internal.keyresearch.com> <20070621152312.GA14817@bauxite.pathscale.com> Message-ID: > the port_rcvhdrttail_kvaddr is the kernel virtual address > allocated in coherent memory where the header queue is updated > by the chip. we use volatile to make sure the compiler does > not use stale data... OK, fair enough, although it seems you may be missing some memory barriers to make sure you don't run into the CPU reordering accesses to the head/tail pointers. - R. 
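The remark about missing barriers can be illustrated with a single-producer, single-consumer ring. The sketch below is a userspace analogue between two CPU threads using C11 atomics, not the ipath driver code: there the producer is the chip writing to coherent memory, and the kernel would use its own barrier primitives such as `rmb()`. `volatile` alone stops the compiler from caching the tail value, but on weakly ordered CPUs it does not stop the hardware from reordering the tail read against the reads of the entries it covers. The `hdr_ring` type and its size are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical size; a power of two keeps index wraparound consistent. */
#define HDR_RING_SIZE 8u

struct hdr_ring {
    uint32_t entries[HDR_RING_SIZE];
    _Atomic uint32_t tail; /* written only by the producer */
    _Atomic uint32_t head; /* written only by the consumer */
};

static void hdr_ring_init(struct hdr_ring *r)
{
    atomic_init(&r->tail, 0);
    atomic_init(&r->head, 0);
}

/* Producer: fill the slot first, then publish it with a release store. */
static int hdr_ring_produce(struct hdr_ring *r, uint32_t value)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (tail - head == HDR_RING_SIZE)
        return -1; /* full */
    r->entries[tail % HDR_RING_SIZE] = value;
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 0;
}

/* Consumer: the acquire load of tail orders the slot read after it,
 * so once a new tail is seen, the entry it covers is seen too. */
static int hdr_ring_consume(struct hdr_ring *r, uint32_t *out)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head == tail)
        return -1; /* empty */
    *out = r->entries[head % HDR_RING_SIZE];
    /* Release so the producer cannot reuse the slot before it has been read. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 0;
}
```

The acquire/release pairing is what a read barrier between "load tail" and "load entries" (and a write barrier on the producer side) provides in kernel code.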
From jeff at splitrockpr.com Thu Jun 21 11:15:32 2007 From: jeff at splitrockpr.com (Jeffrey Scott) Date: Thu, 21 Jun 2007 11:15:32 -0700 Subject: [ofa-general] request for OFA newsletter content Message-ID: <97FBC79001FB45E1A85AA08BFAA282E1@Gaucho> All- The first installment of OFA's quarterly newsletter will be distributed in the next 2-3 weeks. Content is due to me by June 28. Although we are starting out quarterly, we may eventually distribute the newsletter more frequently, depending on feedback. The newsletter is designed to keep the OFA community updated on the latest OFA news, information, events and development progress. Of course, you should feel free to forward the newsletter to anyone outside the OFA community. We have already approached the Working Group chairs about providing content for the first issue. However, the newsletter is open to everyone in the OFA community. If any community member would like to submit content, we strongly encourage you to do so. Broad involvement will help make the newsletter more valuable. Just send me your name, contact information, topic and a brief 1-2 paragraph "article" about any project you're working on, issues that you're concerned about, events that you're participating in, or anything else on your mind. Please do NOT give us content that promotes companies or products. The newsletter is all about the OFA. Thanks! Jeff ----------------------------------- Jeffrey Scott Split Rock Communications 408-884-4017 202-903-6057 Mobile 408-884-3900 Fax www.SplitRockPR.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From mst at dev.mellanox.co.il Thu Jun 21 11:47:38 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 21 Jun 2007 21:47:38 +0300 Subject: [ofa-general] Re: backups In-Reply-To: References: <795c49870706201044ha36255amebd94c1b673f58f6@mail.gmail.com> <795c49870706201132r1f7633f8r2cf3cb2a71edc6e0@mail.gmail.com> Message-ID: <20070621184738.GI4857@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: backups > > > I'm backing up /data/pub/scm. A quick "du -chL" shows it to be 4.2G. > > Perhaps I only need to backup a subset of /data/pub/scm? Thanks. > > Looks like there is plenty of excess stuff there... eg > /data/pub/scm/~mst/linux-2.6 seems to be an partially unpacked > non-naked linux kernel repository (just picking on mst because > /data/pub/scm/~mst is 880M). OK, I killed the files themselves and I've run git repack there, this seems to have freed up some 200M. The repo itself has some of my development bits though. > We could probably save a lot of space > just keeping on packed copy of Linus's repository and having all other > kernel trees use alternates to point to the objects there. Since we really want to save *backup* space, a better strategy would be to use git clone instead of plain cp, and use alternates and aggressive packing there. > OTOH it's not work making people spend a lot of effort to clean up too > much, given how cheap disk space is. Right. My cell phone has 1G flash storage. -- MST From rdreier at cisco.com Thu Jun 21 11:51:55 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 21 Jun 2007 11:51:55 -0700 Subject: [ofa-general] Stringify ibv_event_type In-Reply-To: <2C245DF3-77A7-4A3C-BF3A-13FEC2F7E0DA@cisco.com> (Jeff Squyres's message of "Thu, 21 Jun 2007 10:09:23 -0400") References: <2C245DF3-77A7-4A3C-BF3A-13FEC2F7E0DA@cisco.com> Message-ID: > Could a function to stringify the ibv_event_type enum can be added to > libibverbs? It could be similar to the event_name_str() function in > libibverbs/examples/asyncwatch.c: Seems reasonable. 
I guess if you have that, then you probably want strings for enum ibv_wc_status too. Any other enums you would want to stringify? Also, I think this could be added to the libibverbs 1.1 stable line, since it's a completely new API, and easy to test for with autoconf, right? - R. From mst at dev.mellanox.co.il Thu Jun 21 11:55:33 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Thu, 21 Jun 2007 21:55:33 +0300 Subject: [ofa-general] Re: Stringify ibv_event_type In-Reply-To: <2C245DF3-77A7-4A3C-BF3A-13FEC2F7E0DA@cisco.com> References: <2C245DF3-77A7-4A3C-BF3A-13FEC2F7E0DA@cisco.com> Message-ID: <20070621185533.GJ4857@mellanox.co.il> > Quoting Jeff Squyres : > Subject: Stringify ibv_event_type > > Could a function to stringify the ibv_event_type enum can be added to > libibverbs? It could be similar to the event_name_str() function in > libibverbs/examples/asyncwatch.c: > > ----- > static const char *event_name_str(enum ibv_event_type event_type) > { > switch (event_type) { > case IBV_EVENT_DEVICE_FATAL: > return "IBV_EVENT_DEVICE_FATAL"; > ...etc. > ----- > > Rationale: if multiple client apps (such as the OF-based MPI > implementations) start using the asynch events and there is no > central function for string-ifying the event enum, they'll all end up > doing the translation themselves when printing out error messages. > It's not a huge amount of code, but it does seem kinda odd to make > everyone replicate essentially the same stuff. Additionally, the > available enum values may grow over time, forcing client apps to > figure out which ones are available and adjust their event_name_str() > equivalent as appropriate. Hiding the possibility of change down in > libibverbs seems appropriate. I have no strong opinion either way, but I do wonder why you find this useful. Asyncwatch is just an example: it does not actually *do anything* on an event, so it calls printf. But is it likely that an end user really needs to see IBV_EVENT_CLIENT_REREGISTER?
Printing out the numeric value seems sufficient for debugging. -- MST From jsquyres at cisco.com Thu Jun 21 11:59:47 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Thu, 21 Jun 2007 14:59:47 -0400 Subject: [ofa-general] Stringify ibv_event_type In-Reply-To: References: <2C245DF3-77A7-4A3C-BF3A-13FEC2F7E0DA@cisco.com> Message-ID: <7AAF2612-2BDA-4551-9E4C-B5FA0ED490CF@cisco.com> On Jun 21, 2007, at 2:51 PM, Roland Dreier wrote: >> Could a function to stringify the ibv_event_type enum can be added to >> libibverbs? It could be similar to the event_name_str() function in >> libibverbs/examples/asyncwatch.c: > > Seems reasonable. I guess if you have that, then you probably want > strings for enum ibv_wc_status too. Any other enums you would want to > stringify? I think those 2 would be great. > Also, I think this could be added to the libibverbs 1.1 stable line, > since it's a completely new API, and easy to test for with > autoconf, right? Perfect. -- Jeff Squyres Cisco Systems From DavidRobb at comsci.co.uk Thu Jun 21 12:00:38 2007 From: DavidRobb at comsci.co.uk (David Robb) Date: Thu, 21 Jun 2007 20:00:38 +0100 Subject: [ofa-general] Infiniband Problems Message-ID: <467ACAD6.8000304@comsci.co.uk> Hi Folks, I have inherited responsibility for the comms subsystem on a 28-node high-performance signal processing cluster interconnected with Infiniband. Being new to this technology, I have been trying to read and learn as much as possible but am having a few specific problems. Any help or pointers in the right direction would be greatly appreciated. 1. Sometimes I observe RDMA data transfer stalls of ~1.0 second I have written an RDMA transfer unit test that transfers 10000 packets from one node to another and times the performance. Mostly each loop iteration takes on the order of 30uS, but occasionally I observe times of 500,000 to 1,100,000uS for one packet. I don't think it's a problem with our queuing layer (if I remove the call to ibv_post_send(...)
then no stall is observed). I don't think it is a problem with the CPU stalling, as I created a separate worker thread that does something else and times the loop, and this does not exhibit any stalls. Any suggestions on where to look next? 2. Creation of a Queue Pair is rejected when I have mapped a region of memory greater than about 1.35GB. Ideally, we would like to be able to write anywhere within a 2GB (or larger) shared memory segment. However, when I attempt to do this, the call to fails with REJ. Further reading around the subject suggests that this may be due to the VPTT (Virtual to Physical Translation Table) resources required for mapping such a large memory area. Can anyone confirm this hypothesis? Even if we get this to work, will we suffer performance problems by using such a large memory area? Are there any workarounds? Many thanks, David Robb Device and Environment Information follows:- OS Kernel bash-3.00$ uname -a Linux qinetiq01 2.6.20.1-clustervision-142_cvos #1 SMP Tue Mar 6 00:19:24 GMT 2007 x86_64 x86_64 x86_64 GNU/Linux OFED library version 1.1 ibv_devinfo -v output:- hca_id: mthca0 fw_ver: 1.1.0 node_guid: 0002:c902:0023:a1d8 sys_image_guid: 0002:c902:0023:a1db vendor_id: 0x02c9 vendor_part_id: 25204 hw_ver: 0xA0 board_id: MT_03B0140002 phys_port_cnt: 1 max_mr_size: 0xffffffffffffffff page_size_cap: 0xfffff000 max_qp: 64512 max_qp_wr: 16384 device_cap_flags: 0x00001c76 max_sge: 30 max_sge_rd: 0 max_cq: 65408 max_cqe: 131071 max_mr: 131056 max_pd: 32764 max_qp_rd_atom: 4 max_ee_rd_atom: 0 max_res_rd_atom: 258048 max_qp_init_rd_atom: 128 max_ee_init_rd_atom: 0 atomic_cap: ATOMIC_HCA (1) max_ee: 0 max_rdd: 0 max_mw: 0 max_raw_ipv6_qp: 0 max_raw_ethy_qp: 0 max_mcast_grp: 8192 max_mcast_qp_attach: 8 max_total_mcast_qp_attach: 65536 max_ah: 0 max_fmr: 0 max_srq: 960 max_srq_wr: 16384 max_srq_sge: 30 max_pkeys: 64 local_ca_ack_delay: 15 port: 1 state: PORT_ACTIVE (4) max_mtu: 2048 (4) active_mtu: 2048 (4) sm_lid: 1 port_lid: 1 port_lmc: 0x00
max_msg_sz: 0x80000000 port_cap_flags: 0x02510a6a max_vl_num: 3 bad_pkey_cntr: 0x0 qkey_viol_cntr: 0x0 sm_sl: 0 pkey_tbl_len: 64 gid_tbl_len: 32 subnet_timeout: 18 init_type_reply: 0 active_width: 4X (2) active_speed: 5.0 Gbps (2) phys_state: LINK_UP (5) GID[ 0]: fe80:0000:0000:0000:0002:c902:0023:a1d9 Switches are "MT47396 Infiniscale-III Mellanox Technologies". From swise at opengridcomputing.com Thu Jun 21 12:07:17 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 21 Jun 2007 14:07:17 -0500 Subject: [ofa-general] Stringify ibv_event_type In-Reply-To: References: <2C245DF3-77A7-4A3C-BF3A-13FEC2F7E0DA@cisco.com> Message-ID: <467ACC65.4020106@opengridcomputing.com> Roland Dreier wrote: > > Could a function to stringify the ibv_event_type enum can be added to > > libibverbs? It could be similar to the event_name_str() function in > > libibverbs/examples/asyncwatch.c: > > Seems reasonable. I guess if you have that, then you probably want > strings for enum ibv_wc_status too. Any other enums you would want to > stringify? > the rdmacm stuff too! From mst at dev.mellanox.co.il Thu Jun 21 12:07:52 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Thu, 21 Jun 2007 22:07:52 +0300 Subject: [ofa-general] Re: Infiniband Problems In-Reply-To: <467ACAD6.8000304@comsci.co.uk> References: <467ACAD6.8000304@comsci.co.uk> Message-ID: <20070621190752.GK4857@mellanox.co.il> > ibv_devinfo -v output:- > hca_id: mthca0 > fw_ver: 1.1.0 It might make sense to upgrade to 1.2.0; there's a chance some speed issues are fixed there.
http://www.mellanox.com/support/firmware_table_IH3Lx.php -- MST From rdreier at cisco.com Thu Jun 21 12:08:48 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 21 Jun 2007 12:08:48 -0700 Subject: [ofa-general] Stringify ibv_event_type In-Reply-To: <467ACC65.4020106@opengridcomputing.com> (Steve Wise's message of "Thu, 21 Jun 2007 14:07:17 -0500") References: <2C245DF3-77A7-4A3C-BF3A-13FEC2F7E0DA@cisco.com> <467ACC65.4020106@opengridcomputing.com> Message-ID: > the rdmacm stuff too! which stuff is that? Is it from librdmacm? If so that's a different package and therefore a different change to make. From rdreier at cisco.com Thu Jun 21 12:12:53 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 21 Jun 2007 12:12:53 -0700 Subject: [ofa-general] Infiniband Problems In-Reply-To: <467ACAD6.8000304@comsci.co.uk> (David Robb's message of "Thu, 21 Jun 2007 20:00:38 +0100") References: <467ACAD6.8000304@comsci.co.uk> Message-ID: > 1. Sometimes observe RDMA data transfer stalls of ~ 1.0 second Could it be an RNR NAK? You didn't really describe your protocol, but if you use send operations and if you do a send without a matching receive on the other side, then you might end up stalling the QP for a while. > 2. Creation of a Queue Pair is rejected when I have mapped a region of > memory greater than about 1.35GB. I don't really understand this problem. Are you able to map more memory, and then ibv_create_qp() fails if you do? Later you say > Ideally, we would like the to be able to write anywhere within a 2GB > (or larger) shared memory segment. However, when I attempt to do this, > the call to fails with REJ. You didn't say which call fails with REJ, and I'm not even sure I understand what it means to "fail with REJ". On x86-64, the limit on how much memory you can register should be much higher, closer to 32 GB by default. - R. 
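The stringify request in this thread (a library-provided way to turn `enum ibv_event_type` values into strings that keeps working as the enum grows) can be sketched with a designated-initializer table and a bounds check. The enum below is a hypothetical stand-in for a few of the event names mentioned in the thread; it does not reproduce the real `ibv_event_type` values, and `ex_event_type_str()` is not a libibverbs function.

```c
#include <stddef.h>

/* Hypothetical stand-in for a subset of enum ibv_event_type. */
enum ex_event_type {
    EX_EVENT_CQ_ERR,
    EX_EVENT_QP_FATAL,
    EX_EVENT_DEVICE_FATAL,
    EX_EVENT_CLIENT_REREGISTER,
};

/* Designated initializers keep each string attached to its enumerator,
 * so reordering or inserting members cannot silently shift the table. */
static const char *const ex_event_names[] = {
    [EX_EVENT_CQ_ERR]            = "CQ error",
    [EX_EVENT_QP_FATAL]          = "QP fatal error",
    [EX_EVENT_DEVICE_FATAL]      = "device fatal error",
    [EX_EVENT_CLIENT_REREGISTER] = "client reregistration requested",
};

static const char *ex_event_type_str(enum ex_event_type ev)
{
    size_t idx = (size_t)ev;

    /* Values newer than this table, or gaps in it, map to a generic
     * string instead of reading past the end of the array. */
    if (idx >= sizeof(ex_event_names) / sizeof(ex_event_names[0]) ||
        ex_event_names[idx] == NULL)
        return "unknown event";
    return ex_event_names[idx];
}
```

Returning a fixed string for unrecognized values means an application built against an older table still prints something usable; a caller that wants the raw number for debugging can print it alongside the string.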
From DavidRobb at comsci.co.uk Thu Jun 21 12:37:41 2007 From: DavidRobb at comsci.co.uk (David Robb) Date: Thu, 21 Jun 2007 20:37:41 +0100 Subject: [ofa-general] Infiniband Problems In-Reply-To: References: <467ACAD6.8000304@comsci.co.uk> Message-ID: <467AD385.3040500@comsci.co.uk> Roland Dreier wrote: > > 1. Sometimes observe RDMA data transfer stalls of ~ 1.0 second > > Could it be an RNR NAK? You didn't really describe your protocol, but > if you use send operations and if you do a send without a matching > receive on the other side, then you might end up stalling the QP for a > while. > Quite possibly, we are using an IBV_QPT_RC transport type. The code simply adds another work request with ibv_post_srq_recv(...) after each packet is processed. Am I correct in thinking it should start out with a stack of work requests in case another packet arrives before the current one has been processed? > > 2. Creation of a Queue Pair is rejected when I have mapped a region of > > memory greater than about 1.35GB. > > I don't really understand this problem. Are you able to map more > memory, and then ibv_create_qp() fails if you do? Later you say > > > Ideally, we would like the to be able to write anywhere within a 2GB > > (or larger) shared memory segment. However, when I attempt to do this, > > the call to fails with REJ. > > You didn't say which call fails with REJ, and I'm not even sure I > understand what it means to "fail with REJ". > Sorry, I meant to look up in my source code which call was failing but forgot to paste it into the question. Yes, I can map 2GB of memory but the call to ibv_create_qp() fails with REJ > On x86-64, the limit on how much memory you can register should be > much higher, closer to 32 GB by default. > That's reassuring. Are there any performance penalties for mapping a larger region than a smaller region? > - R. > Many thanks for the speedy response. 
David Robb From DavidRobb at comsci.co.uk Thu Jun 21 12:42:07 2007 From: DavidRobb at comsci.co.uk (David Robb) Date: Thu, 21 Jun 2007 20:42:07 +0100 Subject: [ofa-general] Re: Infiniband Problems In-Reply-To: <20070621190752.GK4857@mellanox.co.il> References: <467ACAD6.8000304@comsci.co.uk> <20070621190752.GK4857@mellanox.co.il> Message-ID: <467AD48F.3090901@comsci.co.uk> Thanks for the pointer. Upgrading probably does make sense and does not look too difficult. David Robb Michael S. Tsirkin wrote: >> ibv_devinfo -v output:- >> hca_id: mthca0 >> fw_ver: 1.1.0 >> > > It might make sense to upgrade to 1.2.0, there's a chance some > speed issues are fixed there. > > http://www.mellanox.com/support/firmware_table_IH3Lx.php > > From rdreier at cisco.com Thu Jun 21 12:53:28 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 21 Jun 2007 12:53:28 -0700 Subject: [ofa-general] Infiniband Problems In-Reply-To: <467AD385.3040500@comsci.co.uk> (David Robb's message of "Thu, 21 Jun 2007 20:37:41 +0100") References: <467ACAD6.8000304@comsci.co.uk> <467AD385.3040500@comsci.co.uk> Message-ID: > Quite possibly, we are using an IBV_QPT_RC transport type. The code > simply adds another work request with ibv_post_srq_recv(...) after > each packet is processed. Am I correct in thinking it should start out > with a stack of work requests in case another packet arrives before > the current one has been processed? That seems a lot more sensible to me. > Sorry, I meant to look up in my source code which call was failing but > forgot to paste it into the question. Yes, I can map 2GB of memory but > the call to ibv_create_qp() fails with REJ Not sure what you mean ... ibv_create_qp() just returns a pointer or NULL. What does it mean to "fail with REJ?" > That's reassuring. Are there any performance penalties for mapping a > larger region than a smaller region?
Not really beyond the general cost of using more memory rather than less. - R. From DavidRobb at comsci.co.uk Thu Jun 21 13:05:15 2007 From: DavidRobb at comsci.co.uk (David Robb) Date: Thu, 21 Jun 2007 21:05:15 +0100 Subject: [ofa-general] Infiniband Problems In-Reply-To: References: <467ACAD6.8000304@comsci.co.uk> <467AD385.3040500@comsci.co.uk> Message-ID: <467AD9FB.1030508@comsci.co.uk> Roland Dreier wrote: > > Quite possibly, we are using an IBV_QPT_RC transport type. The code > > simply adds another work request with ibv_post_srq_recv(...) after > > each packet is processed. Am I correct in thinking it should start out > > with a stack of work requests in case another packet arrives before > > the current one has been processed? > > That seems a lot more sensible to me. > > > Sorry, I meant to look up in my source code which call was failing but > > forgot to paste it into the question. Yes, I can map 2GB of memory but > > the call to ibv_create_qp() fails with REJ > > Not sure what you mean ... ibv_create_qp() just returns a pointer or > NULL. What does it mean to "fail with REJ?" > OK. I need to rerun this test tomorrow to determine exactly where and how this test is failing. The end result is that the QP creation fails with a REJ. From what I remember, I get a CM event IB_CM_REJ_RECEIVED and the remote node is not even aware that anything has tried to connect. Thanks for staying with me on this one. > > That's reassuring. Are there any performance penalties for mapping a > > larger region than a smaller region? > > Not really beyond the general cost of using more memory rather than less. > > - R. 
> _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From rdreier at cisco.com Thu Jun 21 13:17:15 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 21 Jun 2007 13:17:15 -0700 Subject: [ofa-general] Re: [PATCH] libmlx4: fix adjustments for minimum qp capabilities in mlx4_create_qp In-Reply-To: <200706191647.41336.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Tue, 19 Jun 2007 16:47:41 +0300") References: <200706191647.41336.jackm@dev.mellanox.co.il> Message-ID: > Need to adjust minimum qp capability values prior to size and max resource > calculations. Is this actually fixing a problem? I don't see how it could make a difference: > + attr->cap.max_recv_wr = attr->cap.max_recv_wr ? attr->cap.max_recv_wr : 1; align_queue_size() always returns at least 1 so I don't see why this matters. > + attr->cap.max_recv_sge = attr->cap.max_recv_sge ? attr->cap.max_recv_sge : 1; I don't see anything that uses max_recv_sge before it gets set in the current code. > + attr->cap.max_send_wr = attr->cap.max_send_wr ? attr->cap.max_send_wr : 1; If max_send_wr is 0 then the call to align_queue_size will always add at least one more WQE because sq_spare_wqes will never be a power of 2. > + attr->cap.max_send_sge = attr->cap.max_send_sge ? attr->cap.max_send_sge : 1; mlx4_calc_sq_wqe_size() will always end up with at least a 64-byte WQE size so does this matter? Oh, I guess a UD QP could end up with 0 send gather entries, but I'm not sure that's a big deal -- after all, the user gets what he asked for, and the HW shouldn't be bothered, should it? - R. 
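Roland's objection rests on two properties: `align_queue_size()` never returns less than 1, and the spare-WQE count is never a power of two, so rounding always adds at least one entry. A small sketch of that reasoning, assuming `align_queue_size()` rounds up to the next power of two starting from 1 (reconstructed from the review's description, not copied from the libmlx4 source) and using illustrative spare counts of the form 2^k + 1; `at_least_one()` and `rounding_slack()` are hypothetical helpers.

```c
#include <assert.h>

/* Round a requested queue depth up to a power of two. The search starts
 * at 1, so the result is never below 1, even for a request of 0.
 * (Reconstructed from the review's description, not the libmlx4 source.) */
static int align_queue_size(int req)
{
    int nent;

    assert(req >= 0);
    for (nent = 1; nent < req; nent <<= 1)
        ;
    return nent;
}

/* The clamp proposed in the patch: turn 0 into 1, keep other values. */
static int at_least_one(int v)
{
    return v ? v : 1;
}

/* Entries the rounding adds beyond max_send_wr + spare. If spare has the
 * form 2^k + 1 with k >= 1, it is odd and at least 3, so it is not a
 * power of two and the slack for max_send_wr == 0 is at least 1. */
static int rounding_slack(int max_send_wr, int spare)
{
    int want = max_send_wr + spare;

    return align_queue_size(want) - want;
}
```

In these examples, clamping a zero request to 1 before rounding yields the same queue size as rounding the zero directly, which is the point Roland makes about the proposed patch.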
From rdreier at cisco.com Thu Jun 21 13:19:55 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 21 Jun 2007 13:19:55 -0700 Subject: [ofa-general] [PATCH] for-2.6.23 ib/umad: add partition support In-Reply-To: <467996C4.1060201@ichips.intel.com> (Sean Hefty's message of "Wed, 20 Jun 2007 14:06:12 -0700") References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com> <467996C4.1060201@ichips.intel.com> Message-ID: > Did you have something in mind? (new ioctl? re-using existing fields?) > > Not all fields are used for both reads and writes. E.g. status is > unused on a write, and retries is unused on a read. Storing the > pkey_index on a read seems doable. I think if we do anything on a > write, we need to make an assumption that the data is currently set to > 0 by the app. I hadn't really thought about it. One other thing is that the top 8 bits of flow_label aren't used. I guess we could steal that, although it's a little ugly. I doubt it would break existing userspace. There is the problem of old kernels silently ignoring the pkey index though. I'm not sure I see a good way around that. I'm beginning to think that just updating the ABI might be the right answer. But let's try to make this be the last ABI break. Are we pretty sure there's *nothing* else we might ever want to add to the structure? I can't think of anything right now... - R. From rdreier at cisco.com Thu Jun 21 13:23:00 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 21 Jun 2007 13:23:00 -0700 Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition support In-Reply-To: <20070621033854.GF8868@mellanox.co.il> (Michael S. 
Tsirkin's message of "Thu, 21 Jun 2007 06:38:54 +0300") References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com> <467996C4.1060201@ichips.intel.com> <20070621033854.GF8868@mellanox.co.il> Message-ID: > We made a mistake of not validating the offset field otherwise we could > have used it, too: as it is I think apps just use "write" so > there's a useless byte counter in that field. which offset field? I don't see the string "offset" anywhere in ib_user_mad.h > But if we do one of these things, the app does not get any indication that pkey's > ignored, isn't that right? Yes, that's a good point. > This assumes an open file desriptor per-pkey, so the proposed API > extension umad_set_pkey would have to be changed to be per-port rather > than per-mad. But I think this is a better API, too: most apps > likely work within a single partition. Not sure I agree. If I'm implementing an SA, then I want to be able to receive MADs for all partitions, and send them too. Of course I can open a bunch of file descriptors, but then I probably end up in a mess keeping up with what's in my pkey table. - R. From rdreier at cisco.com Thu Jun 21 13:26:57 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 21 Jun 2007 13:26:57 -0700 Subject: [ofa-general] Re: [PATCH] IB-mlx4: query_device needs to return one less srq wqe for max_srq_wr In-Reply-To: <200706191820.46443.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Tue, 19 Jun 2007 18:20:46 +0300") References: <200706191820.46443.jackm@dev.mellanox.co.il> Message-ID: Thanks, applied. From swise at opengridcomputing.com Thu Jun 21 13:28:33 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 21 Jun 2007 15:28:33 -0500 Subject: [ofa-general] Stringify ibv_event_type In-Reply-To: References: <2C245DF3-77A7-4A3C-BF3A-13FEC2F7E0DA@cisco.com> <467ACC65.4020106@opengridcomputing.com> Message-ID: <467ADF71.4090002@opengridcomputing.com> Roland Dreier wrote: > > the rdmacm stuff too! > > which stuff is that? 
Is it from librdmacm? If so that's a different > package and therefore a different change to make. It would be nice for librdmacm to have a stringify method for the event enum... But yes, it's a different package... /me nudges sean... :) From rdreier at cisco.com Thu Jun 21 13:40:00 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 21 Jun 2007 13:40:00 -0700 Subject: [ofa-general] Re: [PATCH for-2.6.22] ipoib/cm: fix interoperability when mtu don't match In-Reply-To: <20070620162215.GF6006@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 20 Jun 2007 19:22:15 +0300") References: <20070620162215.GF6006@mellanox.co.il> Message-ID: OK, I applied this for 2.6.22 since it looks quite safe (I even took the risk of replacing the "4" in the warning string with a "%d" and printing IPOIB_ENCAP_LEN, because it seemed funny to test against a named constant and then print a raw number). But I'm really going to be disappointed if this breaks something... BTW, any objection to merging the patch below for 2.6.22 too? It's compile-tested only but it looks *REALLY* safe. commit f667e4b9c4d7b2772105d2872becffbe9e65ecea Author: Roland Dreier Date: Thu Jun 21 13:37:05 2007 -0700 IPoIB/cm: Remove dead definition of struct ipoib_cm_id It's completely unused.
Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index 1fe7f66..b3d0a31 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -56,13 +56,6 @@ MODULE_PARM_DESC(cm_data_debug_level, #define IPOIB_CM_RX_DELAY (3 * 256 * HZ) #define IPOIB_CM_RX_UPDATE_MASK (0x3) -struct ipoib_cm_id { - struct ib_cm_id *id; - int flags; - u32 remote_qpn; - u32 remote_mtu; -}; - static struct ib_qp_attr ipoib_cm_err_attr = { .qp_state = IB_QPS_ERR }; From rdreier at cisco.com Thu Jun 21 13:52:22 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 21 Jun 2007 13:52:22 -0700 Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM) In-Reply-To: (Bernard King-Smith's message of "Wed, 20 Jun 2007 18:09:08 -0400") References: Message-ID: > It is not clear if anything is better yet, but instead you have to go back > to the IPoIB-CM RFC 4755 that we wrote. In the spec you will see that the > approach for this driver is to have the IPoIB driver select the most > appropriate method of connecting. If RC was not available then UD was > used. You can extend that to UC mode as Michael proposed, as long as you > allow selecting the most appropriate method of connection. By pushing the > issue of SRQ or not SRQ to the driver you have broken the IPoIB-CM > original design. Since SRQ was not a required function in the IB spec we > never addressed that issue in the RFC along with UC. I think we can agree > that adding UC is a good thing and follows the approach in the original > spec. Including SRQ as one of the tests for the best possible connection > method follows this same approach. > .... I can't really follow this. We're talking about the internal implementation inside the Linux kernel, which I really hope that an IETF RFC does not address at all. 
We surely intend to follow the RFC, and if we run into problems because the RFC was written without any implementation experience, then we'll work to correct those problems through a new IETF document. It makes perfect sense for ehca systems to be able to use IPoIB CM. I understand that current ehca HW doesn't natively support SRQs. The only question is how to implement IPoIB CM for ehca systems, and we have to weigh tradeoffs like avoiding code duplication vs the additional cost of branches on the data path. - R. From sashak at voltaire.com Thu Jun 21 14:29:20 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 22 Jun 2007 00:29:20 +0300 Subject: [ofa-general] [PATCH] opensm/updn: --connect_roots option Message-ID: <20070621212919.GL25653@sashak.voltaire.com> With this option, up/down preserves route paths (based on min hops knowledge) between root switches. This makes up/down IBA compliant (where all-to-all connectivity is required); OTOH, this violates the up/down deadlock-free algorithm. By default this option is 'off'.
Signed-off-by: Sasha Khapyorsky --- opensm/include/opensm/osm_subnet.h | 6 ++++++ opensm/man/opensm.8 | 8 +++++++- opensm/opensm/main.c | 15 ++++++++++++++- opensm/opensm/osm_subnet.c | 10 ++++++++++ opensm/opensm/osm_ucast_updn.c | 27 ++++++++++++++++++++++++++- 5 files changed, 63 insertions(+), 3 deletions(-) diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h index 2ee5689..43b1589 100644 --- a/opensm/include/opensm/osm_subnet.h +++ b/opensm/include/opensm/osm_subnet.h @@ -276,6 +276,7 @@ typedef struct _osm_subn_opt boolean_t sweep_on_trap; osm_testability_modes_t testability_mode; char * routing_engine_name; + boolean_t connect_roots; char * lid_matrix_dump_file; char * ucast_dump_file; char * root_guid_file; @@ -445,6 +446,11 @@ typedef struct _osm_subn_opt * Name of used routing engine * (other than default Min Hop Algorithm) * +* connect_roots +* The option which will enfoce root to root connectivity with +* up/down routing engine (even if this violates "pure" deadlock +* free up/down algorithm) +* * lid_matrix_dump_file * Name of the lid matrix dump file from where switch * lid matrices (min hops tables) will be loaded diff --git a/opensm/man/opensm.8 b/opensm/man/opensm.8 index 4d35689..40e0235 100644 --- a/opensm/man/opensm.8 +++ b/opensm/man/opensm.8 @@ -5,7 +5,7 @@ opensm \- InfiniBand subnet manager and administration (SM/SA) .SH SYNOPSIS .B opensm -[\-c(ache-options)] [\-g(uid)[=]] [\-l(mc) ] [\-p(riority) ] [\-smkey ] [\-r(eassign_lids)] [\-R | \-\-routing_engine ] [\-M | \-\-lid_matrix_file ] [\-U | \-ucast_file ] [\-S | \-\-sadb_file ] [\-a | \-\-root_guid_file ] [\-u | \-\-cn_guid_file ] [\-o(nce)] [\-s(weep) ] [\-t(imeout) ] [\-maxsmps ] [\-console [off | local | socket]] [\-console-port ] [\-i(gnore-guids) ] [\-f | \-\-log_file] [\-L | \-\-log_limit ] [\-e(rase_log_file)] [\-P(config)] [\-Q | \-qos] [\-N | \-no_part_enforce] [\-y | \-stay_on_fatal] [\-B | \-daemon] [\-I | \-inactive] [\-perfmgr] 
[\-perfmgr_sweep_time_s ] [\-v(erbose)] [\-V] [\-D ] [\-d(ebug) ] [\-h(elp)] [\-?] +[\-c(ache-options)] [\-g(uid)[=]] [\-l(mc) ] [\-p(riority) ] [\-smkey ] [\-r(eassign_lids)] [\-R | \-\-routing_engine ] [\-z | \-\-connect_roots] [\-M | \-\-lid_matrix_file ] [\-U | \-ucast_file ] [\-S | \-\-sadb_file ] [\-a | \-\-root_guid_file ] [\-u | \-\-cn_guid_file ] [\-o(nce)] [\-s(weep) ] [\-t(imeout) ] [\-maxsmps ] [\-console [off | local | socket]] [\-console-port ] [\-i(gnore-guids) ] [\-f | \-\-log_file] [\-L | \-\-log_limit ] [\-e(rase_log_file)] [\-P(config)] [\-Q | \-qos] [\-N | \-no_part_enforce] [\-y | \-stay_on_fatal] [\-B | \-daemon] [\-I | \-inactive] [\-perfmgr] [\-perfmgr_sweep_time_s ] [\-v(erbose)] [\-V] [\-D ] [\-d(ebug) ] [\-h(elp)] [\-?] .SH DESCRIPTION .PP @@ -94,6 +94,12 @@ This option chooses routing engine instead of Min Hop algorithm (default). Supported engines: updn, file, ftree, lash .TP +\fB\-z\fR, \fB\-\-connect_roots\fR +This option enforces a routing engine (currently up/down +only) to make connectivity between root switches and in +this way to be fully IBA complaint. In many cases this can +violate "pure" deadlock free algorithm, so use it carefully. +.TP \fB\-M\fR, \fB\-\-lid_matrix_file\fR This option specifies the name of the lid matrix dump file from where switch lid matrices (min hops tables will be diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c index 0d5e0eb..e182276 100644 --- a/opensm/opensm/main.c +++ b/opensm/opensm/main.c @@ -175,6 +175,13 @@ show_usage(void) " This option chooses routing engine instead of Min Hop\n" " algorithm (default).\n" " Supported engines: updn, file, ftree\n\n"); + printf( "-z\n" + "--connect_roots\n" + " This option enforces a routing engine (currently\n" + " up/down only) to make connectivity between root switches\n" + " and in this way to be fully IBA complaint. 
In many cases\n" + " this can violate \"pure\" deadlock free algorithm, so\n" + " use it carefully.\n\n"); printf( "-M\n" "--lid_matrix_file \n" " This option specifies the name of the lid matrix dump file\n" @@ -591,7 +598,7 @@ main( char *ignore_guids_file_name = NULL; uint32_t val; const char * const short_option = - "i:f:ed:g:l:L:s:t:a:u:R:M:U:S:P:NBIQvVhorcyxp:n:q:k:C:"; + "i:f:ed:g:l:L:s:t:a:u:R:zM:U:S:P:NBIQvVhorcyxp:n:q:k:C:"; /* In the array below, the 2nd parameter specifies the number @@ -625,6 +632,7 @@ main( { "priority", 1, NULL, 'p'}, { "smkey", 1, NULL, 'k'}, { "routing_engine",1, NULL, 'R'}, + { "connect_roots", 0, NULL, 'z'}, { "lid_matrix_file",1, NULL, 'M'}, { "ucast_file", 1, NULL, 'U'}, { "sadb_file", 1, NULL, 'S'}, @@ -876,6 +884,11 @@ main( printf(" Activate \'%s\' routing engine\n", optarg); break; + case 'z': + opt.connect_roots = TRUE; + printf(" Connect roots option is on\n"); + break; + case 'M': opt.lid_matrix_dump_file = optarg; printf(" Lid matrix dump file is \'%s\'\n", optarg); diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index 82d66f9..8f429ae 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -500,6 +500,7 @@ osm_subn_set_default_opt( p_opt->sweep_on_trap = TRUE; p_opt->testability_mode = OSM_TEST_MODE_NONE; p_opt->routing_engine_name = NULL; + p_opt->connect_roots = FALSE; p_opt->lid_matrix_dump_file = NULL; p_opt->ucast_dump_file = NULL; p_opt->root_guid_file = NULL; @@ -1290,6 +1291,10 @@ osm_subn_parse_conf_file( "routing_engine", p_key, p_val, &p_opts->routing_engine_name); + __osm_subn_opts_unpack_boolean( + "connect_roots", + p_key, p_val, &p_opts->connect_roots); + __osm_subn_opts_unpack_charp( "log_file", p_key, p_val, &p_opts->log_file); @@ -1545,6 +1550,11 @@ osm_subn_write_conf_file( "# Routing engine\n" "routing_engine %s\n\n", p_opts->routing_engine_name); + if (p_opts->connect_roots) + fprintf( opts_file, + "# Connect roots (use FALSE if unsure)\n" + "connect_roots 
%s\n\n", + p_opts->connect_roots ? "TRUE" : "FALSE"); if (p_opts->lid_matrix_dump_file) fprintf( opts_file, "# Lid matrix dump file name\n" diff --git a/opensm/opensm/osm_ucast_updn.c b/opensm/opensm/osm_ucast_updn.c index af5ee4e..db8e60a 100644 --- a/opensm/opensm/osm_ucast_updn.c +++ b/opensm/opensm/osm_ucast_updn.c @@ -449,6 +449,24 @@ updn_subn_rank( /********************************************************************** **********************************************************************/ +/* hack: preserve min hops entries to any other root switches */ +static void +updn_clear_root_hops(updn_t *p_updn, osm_switch_t *p_sw) +{ + osm_port_t *p_port; + unsigned i; + + for ( i = 0 ; i < p_sw->num_hops ; i++ ) + if (p_sw->hops[i]) { + p_port = cl_ptr_vector_get(&p_updn->p_osm->subn.port_lid_tbl, i); + if (!p_port || !p_port->p_node->sw || + ((struct updn_node *)p_port->p_node->sw->priv)->rank != 0) + memset(p_sw->hops[i], 0xff, p_sw->num_ports); + } +} + +/********************************************************************** + **********************************************************************/ static int __osm_subn_set_up_down_min_hop_table( IN updn_t* p_updn ) @@ -471,7 +489,10 @@ __osm_subn_set_up_down_min_hop_table( p_sw = p_next_sw; p_next_sw = (osm_switch_t*)cl_qmap_next( &p_sw->map_item ); /* Clear Min Hop Table */ - osm_switch_clear_hops(p_sw); + if (p_subn->opt.connect_roots && !((struct updn_node *)p_sw->priv)->rank) + updn_clear_root_hops(p_updn, p_sw); + else + osm_switch_clear_hops(p_sw); } osm_log( p_log, OSM_LOG_VERBOSE, @@ -607,6 +628,10 @@ __osm_updn_call( osm_ucast_mgr_build_lid_matrices( &p_updn->p_osm->sm.ucast_mgr ); __osm_updn_find_root_nodes_by_min_hop( p_updn ); } + else if (p_updn->p_osm->subn.opt.connect_roots && + p_updn->updn_ucast_reg_inputs.num_guids > 1) + osm_ucast_mgr_build_lid_matrices( &p_updn->p_osm->sm.ucast_mgr ); + /* printf ("-V- after osm_updn_find_root_nodes_by_min_hop\n"); */ /* Only if there are assigned root nodes 
do the algorithm, otherwise perform do nothing */ if ( p_updn->updn_ucast_reg_inputs.num_guids > 0) -- 1.5.2.2.277.g07b8 From mshefty at ichips.intel.com Thu Jun 21 14:30:45 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 21 Jun 2007 14:30:45 -0700 Subject: [ofa-general] [PATCH] for-2.6.23 ib/umad: add partition support In-Reply-To: References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com> <467996C4.1060201@ichips.intel.com> Message-ID: <467AEE05.9050809@ichips.intel.com> > I'm beginning to think that just updating the ABI might be the right > answer. But let's try to make this be the last ABI break. Are we > pretty sure there's *nothing* else we might ever want to add to the > structure? I can't think of anything right now... I can't think of anything, but Hal is in a better position to answer this. He's the one who pointed out the problem to me. - Sean From kliteyn at dev.mellanox.co.il Thu Jun 21 14:42:00 2007 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Fri, 22 Jun 2007 00:42:00 +0300 Subject: [ofa-general] Re: [PATCH] osm: cosmetics in ftree - added get_guid functions for switch and hca In-Reply-To: <1182447627.15653.419564.camel@hal.voltaire.com> References: <4678DA83.2050700@dev.mellanox.co.il> <1182447627.15653.419564.camel@hal.voltaire.com> Message-ID: <467AF0A8.4070706@dev.mellanox.co.il> Hal Rosenstock wrote: > Hi again Yevgeny, > > On Wed, 2007-06-20 at 03:42, Yevgeny Kliteynik wrote: >> Hi Hal, >> >> Cosmetic code changes in fat-tree: >> added get_guid_ho and get_guid_no functions for switches and hca's >> >> -- Yevgeny >> >> Signed-off-by: Yevgeny Kliteynik > > This patch won't apply either. I'm not sure I want to hand edit these > changes in. Can you try it and see if it works for you ? Thanks, I'll check what the problem is. -- Yevgeny > Thanks. 
> > -- Hal > >> --- >> opensm/opensm/osm_ucast_ftree.c | 77 +++++++++++++++++++++++++++++---------- >> 1 files changed, 58 insertions(+), 19 deletions(-) >> >> diff --git a/opensm/opensm/osm_ucast_ftree.c b/opensm/opensm/osm_ucast_ftree.c >> index 1ead199..1ae8b29 100644 >> --- a/opensm/opensm/osm_ucast_ftree.c >> +++ b/opensm/opensm/osm_ucast_ftree.c >> @@ -640,6 +640,26 @@ __osm_ftree_sw_destroy( >> >> /***************************************************/ >> >> +static uint64_t >> +__osm_ftree_sw_get_guid_no( >> + IN ftree_sw_t * p_sw) >> +{ >> + if (!p_sw) >> + return 0; >> + return osm_node_get_node_guid(p_sw->p_osm_sw->p_node); >> +} >> + >> +/***************************************************/ >> + >> +static uint64_t >> +__osm_ftree_sw_get_guid_ho( >> + IN ftree_sw_t * p_sw) >> +{ >> + return cl_ntoh64(__osm_ftree_sw_get_guid_no(p_sw)); >> +} >> + >> +/***************************************************/ >> + >> static void >> __osm_ftree_sw_dump( >> IN ftree_fabric_t * p_ftree, >> @@ -657,7 +677,7 @@ __osm_ftree_sw_dump( >> "__osm_ftree_sw_dump: " >> "Switch index: %s, GUID: 0x%016" PRIx64 ", Ports: %u DOWN, %u UP\n", >> __osm_ftree_tuple_to_str(p_sw->tuple), >> - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), >> + __osm_ftree_sw_get_guid_ho(p_sw), >> p_sw->down_port_groups_num, >> p_sw->up_port_groups_num); >> >> @@ -835,6 +855,26 @@ __osm_ftree_hca_destroy( >> >> /***************************************************/ >> >> +static uint64_t >> +__osm_ftree_hca_get_guid_no( >> + IN ftree_hca_t * p_hca) >> +{ >> + if (!p_hca) >> + return 0; >> + return osm_node_get_node_guid(p_hca->p_osm_node); >> +} >> + >> +/***************************************************/ >> + >> +static uint64_t >> +__osm_ftree_hca_get_guid_ho( >> + IN ftree_hca_t * p_hca) >> +{ >> + return cl_ntoh64(__osm_ftree_hca_get_guid_no(p_hca)); >> +} >> + >> +/***************************************************/ >> + >> static void >> __osm_ftree_hca_dump( >> IN ftree_fabric_t * 
p_ftree, >> @@ -851,7 +891,7 @@ __osm_ftree_hca_dump( >> osm_log(&p_ftree->p_osm->log, OSM_LOG_DEBUG, >> "__osm_ftree_hca_dump: " >> "CA GUID: 0x%016" PRIx64 ", Ports: %u UP\n", >> - cl_ntoh64(osm_node_get_node_guid(p_hca->p_osm_node)), >> + __osm_ftree_hca_get_guid_ho(p_hca), >> p_hca->up_port_groups_num); >> >> for( i = 0; i < p_hca->up_port_groups_num; i++ ) >> @@ -1214,7 +1254,7 @@ __osm_ftree_fabric_dump_general_info( >> osm_log(&p_ftree->p_osm->log, OSM_LOG_VERBOSE, >> "__osm_ftree_fabric_dump_general_info: " >> " GUID: 0x%016" PRIx64 ", LID: 0x%x, Index %s\n", >> - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), >> + __osm_ftree_sw_get_guid_ho(p_sw), >> cl_ntoh16(p_sw->base_lid), >> __osm_ftree_tuple_to_str(p_sw->tuple)); >> } >> @@ -1227,8 +1267,7 @@ __osm_ftree_fabric_dump_general_info( >> osm_log(&p_ftree->p_osm->log, OSM_LOG_VERBOSE, >> "__osm_ftree_fabric_dump_general_info: " >> " GUID: 0x%016" PRIx64 ", LID: 0x%x, Index %s\n", >> - cl_ntoh64(osm_node_get_node_guid( >> - p_ftree->leaf_switches[i]->p_osm_sw->p_node)), >> + __osm_ftree_sw_get_guid_ho(p_ftree->leaf_switches[i]), >> cl_ntoh16(p_ftree->leaf_switches[i]->base_lid), >> __osm_ftree_tuple_to_str(p_ftree->leaf_switches[i]->tuple)); >> } >> @@ -1442,7 +1481,7 @@ __osm_ftree_fabric_make_indexing( >> p_sw->rank, >> __osm_ftree_tuple_to_str(p_sw->tuple), >> cl_ntoh16(p_sw->base_lid), >> - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node))); >> + __osm_ftree_sw_get_guid_ho(p_sw)); >> >> /* >> * Now run BFS and assign indexes to all switches >> @@ -1617,11 +1656,11 @@ __osm_ftree_fabric_validate_topology( >> "ERR AB09: Different number of upward port groups on switches:\n" >> " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u groups\n" >> " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u groups\n", >> - cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)), >> + __osm_ftree_sw_get_guid_ho(reference_sw_arr[p_sw->rank]), >> 
cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid), >> __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple), >> reference_sw_arr[p_sw->rank]->up_port_groups_num, >> - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), >> + __osm_ftree_sw_get_guid_ho(p_sw), >> cl_ntoh16(p_sw->base_lid), >> __osm_ftree_tuple_to_str(p_sw->tuple), >> p_sw->up_port_groups_num); >> @@ -1638,11 +1677,11 @@ __osm_ftree_fabric_validate_topology( >> "ERR AB0A: Different number of downward port groups on switches:\n" >> " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u port groups\n" >> " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u port groups\n", >> - cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)), >> + __osm_ftree_sw_get_guid_ho(reference_sw_arr[p_sw->rank]), >> cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid), >> __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple), >> reference_sw_arr[p_sw->rank]->down_port_groups_num, >> - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), >> + __osm_ftree_sw_get_guid_ho(p_sw), >> cl_ntoh16(p_sw->base_lid), >> __osm_ftree_tuple_to_str(p_sw->tuple), >> p_sw->down_port_groups_num); >> @@ -1663,11 +1702,11 @@ __osm_ftree_fabric_validate_topology( >> "ERR AB0B: Different number of ports in an upward port group on switches:\n" >> " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n" >> " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n", >> - cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)), >> + __osm_ftree_sw_get_guid_ho(reference_sw_arr[p_sw->rank]), >> cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid), >> __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple), >> cl_ptr_vector_get_size(&p_ref_group->ports), >> - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), >> + __osm_ftree_sw_get_guid_ho(p_sw), >> cl_ntoh16(p_sw->base_lid), >> __osm_ftree_tuple_to_str(p_sw->tuple), >> cl_ptr_vector_get_size(&p_group->ports)); >> @@ 
-1691,11 +1730,11 @@ __osm_ftree_fabric_validate_topology( >> "ERR AB0C: Different number of ports in an downward port group on switches:\n" >> " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n" >> " GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n", >> - cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)), >> + __osm_ftree_sw_get_guid_ho(reference_sw_arr[p_sw->rank]), >> cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid), >> __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple), >> cl_ptr_vector_get_size(&p_ref_group->ports), >> - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), >> + __osm_ftree_sw_get_guid_ho(p_sw), >> cl_ntoh16(p_sw->base_lid), >> __osm_ftree_tuple_to_str(p_sw->tuple), >> cl_ptr_vector_get_size(&p_group->ports)); >> @@ -2508,7 +2547,7 @@ __osm_ftree_rank_leaf_switches( >> "__osm_ftree_rank_leaf_switches: ERR AB0F: " >> "CA conected directly to another CA: " >> "0x%016" PRIx64 " <---> 0x%016" PRIx64 "\n", >> - cl_ntoh64(osm_node_get_node_guid(p_hca->p_osm_node)), >> + __osm_ftree_hca_get_guid_ho(p_hca), >> cl_ntoh64(osm_node_get_node_guid(p_remote_osm_node))); >> res = -1; >> goto Exit; >> @@ -2548,8 +2587,8 @@ __osm_ftree_rank_leaf_switches( >> " - CA guid : 0x%016" PRIx64 "\n" >> " - Switch guid: 0x%016" PRIx64 "\n" >> " - Switch LID : 0x%x\n", >> - cl_ntoh64(osm_node_get_node_guid(p_hca->p_osm_node)), >> - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), >> + __osm_ftree_hca_get_guid_ho(p_hca), >> + __osm_ftree_sw_get_guid_ho(p_sw), >> cl_ntoh16(p_sw->base_lid)); >> cl_list_insert_tail(p_ranking_bfs_list, >> &__osm_ftree_sw_tbl_element_create(p_sw)->map_item); >> @@ -2740,10 +2779,10 @@ __osm_ftree_fabric_construct_sw_ports( >> " GUID 0x%016" PRIx64 ", LID 0x%x, rank %u\n", >> p_sw->rank, >> p_remote_sw->rank, >> - cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)), >> + __osm_ftree_sw_get_guid_ho(p_sw), >> cl_ntoh16(p_sw->base_lid), >> p_sw->rank, >> - 
cl_ntoh64(osm_node_get_node_guid(p_remote_sw->p_osm_sw->p_node)), >> + __osm_ftree_sw_get_guid_ho(p_remote_sw), >> cl_ntoh16(p_remote_sw->base_lid), >> p_remote_sw->rank); >> res = -1; > > From jsquyres at cisco.com Thu Jun 21 14:58:35 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Thu, 21 Jun 2007 17:58:35 -0400 Subject: [ofa-general] Re: Stringify ibv_event_type In-Reply-To: <20070621185533.GJ4857@mellanox.co.il> References: <2C245DF3-77A7-4A3C-BF3A-13FEC2F7E0DA@cisco.com> <20070621185533.GJ4857@mellanox.co.il> Message-ID: On Jun 21, 2007, at 2:55 PM, Michael S. Tsirkin wrote: > I have no strong opinion either way, but I do wonder why do you > find this useful? The more verbose an error message, the more chance a user has to understand it. > Asyncwatch is just an example: it does not actually *do anything* > on an event, > so it calls printf. But, is it likely that enduser really needs to see > IBV_EVENT_CLIENT_REREGISTER? Printing out the numeric value seems > sufficient for debug. Why have to force a secondary lookup (that may involve multiple steps)? Printing a string is easy. Plus, what if the enum values change over time? Then we'll have to have the user send us the error message and their verbs.h to find out what the problem really is. If you print the enum value as a string, it's pretty clear (to a developer at least) what the problem is/could be regardless of what the actual numerical value is (indeed, who cares what the numerical value is?). Heck, some of the enum names are fairly obvious such that even a reasonably-skilled user could figure out at least the context of the error. Just my $0.02. -- Jeff Squyres Cisco Systems From sean.hefty at intel.com Thu Jun 21 15:21:40 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 21 Jun 2007 15:21:40 -0700 Subject: [ofa-general] Stringify ibv_event_type In-Reply-To: <467ADF71.4090002@opengridcomputing.com> Message-ID: <000201c7b452$8a63c220$ff0da8c0@amr.corp.intel.com> >/me nudges sean...
How's this? Does anything else need to be done with the build (beyond a new release at some point)? Signed-off-by: Sean Hefty diff --git a/include/rdma/rdma_cma.h b/include/rdma/rdma_cma.h index f920ae0..43c71d5 100644 --- a/include/rdma/rdma_cma.h +++ b/include/rdma/rdma_cma.h @@ -463,7 +463,7 @@ int rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr); * Destruction of an rdma_cm_id will block until related events have been * acknowledged. * See also: - * rdma_ack_cm_event, rdma_create_event_channel + * rdma_ack_cm_event, rdma_create_event_channel, rdma_event_str */ int rdma_get_cm_event(struct rdma_event_channel *channel, struct rdma_cm_event **event); @@ -519,6 +519,16 @@ struct ibv_context **rdma_get_devices(int *num_devices); */ void rdma_free_devices(struct ibv_context **list); +/** + * rdma_event_str - Returns a string representation of an rdma cm event. + * @event: Asynchronous event. + * Description: + * Returns a string representation of an asynchronous event. + * See also: + * rdma_get_cm_event + */ +const char *rdma_event_str(enum rdma_cm_event_type event); + #ifdef __cplusplus } #endif diff --git a/man/rdma_event_str.3 b/man/rdma_event_str.3 new file mode 100644 index 0000000..a6ee3e5 --- /dev/null +++ b/man/rdma_event_str.3 @@ -0,0 +1,15 @@ +.TH "RDMA_EVENT_STR" 3 "2007-05-15" "librdmacm" "Librdmacm Programmer's Manual" librdmacm +.SH NAME +rdma_event_str \- Returns a string representation of an rdma cm event. +.SH SYNOPSIS +.B "#include " +.P +.B "char *" rdma_event_str +.BI "("enum ibv_event_type " event ");" +.SH ARGUMENTS +.IP "event" 12 +Asynchronous event. +.SH "DESCRIPTION" +Returns a string representation of an asynchronous event. +.SH "SEE ALSO" +rdma_get_cm_event(3) diff --git a/man/rdma_get_cm_event.3 b/man/rdma_get_cm_event.3 index a260092..252a7ab 100644 --- a/man/rdma_get_cm_event.3 +++ b/man/rdma_get_cm_event.3 @@ -62,4 +62,4 @@ no longer accessible and should be rejoined, if desired. 
.SH "SEE ALSO" rdma_ack_cm_event(3), rdma_create_event_channel(3), rdma_resolve_addr(3), rdma_resolve_route(3), rdma_connect(3), rdma_listen(3), rdma_join_multicast(3), -rdma_destroy_id(3) +rdma_destroy_id(3), rdma_event_str(3) diff --git a/src/cma.c b/src/cma.c index fdadb69..3579530 100644 --- a/src/cma.c +++ b/src/cma.c @@ -1359,3 +1359,39 @@ retry: *event = &evt->event; return 0; } + +const char *rdma_event_str(enum rdma_cm_event_type event) +{ + switch (event) { + case RDMA_CM_EVENT_ADDR_RESOLVED: + return "RDMA_CM_EVENT_ADDR_RESOLVED"; + case RDMA_CM_EVENT_ADDR_ERROR: + return "RDMA_CM_EVENT_ADDR_ERROR"; + case RDMA_CM_EVENT_ROUTE_RESOLVED: + return "RDMA_CM_EVENT_ROUTE_RESOLVED"; + case RDMA_CM_EVENT_ROUTE_ERROR: + return "RDMA_CM_EVENT_ROUTE_ERROR"; + case RDMA_CM_EVENT_CONNECT_REQUEST: + return "RDMA_CM_EVENT_CONNECT_REQUEST"; + case RDMA_CM_EVENT_CONNECT_RESPONSE: + return "RDMA_CM_EVENT_CONNECT_RESPONSE"; + case RDMA_CM_EVENT_CONNECT_ERROR: + return "RDMA_CM_EVENT_CONNECT_ERROR"; + case RDMA_CM_EVENT_UNREACHABLE: + return "RDMA_CM_EVENT_UNREACHABLE"; + case RDMA_CM_EVENT_REJECTED: + return "RDMA_CM_EVENT_REJECTED"; + case RDMA_CM_EVENT_ESTABLISHED: + return "RDMA_CM_EVENT_ESTABLISHED"; + case RDMA_CM_EVENT_DISCONNECTED: + return "RDMA_CM_EVENT_DISCONNECTED"; + case RDMA_CM_EVENT_DEVICE_REMOVAL: + return "RDMA_CM_EVENT_DEVICE_REMOVAL"; + case RDMA_CM_EVENT_MULTICAST_JOIN: + return "RDMA_CM_EVENT_MULTICAST_JOIN"; + case RDMA_CM_EVENT_MULTICAST_ERROR: + return "RDMA_CM_EVENT_MULTICAST_ERROR"; + default: + return "UNKNOWN EVENT"; + } +} diff --git a/src/librdmacm.map b/src/librdmacm.map index 06e9765..eafeae4 100644 --- a/src/librdmacm.map +++ b/src/librdmacm.map @@ -23,5 +23,6 @@ RDMACM_1.0 { rdma_leave_multicast; rdma_get_devices; rdma_free_devices; + rdma_event_str; local: *; }; === diff --git a/examples/cmatose.c b/examples/cmatose.c index 4479fd4..0daaab0 100644 --- a/examples/cmatose.c +++ b/examples/cmatose.c @@ -320,8 +320,8 @@ static int 
cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *event) case RDMA_CM_EVENT_CONNECT_ERROR: case RDMA_CM_EVENT_UNREACHABLE: case RDMA_CM_EVENT_REJECTED: - printf("cmatose: event: %d, error: %d\n", event->event, - event->status); + printf("cmatose: event: %s, error: %d\n", + rdma_event_str(event->event), event->status); connect_error(); break; case RDMA_CM_EVENT_DISCONNECTED: diff --git a/examples/mckey.c b/examples/mckey.c index 24514a4..15371b6 100644 --- a/examples/mckey.c +++ b/examples/mckey.c @@ -305,8 +305,8 @@ static int cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *event) case RDMA_CM_EVENT_ADDR_ERROR: case RDMA_CM_EVENT_ROUTE_ERROR: case RDMA_CM_EVENT_MULTICAST_ERROR: - printf("mckey: event: %d, error: %d\n", event->event, - event->status); + printf("mckey: event: %s, error: %d\n", + rdma_event_str(event->event), event->status); connect_error(); ret = event->status; break; diff --git a/examples/rping.c b/examples/rping.c index 2dd1cef..c03d3b5 100644 --- a/examples/rping.c +++ b/examples/rping.c @@ -164,7 +164,8 @@ static int rping_cma_event_handler(struct rdma_cm_id *cma_id, int ret = 0; struct rping_cb *cb = cma_id->context; - DEBUG_LOG("cma_event type %d cma_id %p (%s)\n", event->event, cma_id, + DEBUG_LOG("cma_event type %s cma_id %p (%s)\n", + rdma_event_str(event->event), cma_id, (cma_id == cb->cm_id) ? "parent" : "child"); switch (event->event) { @@ -207,14 +208,15 @@ static int rping_cma_event_handler(struct rdma_cm_id *cma_id, case RDMA_CM_EVENT_CONNECT_ERROR: case RDMA_CM_EVENT_UNREACHABLE: case RDMA_CM_EVENT_REJECTED: - fprintf(stderr, "cma event %d, error %d\n", event->event, - event->status); + fprintf(stderr, "cma event %s, error %d\n", + rdma_event_str(event->event), event->status); sem_post(&cb->sem); ret = -1; break; case RDMA_CM_EVENT_DISCONNECTED: - fprintf(stderr, "%s DISCONNECT EVENT...\n", cb->server ? "server" : "client"); + fprintf(stderr, "%s DISCONNECT EVENT...\n", + cb->server ? 
"server" : "client"); sem_post(&cb->sem); break; diff --git a/examples/udaddy.c b/examples/udaddy.c index 12e6297..1b6a732 100644 --- a/examples/udaddy.c +++ b/examples/udaddy.c @@ -363,8 +363,8 @@ static int cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *event) case RDMA_CM_EVENT_CONNECT_ERROR: case RDMA_CM_EVENT_UNREACHABLE: case RDMA_CM_EVENT_REJECTED: - printf("udaddy: event: %d, error: %d\n", event->event, - event->status); + printf("udaddy: event: %s, error: %d\n", + rdma_event_str(event->event), event->status); connect_error(); ret = event->status; break; From sashak at voltaire.com Thu Jun 21 16:22:33 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 22 Jun 2007 02:22:33 +0300 Subject: [ofa-general] [PATCH] management: drop *.spec files rebuild Message-ID: <20070621232233.GN25653@sashak.voltaire.com> The *.spec files are now generated from the *.spec.in templates by running the make.dist script, and the generated files are committed in git. This patch drops the *.spec regeneration by ./configure (which gets invalid @RELEASE@ and other macros anyway).
Signed-off-by: Sasha Khapyorsky --- infiniband-diags/configure.in | 1 - libibcommon/configure.in | 2 +- libibmad/configure.in | 2 +- libibumad/configure.in | 2 +- opensm/configure.in | 1 - 5 files changed, 3 insertions(+), 5 deletions(-) diff --git a/infiniband-diags/configure.in b/infiniband-diags/configure.in index 0d7f82c..b06cb37 100644 --- a/infiniband-diags/configure.in +++ b/infiniband-diags/configure.in @@ -158,7 +158,6 @@ AC_SUBST(IBSCRIPTPATH) AC_CONFIG_FILES([\ Makefile \ - infiniband-diags.spec \ scripts/ibcheckerrors \ scripts/ibcheckerrs \ scripts/ibchecknet \ diff --git a/libibcommon/configure.in b/libibcommon/configure.in index cbf9f07..8a9e5be 100644 --- a/libibcommon/configure.in +++ b/libibcommon/configure.in @@ -46,5 +46,5 @@ AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script, AM_CONDITIONAL(HAVE_LD_VERSION_SCRIPT, test "$ac_cv_version_script" = "yes") -AC_CONFIG_FILES([Makefile libibcommon.spec]) +AC_CONFIG_FILES([Makefile]) AC_OUTPUT diff --git a/libibmad/configure.in b/libibmad/configure.in index fbb7758..d534916 100644 --- a/libibmad/configure.in +++ b/libibmad/configure.in @@ -63,5 +63,5 @@ AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script, AM_CONDITIONAL(HAVE_LD_VERSION_SCRIPT, test "$ac_cv_version_script" = "yes") -AC_CONFIG_FILES([Makefile libibmad.spec]) +AC_CONFIG_FILES([Makefile]) AC_OUTPUT diff --git a/libibumad/configure.in b/libibumad/configure.in index 74f3255..538c118 100644 --- a/libibumad/configure.in +++ b/libibumad/configure.in @@ -64,5 +64,5 @@ AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script, AM_CONDITIONAL(HAVE_LD_VERSION_SCRIPT, test "$ac_cv_version_script" = "yes") -AC_CONFIG_FILES([Makefile libibumad.spec]) +AC_CONFIG_FILES([Makefile]) AC_OUTPUT diff --git a/opensm/configure.in b/opensm/configure.in index 2d88464..2ab6a44 100644 --- a/opensm/configure.in +++ b/opensm/configure.in @@ -44,4 +44,3 @@ AC_CONFIG_SUBDIRS(complib libvendor opensm osmtest 
include osmeventplugin) dnl Create the following Makefiles AC_OUTPUT(Makefile) -AC_OUTPUT(opensm.spec) -- 1.5.2.2.277.g07b8 From halr at voltaire.com Thu Jun 21 17:07:09 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Jun 2007 20:07:09 -0400 Subject: [ofa-general] [PATCH] for-2.6.23 ib/umad: add partition support In-Reply-To: <467AEE05.9050809@ichips.intel.com> References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com> <467996C4.1060201@ichips.intel.com> <467AEE05.9050809@ichips.intel.com> Message-ID: <1182470820.15653.445994.camel@hal.voltaire.com> On Thu, 2007-06-21 at 17:30, Sean Hefty wrote: > > I'm beginning to think that just updating the ABI might be the right > > answer. But let's try to make this be the last ABI break. Are we > > pretty sure there's *nothing* else we might ever want to add to the > > structure? I can't think of anything right now... > > I can't think of anything, but Hal is in a better position to answer > this. He's the one who pointed out the problem to me. AFAIK this was the only thing missing but there are no guarantees. We somehow missed this before. -- Hal > - Sean From swise at opengridcomputing.com Thu Jun 21 17:10:01 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 21 Jun 2007 19:10:01 -0500 Subject: [ofa-general] Stringify ibv_event_type In-Reply-To: <000201c7b452$8a63c220$ff0da8c0@amr.corp.intel.com> References: <000201c7b452$8a63c220$ff0da8c0@amr.corp.intel.com> Message-ID: <467B1359.9060308@opengridcomputing.com> Looks good to me! Sean Hefty wrote: >> /me nudges sean... >> > > How's this? Does anything else need to be done with the build (beyond a new > release at some point)? 
> > Signed-off-by: Sean Hefty > > diff --git a/include/rdma/rdma_cma.h b/include/rdma/rdma_cma.h > index f920ae0..43c71d5 100644 > --- a/include/rdma/rdma_cma.h > +++ b/include/rdma/rdma_cma.h > @@ -463,7 +463,7 @@ int rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr); > * Destruction of an rdma_cm_id will block until related events have been > * acknowledged. > * See also: > - * rdma_ack_cm_event, rdma_create_event_channel > + * rdma_ack_cm_event, rdma_create_event_channel, rdma_event_str > */ > int rdma_get_cm_event(struct rdma_event_channel *channel, > struct rdma_cm_event **event); > @@ -519,6 +519,16 @@ struct ibv_context **rdma_get_devices(int *num_devices); > */ > void rdma_free_devices(struct ibv_context **list); > > +/** > + * rdma_event_str - Returns a string representation of an rdma cm event. > + * @event: Asynchronous event. > + * Description: > + * Returns a string representation of an asynchronous event. > + * See also: > + * rdma_get_cm_event > + */ > +const char *rdma_event_str(enum rdma_cm_event_type event); > + > #ifdef __cplusplus > } > #endif > diff --git a/man/rdma_event_str.3 b/man/rdma_event_str.3 > new file mode 100644 > index 0000000..a6ee3e5 > --- /dev/null > +++ b/man/rdma_event_str.3 > @@ -0,0 +1,15 @@ > +.TH "RDMA_EVENT_STR" 3 "2007-05-15" "librdmacm" "Librdmacm Programmer's Manual" librdmacm > +.SH NAME > +rdma_event_str \- Returns a string representation of an rdma cm event. > +.SH SYNOPSIS > +.B "#include " > +.P > +.B "char *" rdma_event_str > +.BI "("enum ibv_event_type " event ");" > +.SH ARGUMENTS > +.IP "event" 12 > +Asynchronous event. > +.SH "DESCRIPTION" > +Returns a string representation of an asynchronous event. > +.SH "SEE ALSO" > +rdma_get_cm_event(3) > diff --git a/man/rdma_get_cm_event.3 b/man/rdma_get_cm_event.3 > index a260092..252a7ab 100644 > --- a/man/rdma_get_cm_event.3 > +++ b/man/rdma_get_cm_event.3 > @@ -62,4 +62,4 @@ no longer accessible and should be rejoined, if desired. 
> .SH "SEE ALSO" > rdma_ack_cm_event(3), rdma_create_event_channel(3), rdma_resolve_addr(3), > rdma_resolve_route(3), rdma_connect(3), rdma_listen(3), rdma_join_multicast(3), > -rdma_destroy_id(3) > +rdma_destroy_id(3), rdma_event_str(3) > diff --git a/src/cma.c b/src/cma.c > index fdadb69..3579530 100644 > --- a/src/cma.c > +++ b/src/cma.c > @@ -1359,3 +1359,39 @@ retry: > *event = &evt->event; > return 0; > } > + > +const char *rdma_event_str(enum rdma_cm_event_type event) > +{ > + switch (event) { > + case RDMA_CM_EVENT_ADDR_RESOLVED: > + return "RDMA_CM_EVENT_ADDR_RESOLVED"; > + case RDMA_CM_EVENT_ADDR_ERROR: > + return "RDMA_CM_EVENT_ADDR_ERROR"; > + case RDMA_CM_EVENT_ROUTE_RESOLVED: > + return "RDMA_CM_EVENT_ROUTE_RESOLVED"; > + case RDMA_CM_EVENT_ROUTE_ERROR: > + return "RDMA_CM_EVENT_ROUTE_ERROR"; > + case RDMA_CM_EVENT_CONNECT_REQUEST: > + return "RDMA_CM_EVENT_CONNECT_REQUEST"; > + case RDMA_CM_EVENT_CONNECT_RESPONSE: > + return "RDMA_CM_EVENT_CONNECT_RESPONSE"; > + case RDMA_CM_EVENT_CONNECT_ERROR: > + return "RDMA_CM_EVENT_CONNECT_ERROR"; > + case RDMA_CM_EVENT_UNREACHABLE: > + return "RDMA_CM_EVENT_UNREACHABLE"; > + case RDMA_CM_EVENT_REJECTED: > + return "RDMA_CM_EVENT_REJECTED"; > + case RDMA_CM_EVENT_ESTABLISHED: > + return "RDMA_CM_EVENT_ESTABLISHED"; > + case RDMA_CM_EVENT_DISCONNECTED: > + return "RDMA_CM_EVENT_DISCONNECTED"; > + case RDMA_CM_EVENT_DEVICE_REMOVAL: > + return "RDMA_CM_EVENT_DEVICE_REMOVAL"; > + case RDMA_CM_EVENT_MULTICAST_JOIN: > + return "RDMA_CM_EVENT_MULTICAST_JOIN"; > + case RDMA_CM_EVENT_MULTICAST_ERROR: > + return "RDMA_CM_EVENT_MULTICAST_ERROR"; > + default: > + return "UNKNOWN EVENT"; > + } > +} > diff --git a/src/librdmacm.map b/src/librdmacm.map > index 06e9765..eafeae4 100644 > --- a/src/librdmacm.map > +++ b/src/librdmacm.map > @@ -23,5 +23,6 @@ RDMACM_1.0 { > rdma_leave_multicast; > rdma_get_devices; > rdma_free_devices; > + rdma_event_str; > local: *; > }; > > === > > diff --git a/examples/cmatose.c 
b/examples/cmatose.c > index 4479fd4..0daaab0 100644 > --- a/examples/cmatose.c > +++ b/examples/cmatose.c > @@ -320,8 +320,8 @@ static int cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *event) > case RDMA_CM_EVENT_CONNECT_ERROR: > case RDMA_CM_EVENT_UNREACHABLE: > case RDMA_CM_EVENT_REJECTED: > - printf("cmatose: event: %d, error: %d\n", event->event, > - event->status); > + printf("cmatose: event: %s, error: %d\n", > + rdma_event_str(event->event), event->status); > connect_error(); > break; > case RDMA_CM_EVENT_DISCONNECTED: > diff --git a/examples/mckey.c b/examples/mckey.c > index 24514a4..15371b6 100644 > --- a/examples/mckey.c > +++ b/examples/mckey.c > @@ -305,8 +305,8 @@ static int cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *event) > case RDMA_CM_EVENT_ADDR_ERROR: > case RDMA_CM_EVENT_ROUTE_ERROR: > case RDMA_CM_EVENT_MULTICAST_ERROR: > - printf("mckey: event: %d, error: %d\n", event->event, > - event->status); > + printf("mckey: event: %s, error: %d\n", > + rdma_event_str(event->event), event->status); > connect_error(); > ret = event->status; > break; > diff --git a/examples/rping.c b/examples/rping.c > index 2dd1cef..c03d3b5 100644 > --- a/examples/rping.c > +++ b/examples/rping.c > @@ -164,7 +164,8 @@ static int rping_cma_event_handler(struct rdma_cm_id *cma_id, > int ret = 0; > struct rping_cb *cb = cma_id->context; > > - DEBUG_LOG("cma_event type %d cma_id %p (%s)\n", event->event, cma_id, > + DEBUG_LOG("cma_event type %s cma_id %p (%s)\n", > + rdma_event_str(event->event), cma_id, > (cma_id == cb->cm_id) ? 
"parent" : "child"); > > switch (event->event) { > @@ -207,14 +208,15 @@ static int rping_cma_event_handler(struct rdma_cm_id *cma_id, > case RDMA_CM_EVENT_CONNECT_ERROR: > case RDMA_CM_EVENT_UNREACHABLE: > case RDMA_CM_EVENT_REJECTED: > - fprintf(stderr, "cma event %d, error %d\n", event->event, > - event->status); > + fprintf(stderr, "cma event %s, error %d\n", > + rdma_event_str(event->event), event->status); > sem_post(&cb->sem); > ret = -1; > break; > > case RDMA_CM_EVENT_DISCONNECTED: > - fprintf(stderr, "%s DISCONNECT EVENT...\n", cb->server ? "server" : "client"); > + fprintf(stderr, "%s DISCONNECT EVENT...\n", > + cb->server ? "server" : "client"); > sem_post(&cb->sem); > break; > > diff --git a/examples/udaddy.c b/examples/udaddy.c > index 12e6297..1b6a732 100644 > --- a/examples/udaddy.c > +++ b/examples/udaddy.c > @@ -363,8 +363,8 @@ static int cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *event) > case RDMA_CM_EVENT_CONNECT_ERROR: > case RDMA_CM_EVENT_UNREACHABLE: > case RDMA_CM_EVENT_REJECTED: > - printf("udaddy: event: %d, error: %d\n", event->event, > - event->status); > + printf("udaddy: event: %s, error: %d\n", > + rdma_event_str(event->event), event->status); > connect_error(); > ret = event->status; > break; > > From mst at dev.mellanox.co.il Thu Jun 21 22:00:47 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Fri, 22 Jun 2007 08:00:47 +0300 Subject: [ofa-general] Re: [PATCH for-2.6.22] ipoib/cm: fix interoperability when mtu don't match In-Reply-To: References: <20070620162215.GF6006@mellanox.co.il> Message-ID: <20070622050047.GL4857@mellanox.co.il> > BTW, any objection to merging the patch below for 2.6.22 too? It's > compile-tested only but it looks *REALLY* safe. No objection. -- MST From mst at dev.mellanox.co.il Thu Jun 21 22:12:01 2007 From: mst at dev.mellanox.co.il (Michael S.
Tsirkin) Date: Fri, 22 Jun 2007 08:12:01 +0300 Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition support In-Reply-To: References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com> <467996C4.1060201@ichips.intel.com> <20070621033854.GF8868@mellanox.co.il> Message-ID: <20070622051201.GM4857@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCH] for-2.6.23 ib/umad: add partition support > > > We made a mistake of not validating the offset field otherwise we could > > have used it, too: as it is I think apps just use "write" so > > there's a useless byte counter in that field. > > which offset field? I don't see the string "offset" anywhere in ib_user_mad.h static ssize_t ib_umad_write(struct file *filp, const char __user *buf, size_t count, loff_t *pos) We could have asked all users to use pwrite with offset 0, and then I think the pos field would have been useful for other things like versioning. As it is, people use write to pass in MADs, so I'm not sure what pos points to. -- MST From mst at dev.mellanox.co.il Thu Jun 21 22:27:00 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Fri, 22 Jun 2007 08:27:00 +0300 Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition support In-Reply-To: References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com> <467996C4.1060201@ichips.intel.com> Message-ID: <20070622052700.GP4857@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCH] for-2.6.23 ib/umad: add partition support > > > Did you have something in mind? (new ioctl? re-using existing fields?) > > > > Not all fields are used for both reads and writes. E.g. status is > > unused on a write, and retries is unused on a read. Storing the > > pkey_index on a read seems doable. I think if we do anything on a > > write, we need to make an assumption that the data is currently set to > > 0 by the app. > > I hadn't really thought about it. > > One other thing is that the top 8 bits of flow_label aren't used.
I > guess we could steal that, although it's a little ugly. I doubt it > would break existing userspace. > > There is the problem of old kernels silently ignoring the pkey index > though. I'm not sure I see a good way around that. > > I'm beginning to think that just updating the ABI might be the right > answer. Ugh. OFED 1.2 (with the old ABI) just went out. I wonder: is it time to start making the kernel backwards-compatible? It would be trivial to have userspace supply its own ABI version and have the kernel support both the new and the old ABI if we want to. What do you think? > But let's try to make this be the last ABI break. Are we > pretty sure there's *nothing* else we might ever want to add to the > structure? I can't think of anything right now... It'd be easy to add some extra padding just in case ... -- MST From k_mahesh85 at yahoo.co.in Thu Jun 21 23:10:06 2007 From: k_mahesh85 at yahoo.co.in (Keshetti Mahesh) Date: Fri, 22 Jun 2007 07:10:06 +0100 (BST) Subject: [ofa-general] SMP attribute component errors : Link speed enabled? Message-ID: <957484.9571.qm@web8321.mail.in.yahoo.com> Hi list, what is the attribute component error condition for the "Link speed enabled" field? The spec gives "0x2 < LSE < 0xE", but I think that is not applicable for all port speeds (2.5x, 10x, etc.). I didn't find it in the errata either. -Mahesh From jrio at caton.es Fri Jun 22 01:34:09 2007 From: jrio at caton.es (Julio del Río) Date: Fri, 22 Jun 2007 10:34:09 +0200 Subject: [ofa-general] problem with ofed 1.1.
Message-ID: <1182501249.5695.16.camel@linux.site> Good morning, I hope you can help me with this. I have this config: - Fedora Core 2 - Linux localhost.localdomain 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:56:28 EST 2006 x86_64 x86_64 x86_64 GNU/Linux - HCA Mellanox MHGS18-XTC - Flextronic Switch F-X430047 - OFED 1.1 When trying to install, this is the error log I get: --------------------------------------------------------- + STATUS=0 + '[' 0 -ne 0 ']' + cd openib-1.1 ++ /usr/bin/id -u + '[' 0 = 0 ']' + /bin/chown -Rhf root . ++ /usr/bin/id -u + '[' 0 = 0 ']' + /bin/chgrp -Rhf root . + /bin/chmod -Rf a+rX,u+w,g-w,o-w . + exit 0 Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.43267 + umask 022 + cd /var/tmp/OFEDRPM/BUILD + cd openib-1.1 + LANG=C + export LANG + unset DISPLAY + rm -rf /var/tmp/OFED + cd /var/tmp/OFEDRPM/BUILD/openib-1.1 + mkdir -p /var/tmp/OFED//usr/local/ofed/src + cp -a /var/tmp/OFEDRPM/BUILD/openib-1.1 /var/tmp/OFED//usr/local/ofed/src + ./configure --prefix=/usr/local/ofed --libdir=/usr/local/ofed/lib64 --kernel-version 2.6.9-34.ELsmp --kernel-sources /lib/modules/2.6.9-34.ELsmp/build --with-libibcm --with-libibverbs --with-libipathverbs --with-libmthca --with-librdmacm --with-mstflint --with-perftest --with-ipath_inf-mod --with-ipoib-mod --with-mthca-mod --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod Quilt does not exist... Going to use patch.
Created configure.mk: prefix=/usr/local/ofed PREFIX="--prefix /usr/local/ofed" libdir=/usr/local/ofed/lib64 # Current working directory CWD=/var/tmp/OFEDRPM/BUILD/openib-1.1 # Kernel level KVERSION=2.6.9-34.ELsmp EXTRAVERSION=-34.ELsmp MODULES_DIR=/lib/modules/2.6.9-34.ELsmp KSRC=/lib/modules/2.6.9-34.ELsmp/build AUTOCONF_H=/var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h WITH_MEMTRACK=no WITH_MAKE_PARAMS= CONFIG_INFINIBAND=m CONFIG_INFINIBAND_IPOIB=m CONFIG_INFINIBAND_SDP= CONFIG_INFINIBAND_SRP= CONFIG_INFINIBAND_USER_MAD=m CONFIG_INFINIBAND_USER_ACCESS=m CONFIG_INFINIBAND_ADDR_TRANS=y CONFIG_INFINIBAND_MTHCA=m CONFIG_INFINIBAND_IPOIB_DEBUG=y CONFIG_INFINIBAND_ISER= CONFIG_INFINIBAND_EHCA= CONFIG_INFINIBAND_EHCA_SCALING= CONFIG_INFINIBAND_RDS= CONFIG_INFINIBAND_RDS_DEBUG= CONFIG_INFINIBAND_MADEYE= CONFIG_INFINIBAND_IPOIB_DEBUG_DATA= CONFIG_INFINIBAND_SDP_SEND_ZCOPY= CONFIG_INFINIBAND_SDP_RECV_ZCOPY= CONFIG_INFINIBAND_SDP_DEBUG= CONFIG_INFINIBAND_SDP_DEBUG_DATA= CONFIG_INFINIBAND_IPATH=m CONFIG_INFINIBAND_MTHCA_DEBUG=y # User level WITH_IBVERBS=yes WITH_MTHCA=yes WITH_IPATHVERBS=yes WITH_EHCA=no WITH_CM=yes WITH_SDP=no WITH_DAPL=no WITH_RDMACM=yes WITH_MANAGEMENT_LIBS=no WITH_OSM=no WITH_DIAGS=no WITH_MPI=no WITH_PERFTEST=yes WITH_SRPTOOLS=no WITH_IPOIBTOOLS=no WITH_TVFLASH=no WITH_MSTFLINT=yes Created /var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h: #undef CONFIG_INFINIBAND #undef CONFIG_INFINIBAND_IPOIB #undef CONFIG_INFINIBAND_SDP #undef CONFIG_INFINIBAND_SRP #undef CONFIG_INFINIBAND_USER_MAD #undef CONFIG_INFINIBAND_USER_ACCESS #undef CONFIG_INFINIBAND_ADDR_TRANS #undef CONFIG_INFINIBAND_MTHCA #undef CONFIG_INFINIBAND_IPOIB_DEBUG #undef CONFIG_INFINIBAND_ISER #undef CONFIG_INFINIBAND_EHCA #undef CONFIG_INFINIBAND_EHCA_SCALING #undef CONFIG_INFINIBAND_RDS #undef CONFIG_INFINIBAND_RDS_DEBUG #undef CONFIG_INFINIBAND_MADEYE #undef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA #undef CONFIG_INFINIBAND_SDP_SEND_ZCOPY #undef CONFIG_INFINIBAND_SDP_RECV_ZCOPY 
#undef CONFIG_INFINIBAND_SDP_DEBUG #undef CONFIG_INFINIBAND_SDP_DEBUG_DATA #undef CONFIG_INFINIBAND_IPATH #undef CONFIG_INFINIBAND_MTHCA_DEBUG #define CONFIG_INFINIBAND 1 #define CONFIG_INFINIBAND_IPOIB 1 #undef CONFIG_INFINIBAND_SDP #undef CONFIG_INFINIBAND_SRP #define CONFIG_INFINIBAND_USER_MAD 1 #define CONFIG_INFINIBAND_USER_ACCESS 1 #define CONFIG_INFINIBAND_ADDR_TRANS 1 #define CONFIG_INFINIBAND_MTHCA 1 #define CONFIG_INFINIBAND_IPOIB_DEBUG 1 #undef CONFIG_INFINIBAND_ISER #undef CONFIG_INFINIBAND_EHCA #undef CONFIG_INFINIBAND_RDS #undef CONFIG_INFINIBAND_RDS_DEBUG #undef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA #undef CONFIG_INFINIBAND_SDP_SEND_ZCOPY #undef CONFIG_INFINIBAND_SDP_RECV_ZCOPY #undef CONFIG_INFINIBAND_SDP_DEBUG #undef CONFIG_INFINIBAND_SDP_DEBUG_DATA #define CONFIG_INFINIBAND_IPATH 1 #define CONFIG_INFINIBAND_MTHCA_DEBUG 1 #undef CONFIG_INFINIBAND_MADEYE mkdir -p /var/tmp/OFEDRPM/BUILD/openib-1.1/patches touch /var/tmp/OFEDRPM/BUILD/openib-1.1/patches/quiltrc /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/dapl_qp_attr.patch patching file src/userspace/dapl/dapl/openib_cma/dapl_ib_util.c patching file src/userspace/dapl/dapl/openib_scm/dapl_ib_util.c /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/libmthca_cq_deadlock.patch patching file src/userspace/libmthca/src/verbs.c Hunk #1 succeeded at 614 (offset -8 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/libmthca_stddef.patch patching file src/userspace/libmthca/src/mthca.h Hunk #1 succeeded at 38 with fuzz 2 (offset 2 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/librdmacm_compat.patch patching file src/userspace/librdmacm/src/cma.c Hunk #1 succeeded at 157 (offset 16 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/librdmacm_ver_abi.patch patching file src/userspace/librdmacm/src/cma.c Hunk #2 succeeded at 170 (offset 16 lines). 
/var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/mstflint.patch patching file src/userspace/mstflint/mtcr.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cm_add_mra_timeout_limit.patch patching file drivers/infiniband/core/cm.c Hunk #1 succeeded at 53 (offset -1 lines). Hunk #2 succeeded at 2268 (offset -36 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cm_cleanup_timewait.patch patching file drivers/infiniband/core/cm.c Hunk #1 succeeded at 686 (offset 7 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_established1.patch patching file drivers/infiniband/ulp/sdp/sdp.h patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c Hunk #1 succeeded at 515 (offset 16 lines). patching file drivers/infiniband/ulp/sdp/sdp_cma.c patching file drivers/infiniband/ulp/sdp/sdp_main.c Hunk #1 succeeded at 589 (offset 26 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_increase_max_cm_retries.patch patching file drivers/infiniband/core/cma.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_list_init.patch patching file drivers/infiniband/core/cma.c Hunk #1 succeeded at 328 (offset -11 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_mem_leak.patch patching file drivers/infiniband/core/cma.c Hunk #1 succeeded at 1713 (offset -241 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_race_fix.patch patching file drivers/infiniband/core/cma.c Hunk #1 succeeded at 910 with fuzz 1 (offset -113 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_tavor_quirk.patch patching file drivers/infiniband/core/cma.c Hunk #1 succeeded at 48 with fuzz 2. Hunk #2 succeeded at 1154 (offset 27 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ib_sa_names.patch patching file include/rdma/ib_sa.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-fixes.patch (Stripping trailing CRs from patch.) 
patching file drivers/infiniband/Makefile (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/Kconfig (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/Makefile (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_common.h (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_cq.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_debug.h (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_diag.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_driver.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_file_ops.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_fs.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_ht400.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_iba6110.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_iba6120.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_init_chip.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_intr.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_kernel.h (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_keys.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_layer.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_layer.h (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_mad.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_mr.c (Stripping trailing CRs from patch.) 
patching file drivers/infiniband/hw/ipath/ipath_pe800.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_qp.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_rc.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_registers.h (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_ruc.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_srq.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_stats.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_sysfs.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_uc.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_ud.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_verbs.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_verbs.h (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_verbs_mcast.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_wc_ppc64.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/verbs_debug.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-limit-packets-sent-without-ack.patch (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_qp.c Hunk #1 succeeded at 502 (offset -8 lines). (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_rc.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_verbs.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_verbs.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-memcpy_cachebypass.patch (Stripping trailing CRs from patch.) 
patching file drivers/infiniband/hw/ipath/Makefile (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_verbs.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/memcpy_cachebypass_x86_64.S /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-x86_64.patch (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/Kconfig /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_issue3.patch patching file drivers/infiniband/ulp/ipoib/ipoib_main.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_mcast_join_mask.patch patching file drivers/infiniband/ulp/ipoib/ipoib_multicast.c Hunk #1 succeeded at 471 (offset -1 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_mcast_restart.patch patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_selector_updated.patch patching file drivers/infiniband/ulp/ipoib/ipoib_main.c Hunk #2 succeeded at 458 (offset 4 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_attributes.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 1461 (offset -6 lines). 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_remove_reconnect.patch patching file drivers/infiniband/ulp/srp/ib_srp.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_wa_post_send.patch patching file drivers/infiniband/ulp/srp/ib_srp.c patching file drivers/infiniband/ulp/srp/ib_srp.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/lockdep_header.patch patching file drivers/infiniband/core/uverbs_cmd.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_av_statrate.patch patching file drivers/infiniband/hw/mthca/mthca_av.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_catas_reset.patch patching file drivers/infiniband/hw/mthca/mthca_catas.c patching file drivers/infiniband/hw/mthca/mthca_main.c patching file drivers/infiniband/hw/mthca/mthca_dev.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_mad_traps.patch patching file drivers/infiniband/hw/mthca/mthca_mad.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_port.patch patching file drivers/infiniband/hw/mthca/mthca_provider.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_qp_portnum.patch patching file drivers/infiniband/hw/mthca/mthca_qp.c Hunk #1 succeeded at 478 (offset 4 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_qp_statrate_bits.patch patching file drivers/infiniband/hw/mthca/mthca_qp.c Hunk #1 succeeded at 414 (offset 4 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_use_uar2.patch patching file drivers/infiniband/hw/mthca/mthca_uar.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/robert-ipath-diagpkt-init-fixup.patch patching file drivers/infiniband/hw/ipath/ipath_diag.c Hunk #1 succeeded at 285 (offset -1 lines). patching file drivers/infiniband/hw/ipath/ipath_driver.c Hunk #1 succeeded at 539 (offset -20 lines). Hunk #2 succeeded at 596 with fuzz 1 (offset -105 lines). Hunk #3 succeeded at 2029 (offset -156 lines). 
patching file drivers/infiniband/hw/ipath/ipath_kernel.h Hunk #1 succeeded at 793 (offset -96 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sdp_credits_by_seq.patch patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sdp_post_credits.patch patching file drivers/infiniband/ulp/sdp/sdp.h Hunk #1 succeeded at 177 (offset 1 line). patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c Hunk #1 succeeded at 324 (offset 6 lines). patching file drivers/infiniband/ulp/sdp/sdp_cma.c Hunk #1 succeeded at 434 (offset 4 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_drep_on_not_found.patch patching file drivers/infiniband/core/cm.c Hunk #1 succeeded at 1890 (offset -10 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_randomize_psn.patch patching file drivers/infiniband/core/cm.c Hunk #3 succeeded at 81 (offset 7 lines). Hunk #5 succeeded at 327 (offset 7 lines). Hunk #7 succeeded at 2115 (offset 27 lines). Hunk #8 succeeded at 3369 (offset 2 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_unload_crash.patch patching file drivers/infiniband/core/cm.c Hunk #1 succeeded at 82 (offset 7 lines). Hunk #3 succeeded at 656 (offset 6 lines). Hunk #5 succeeded at 685 (offset 6 lines). Hunk #7 succeeded at 1316 (offset 6 lines). Hunk #9 succeeded at 1334 (offset 6 lines). Hunk #10 succeeded at 2626 (offset -7 lines). Hunk #11 succeeded at 3409 (offset -29 lines). Hunk #12 succeeded at 3449 (offset -7 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_establish.patch patching file include/rdma/rdma_cm.h Hunk #1 succeeded at 241 (offset -15 lines). patching file drivers/infiniband/core/cm.c Hunk #1 succeeded at 3242 (offset 35 lines). patching file drivers/infiniband/core/cma.c Hunk #1 succeeded at 759 (offset -81 lines). Hunk #3 succeeded at 1752 (offset -212 lines). Hunk #4 succeeded at 1997 with fuzz 1. 
Hunk #5 succeeded at 1828 (offset -229 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_hotplug.patch patching file drivers/infiniband/core/cma.c Hunk #1 succeeded at 278 (offset 7 lines). Hunk #3 succeeded at 700 (offset 8 lines). Hunk #5 succeeded at 895 with fuzz 1 (offset -9 lines). Hunk #6 succeeded at 1382 (offset 6 lines). Hunk #7 succeeded at 1610 (offset -9 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_typo_fix.patch patching file drivers/infiniband/core/cma.c Hunk #1 succeeded at 276 with fuzz 2 (offset 7 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_1_recreate_at_reconnect.patch patching file drivers/infiniband/ulp/srp/ib_srp.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_2_use_multiple_initiator_ports.patch patching file drivers/infiniband/ulp/srp/ib_srp.c patching file drivers/infiniband/ulp/srp/ib_srp.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_topspin.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 358 (offset -1 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/svnehca_0015_1.patch patching file drivers/infiniband/hw/ehca/ehca_main.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/svnehca_0015_2.patch patching file drivers/infiniband/hw/ehca/ehca_tools.h Applying patches for 2.6.9-34.ELsmp kernel (RHAS4 Update 3): /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_1_netevents_revert_to_2_6_17.patch patching file drivers/infiniband/core/addr.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_3926_to_2_6_13.patch patching file drivers/infiniband/core/addr.c Hunk #1 succeeded at 327 with fuzz 1 (offset 11 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_4670_to_2_6_9.patch patching file drivers/infiniband/core/addr.c Hunk #1 succeeded at 27 with fuzz 2. 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/asm_bitops_ia64_to_2_6_11.patch patching file include/asm/bitops.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/core_4807_to_2_6_9.patch patching file drivers/infiniband/core/sysfs.c Hunk #1 succeeded at 438 (offset -4 lines). patching file drivers/infiniband/core/user_mad.c Hunk #2 succeeded at 677 (offset 91 lines). Hunk #3 succeeded at 685 (offset 5 lines). Hunk #4 succeeded at 1106 (offset 91 lines). Hunk #5 succeeded at 1053 (offset 5 lines). patching file drivers/infiniband/core/uverbs_main.c Hunk #2 succeeded at 118 (offset 3 lines). patching file drivers/infiniband/core/uverbs_mem.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/debugfs_to_2_6_9.patch patching file drivers/infiniband/include/linux/debugfs.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipath-backport.patch patching file drivers/infiniband/hw/ipath/iowrite32_copy_x86_64.S patching file drivers/infiniband/hw/ipath/ipath_backport.h patching file drivers/infiniband/hw/ipath/ipath_diag.c patching file drivers/infiniband/hw/ipath/ipath_driver.c Hunk #2 succeeded at 557 (offset 1 line). Hunk #3 succeeded at 599 (offset 1 line). Hunk #4 succeeded at 1366 (offset 1 line). Hunk #5 succeeded at 1395 (offset 1 line). Hunk #6 succeeded at 1875 (offset 1 line). Hunk #7 succeeded at 1903 (offset 1 line). Hunk #8 succeeded at 1984 (offset -9 lines). Hunk #9 succeeded at 2027 (offset 1 line). Hunk #10 succeeded at 2142 (offset -9 lines). 
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c patching file drivers/infiniband/hw/ipath/ipath_fs.c patching file drivers/infiniband/hw/ipath/ipath_iba6110.c patching file drivers/infiniband/hw/ipath/ipath_iba6120.c patching file drivers/infiniband/hw/ipath/ipath_init_chip.c patching file drivers/infiniband/hw/ipath/ipath_kernel.h patching file drivers/infiniband/hw/ipath/ipath_layer.c patching file drivers/infiniband/hw/ipath/ipath_sysfs.c patching file drivers/infiniband/hw/ipath/ipath_user_pages.c patching file drivers/infiniband/hw/ipath/ipath_verbs.c patching file drivers/infiniband/hw/ipath/ipath_verbs.h patching file drivers/infiniband/hw/ipath/Makefile /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipoib_5010_to_2_6_9.patch patching file drivers/infiniband/include/linux/if_infiniband.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipoib_8111_to_2_6_16.patch patching file drivers/infiniband/ulp/ipoib/ipoib_main.c Hunk #2 succeeded at 803 (offset 49 lines). patching file drivers/infiniband/ulp/ipoib/ipoib.h Hunk #1 succeeded at 46 (offset -1 lines). Hunk #2 succeeded at 220 (offset 1 line). 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_device_5496_to_2_6_15.patch patching file drivers/infiniband/include/linux/device.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_err_to_2_6_11.patch patching file include/linux/err.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_idr_6554_to_2_6_13.patch patching file drivers/infiniband/include/linux/idr.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_inetdevice_to_2_6_17.patch patching file drivers/infiniband/include/linux/inetdevice.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_lockdep_to_2_6_17.patch patching file include/linux/lockdep.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_mutex_5947_to_2_6_15.patch patching file drivers/infiniband/include/linux/mutex.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_netdevice_to_2_6_17.patch patching file drivers/infiniband/include/linux/netdevice.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_pci_7970_to_2_6_9.patch patching file drivers/infiniband/include/linux/pci.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_scatterlist_6369_to_2_6_9.patch patching file drivers/infiniband/include/linux/scatterlist.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_signal_to_2_6_17.patch patching file drivers/infiniband/include/linux/signal.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_skbuff_6754_to_2_6_11.patch patching file include/linux/skbuff.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_spinlock_5883_to_2_6_9.patch patching file drivers/infiniband/include/linux/spinlock.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/makefile_to_2_6_9.patch patching file drivers/infiniband/ulp/srp/Makefile 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/mthca_dev_3465_to_2_6_11.patch patching file drivers/infiniband/hw/mthca/mthca_dev.h Hunk #1 succeeded at 57 with fuzz 2 (offset 4 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/mthca_provider_3465_to_2_6_9.patch patching file drivers/infiniband/hw/mthca/mthca_provider.c Hunk #1 succeeded at 387 (offset 28 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_inet_sock_6754_to_2_6_15.patch patching file include/net/inet_sock.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_sock_1_6754_to_2_6_13.patch patching file include/net/sock.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_sock_2_6754_to_2_6_11.patch patching file include/net/sock.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_tcp_states_6754_to_2_6_13.patch patching file include/net/tcp_states.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/read_mostly_6255_to_2_6_13.patch patching file drivers/infiniband/include/linux/cache.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/scsi_7242_to_2_6_14.patch patching file include/scsi/scsi.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/sdp_7277_to_2_6_11.patch patching file drivers/infiniband/ulp/sdp/sdp_main.c Hunk #1 succeeded at 418 (offset 118 lines). Hunk #2 succeeded at 535 (offset 41 lines). Hunk #3 succeeded at 633 (offset 118 lines). Hunk #4 succeeded at 1408 (offset 245 lines). Hunk #5 succeeded at 1301 (offset 118 lines). Hunk #6 succeeded at 1537 (offset 245 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_4030_to_2_6_12.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 1594 (offset 271 lines). 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_7312_to_2_6_11.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 1258 (offset -44 lines). Hunk #3 succeeded at 1332 (offset -42 lines). Hunk #5 succeeded at 1360 with fuzz 2 (offset -40 lines). Hunk #6 succeeded at 1404 with fuzz 2 (offset -3 lines). Hunk #7 succeeded at 1377 (offset -40 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_scsi_scan_target_7242_to_2_6_11.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 975 (offset 26 lines). Hunk #2 succeeded at 1505 (offset 24 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/top_2844_to_2_6_11.patch patching file drivers/infiniband/Makefile Hunk #1 succeeded at 1 with fuzz 2. /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ucm_5245_to_2_6_9.patch patching file drivers/infiniband/core/ucm.c Hunk #1 succeeded at 1270 (offset -8 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ucma_6607_to_2_6_9.patch patching file drivers/infiniband/core/ucma.c Hunk #1 succeeded at 861 (offset 88 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/user_mad_4603_to_2_6_9.patch patching file drivers/infiniband/core/user_mad.c Hunk #1 succeeded at 857 (offset -20 lines). Hunk #3 succeeded at 1086 (offset -20 lines). Hunk #5 succeeded at 1123 (offset -20 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/uverbs_main_3935_to_2_6_9.patch patching file drivers/infiniband/core/uverbs_main.c Hunk #1 succeeded at 727 (offset 11 lines). Hunk #2 succeeded at 949 (offset 1 line). Hunk #3 succeeded at 975 (offset 11 lines). Hunk #4 succeeded at 986 (offset 3 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/uverbs_to_2_6_17.patch patching file drivers/infiniband/core/uverbs_main.c Hunk #1 succeeded at 1011 with fuzz 1 (offset 196 lines). 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/hpage_patches/hpages.patch patching file drivers/infiniband/core/uverbs_mem.c /bin/rm -f /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/examples cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/libibverbs Running: ./configure --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache --disable-libcheck --prefix /usr/local/ofed --libdir /usr/local/ofed/lib64 CPPFLAGS="-I../libibverbs/include" configure: creating cache /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache checking for a BSD-compatible install... /usr/bin/install -c checking whether build environment is sane... yes checking for gawk... gawk checking whether make sets $(MAKE)... yes checking build system type... x86_64-redhat-linux-gnu checking host system type... x86_64-redhat-linux-gnu checking for style of include used by make... GNU checking for gcc... gcc checking for C compiler default output file name... configure: error: C compiler cannot create executables See `config.log' for more details.
Failed to execute: ./configure --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache --disable-libcheck --prefix /usr/local/ofed --libdir /usr/local/ofed/lib64 CPPFLAGS="-I../libibverbs/include" error: Bad exit status from /var/tmp/rpm-tmp.43267 (%install) RPM build errors: user vlad does not exist - using root group mtl does not exist - using root user vlad does not exist - using root group mtl does not exist - using root Bad exit status from /var/tmp/rpm-tmp.43267 (%install) ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-libibcm --with-libibverbs --with-libipathverbs --with-libmthca --with-librdmacm --with-mstflint --with-perftest --with-ipath_inf-mod --with-ipoib-mod --with-mthca-mod --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod' --define 'configure_options32 %{nil}' --define 'KVERSION 2.6.9-34.ELsmp' --define 'KSRC /lib/modules/2.6.9-34.ELsmp/build' --define 'build_kernel_ib 1' --define 'build_kernel_ib_devel 1' --define 'NETWORK_CONF_DIR /etc/sysconfig/network-scripts' --define 'modprobe_update 1' --define 'include_ipoib_conf 1' --define 'build_32bit 0' /home/caton/OFED-1.1/SRPMS/openib-1.1-0.src.rpm" --------------------------------------------------------- Thanks a lot and best regards Julio.
From vlad at lists.openfabrics.org Fri Jun 22 02:43:02 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Fri, 22 Jun 2007 02:43:02 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070622-0200 daily build status Message-ID: <20070622094303.091A7E608A1@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.14 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on ia64 with linux-2.6.12 Passed on powerpc with linux-2.6.19 Passed on ia64 with linux-2.6.14 Passed on x86_64 with linux-2.6.16 Passed on ia64 with linux-2.6.18 Passed on ppc64 with linux-2.6.12 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.13 Passed on ia64 with linux-2.6.13 Passed on powerpc with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.14 Passed on ia64 with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on ia64 with linux-2.6.15 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on x86_64 with linux-2.6.18 Passed on ppc64 with linux-2.6.16 Passed on x86_64 with linux-2.6.17 Passed on powerpc with linux-2.6.12 Passed on x86_64 with linux-2.6.15 Passed on ppc64 with linux-2.6.15 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.17 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.18 Passed on powerpc with linux-2.6.15 Passed on powerpc with linux-2.6.14 Passed on 
ia64 with linux-2.6.16 Passed on ppc64 with linux-2.6.14 Passed on ppc64 with linux-2.6.13 Passed on ia64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on ia64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ppc64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.9-34.ELsmp Failed: From halr at voltaire.com Fri Jun 22 03:45:16 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 22 Jun 2007 06:45:16 -0400 Subject: [ofa-general] Re: SMP attribute component errors : Link speed enabled? In-Reply-To: <957484.9571.qm@web8321.mail.in.yahoo.com> References: <957484.9571.qm@web8321.mail.in.yahoo.com> Message-ID: <1182509115.10379.29241.camel@hal.voltaire.com> On Fri, 2007-06-22 at 02:10, Keshetti Mahesh wrote: > Hi list, > > what is the attribute component error condition for the "Link speed > enabled"? > In spec. it is given "0x2 < LSE < 0xE" but I think it is not > applicable for all port > speeds (2.5x, 10x etc.). > I didn't find it either in the errata. Yes, that looks wrong to me too. Where do you see this? I see 0x2 <= LSE <= 0xE which looks right. -- Hal > -Mahesh From HNGUYEN at de.ibm.com Thu Jun 14 05:24:32 2007 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Thu, 14 Jun 2007 14:24:32 +0200 Subject: [ofa-general] Re: [ewg] OFED 1.2 rc5 release In-Reply-To: <6C2C79E72C305246B504CBA17B5500C90156362A@mtlexch01.mtl.com> Message-ID: Hi, I'm having trouble reaching www.openfabrics.org, or more precisely, downloading ofed-1.2-rc5 from it. Is there anything else I need to consider? Thanks!
Kind Regards Hoang-Nam Nguyen Tel. +49-7031-16-3570, email: hnguyen at de.ibm.com IBM Deutschland Entwicklung GmbH Chairman of the Supervisory Board: Martin Jetter Management: Herbert Kircher Registered office: Boeblingen Register court: Amtsgericht Stuttgart, HRB 243294 From: "Tziporet Koren" Sent by: ewg-bounces at lists.openfabrics.org Date: 13.06.2007 16:25 cc: general at lists.openfabrics.org Subject: [ewg] OFED 1.2 rc5 release Hi, OFED 1.2-RC5 is available on http://www.openfabrics.org/builds/ofed-1.2/ File: OFED-1.2-rc5.tgz To get the BUILD_ID, run ofed_info. Please report any issues in bugzilla: https://bugs.openfabrics.org/ The GA release is expected next Wed (June 20), based on RC5 tests. Tziporet & Vlad ======================================================================== Release information: OS support: Novell: - SLES 9.0 SP3 - SLES10 - SLES10 SP1 RC5 Redhat: - Redhat EL4 up3, up4 and up5 - Redhat EL5 kernel.org: - 2.6.20 - 2.6.19 Note: Fedora C6 and SuSE Pro 10 are not part of the official list. We keep the backport patches for these OSes and make sure OFED compiles and loads properly, but we will not do a full QA cycle. Systems: * x86_64 * x86 * ia64 * ppc64 Main changes from OFED-1.2-rc4: =============================== 1. Fixed 8 bugs (see the attached file for fixed issues) 2. Added support for SLES10 SP1 RC5 (tvflash is disabled for now) 3. Added support for iSER on RHEL 4 4. Updated documents: all owners, please review them to make sure the docs for your component are up to date. See bugzilla for all open issues. Tasks that should be completed for the GA release: 1. Complete all documentation (release notes, README, etc.) 2. Run all QA tests on all platforms (See attached file: rc5_fixed_bugs.csv) _______________________________________________ ewg mailing list ewg at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg -------------- next part -------------- A non-text attachment was scrubbed...
Name: rc5_fixed_bugs.csv Type: application/octet-stream Size: 719 bytes Desc: not available URL: From halr at voltaire.com Fri Jun 22 03:51:55 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 22 Jun 2007 06:51:55 -0400 Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition support In-Reply-To: <20070622052700.GP4857@mellanox.co.il> References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com> <467996C4.1060201@ichips.intel.com> <20070622052700.GP4857@mellanox.co.il> Message-ID: <1182509515.10379.29673.camel@hal.voltaire.com> On Fri, 2007-06-22 at 01:27, Michael S. Tsirkin wrote: > > But let's try to make this be the last ABI break. Are we > > pretty sure there's *nothing* else we might ever want to add to the > > structure? I can't think of anything right now... > > It'd be easy to add some extra padding just in case ... Six reserved bytes are being added as part of the ABI change:
diff --git a/include/rdma/ib_user_mad.h b/include/rdma/ib_user_mad.h
index d66b15e..e7bf6fa 100644
--- a/include/rdma/ib_user_mad.h
+++ b/include/rdma/ib_user_mad.h
@@ -43,7 +43,7 @@
  * Increment this value if any changes that break userspace ABI
  * compatibility are made.
  */
-#define IB_USER_MAD_ABI_VERSION	5
+#define IB_USER_MAD_ABI_VERSION	6
 
 /*
  * Make sure that all structs defined in this file remain laid out so
@@ -88,6 +88,8 @@ struct ib_user_mad_hdr {
 	__u8	traffic_class;
 	__u8	gid[16];
 	__be32	flow_label;
+	__u16	pkey_index;
+	__u8	reserved[6];
 };
 
 /**
-- Hal From halr at voltaire.com Fri Jun 22 04:05:36 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 22 Jun 2007 07:05:36 -0400 Subject: [ofa-general] Re: [PATCH] opensm/updn: --connect_roots option In-Reply-To: <20070621212919.GL25653@sashak.voltaire.com> References: <20070621212919.GL25653@sashak.voltaire.com> Message-ID: <1182510334.10379.30604.camel@hal.voltaire.com> On Thu, 2007-06-21 at 17:29, Sasha Khapyorsky wrote: > With this option up/down preserves route paths (based on min hops > knowledge) between root switches. This makes up/down IBA compliant > (where all-to-all connectivity is required), OTOH this violates the up/down > deadlock-free algorithm. By default this option is 'off'. > > Signed-off-by: Sasha Khapyorsky Thanks! Applied. -- Hal From bs at q-leap.de Fri Jun 22 05:24:42 2007 From: bs at q-leap.de (Bernd Schubert) Date: Fri, 22 Jun 2007 14:24:42 +0200 Subject: [ofa-general] librdmacm_to_2_6_20.patch Message-ID: <200706221424.43142.bs@q-leap.de> Hi, there are patches to make the rdma code of ofed-1.1 compatible with 2.6.20 (https://svn.openfabrics.org/svn/openib/gen2/trunk/ofed/patches/user_fixes/librdmacm_to_2_6_20.patch and perftest_to_2_6_20.patch). Unfortunately, the patches don't work well.
There are hunks that don't apply (that's easy to fix), and now something seems to be missing:

dapl/openib_cma/dapl_ib_cm.c: In function `dapli_route_resolve':
dapl/openib_cma/dapl_ib_cm.c:156: warning: implicit declaration of function `rdma_get_option'
dapl/openib_cma/dapl_ib_cm.c:156: error: `RDMA_PROTO_IB' undeclared (first use in this function)
dapl/openib_cma/dapl_ib_cm.c:156: error: (Each undeclared identifier is reported only once
dapl/openib_cma/dapl_ib_cm.c:156: error: for each function it appears in.)
dapl/openib_cma/dapl_ib_cm.c:177: warning: implicit declaration of function `rdma_set_option'
dapl/openib_cma/dapl_ib_cm.c: In function `dapli_req_recv':
dapl/openib_cma/dapl_ib_cm.c:262: error: structure has no member named `private_data_len'
dapl/openib_cma/dapl_ib_cm.c:264: error: structure has no member named `private_data'
dapl/openib_cma/dapl_ib_cm.c:265: error: structure has no member named `private_data_len'
dapl/openib_cma/dapl_ib_cm.c:268: error: structure has no member named `private_data_len'
dapl/openib_cma/dapl_ib_cm.c: In function `dapli_cm_active_cb':
dapl/openib_cma/dapl_ib_cm.c:380: error: structure has no member named `private_data'
dapl/openib_cma/dapl_ib_cm.c: In function `dapli_cm_passive_cb':
dapl/openib_cma/dapl_ib_cm.c:429: error: structure has no member named `private_data'
make[3]: *** [dapl_udapl_libdaplcma_la-dapl_ib_cm.lo] Error 1

The entire rdma_set_option() function and its declaration are removed by librdmacm_to_2_6_20. So what should be done with the call at dapl_ib_cm.c:177?

/* Get default connect request timeout values, and adjust */
ret = rdma_get_option(conn->cm_id, RDMA_PROTO_IB, IB_CM_REQ_OPTIONS,
                      (void*)&req_opt, &optlen);

RDMA_PROTO_IB was also removed by the patch.

As for "error: structure has no member named `private_data_len'": this one is easy to fix.

Is there a more recent working version of the patch available, or can you at least give me some hints on what to do with the rdma_get_option() call?
Thanks in advance, Bernd -- Bernd Schubert Q-Leap Networks GmbH From william666 at 3darenanet.com Fri Jun 22 05:47:35 2007 From: william666 at 3darenanet.com (markson) Date: Fri, 22 Jun 2007 15:47:35 +0300 Subject: [ofa-general] ATTENTION PLEASE. Message-ID: Manchester M27 5FX, United Kingdom. Tel:+44 702 402 6648 marksonwilliamm at inmail24.com An official notification of funds deposited. This is to inform you that i will like you to be part of this great transaction worth of US$8 Million it has been approved for immediate Payment, Though the money is with Royal Exchange Bank here in London. For the purpose of clarification of who I am dealing send all these:- Your Full Name: _________ Your Address:__________ Your Telephone Number:________ Your Fax Number: _________ Your Mobile Number:___________ The Name of the Closest Airport to your City of Residence:________ Your Age:________ Your Country:______ Sex : ____________ Job: _________ On receipt of your information I will send you the full details of the consignment. Awaiting your early response. Markson B Williams From k_mahesh85 at yahoo.co.in Fri Jun 22 06:51:40 2007 From: k_mahesh85 at yahoo.co.in (Keshetti Mahesh) Date: Fri, 22 Jun 2007 14:51:40 +0100 (BST) Subject: [ofa-general] Re: SMP attribute component errors : Link speed enabled? In-Reply-To: <1182509115.10379.29241.camel@hal.voltaire.com> Message-ID: <293069.68821.qm@web8322.mail.in.yahoo.com> > I see 0x2 <= LSE <= 0xE which looks right. I found the same in the spec (sorry for the typo in my previous mail). But is it correct for a port with 10x link speed? -Mahesh -------------- next part -------------- An HTML attachment was scrubbed... URL: From halr at voltaire.com Fri Jun 22 06:55:49 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 22 Jun 2007 09:55:49 -0400 Subject: [ofa-general] Re: SMP attribute component errors : Link speed enabled?
In-Reply-To: <293069.68821.qm@web8322.mail.in.yahoo.com> References: <293069.68821.qm@web8322.mail.in.yahoo.com> Message-ID: <1182520548.10379.42462.camel@hal.voltaire.com> On Fri, 2007-06-22 at 09:51, Keshetti Mahesh wrote: > > I see 0x2 <= LSE <= 0xE which looks right. > > I do found the same in the spec. (i am sorry for typo in the prev. > mail). > But is it correct for a port with 10x link speed? What's 10x speed? Are you mixing up speed and width? There's 10.0 Gbps speed (aka QDR), and there are 1x/4x/8x/12x widths right now. -- Hal > -Mahesh From tziporet at mellanox.co.il Fri Jun 22 07:38:09 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Fri, 22 Jun 2007 17:38:09 +0300 Subject: [ofa-general] OFED 1.2 - GA release Message-ID: <6C2C79E72C305246B504CBA17B5500C901563710@mtlexch01.mtl.com>

I am happy to announce the OFED 1.2 GA release.

The release can be found at:
http://www.openfabrics.org/builds/ofed-1.2/
It will later also be available on the OpenFabrics download page:
http://www.openfabrics.org/downloads.htm

This release was a joint effort of all the companies in the EWG.
I wish to thank all who contributed to the success of this release.

Tziporet

===============================================================================
Release summary:
================
The OpenFabrics Enterprise Distribution (OFED) version 1.2 is a software package supporting InfiniBand and iWARP fabrics. It is composed of several software modules intended for use on a computer cluster constructed as an InfiniBand subnet or an iWARP network.
OFED package contains the following components:
===============================================
The OFED Distribution package generates RPMs for installing the following:

o OpenFabrics core and ULPs
  - HCA drivers (mthca, ipath, ehca)
  - iWARP driver (cxgb3)
  - core
  - Upper Layer Protocols: IPoIB, SDP, SRP Initiator, iSER Initiator, RDS, VNIC and uDAPL
o OpenFabrics utilities
  - OpenSM: InfiniBand Subnet Manager
  - Diagnostic tools
  - Performance tests
o MPI
  - OSU MVAPICH stack supporting the InfiniBand and iWARP interface
  - Open MPI stack supporting the InfiniBand and iWARP interface
  - OSU MVAPICH2 stack supporting the InfiniBand and iWARP interface
  - MPI benchmark tests (OSU BW/LAT, Intel MPI Benchmark, Presta)
o Extra packages
  - open-iscsi: open-iscsi initiator with iSER support
  - ib-bonding: Bonding driver for IPoIB interface
o Sources of all software modules (under conditions mentioned in the modules' LICENSE files)
o Documentation

Notes:
1. All OFED components are of production quality, except for:
   - The cxgb3 driver, which is in technology preview state.
   - The Virtual NIC (VNIC) driver, which is presented as a technology preview.
2. See the release notes for each package in the OFED docs.

Third Party Packages
--------------------
The following third party packages have been tested with OFED 1.2:
1. Intel MPI, Version 3.0 - Package ID: l_mpi_p_3.0.043
2. HP MPI, Version 2.2.5

Supported Platforms and Operating Systems
=========================================
o CPU architectures:
  - x86_64
  - x86
  - ia64
  - ppc64
o Linux Operating Systems:
  - RedHat EL4 up3: 2.6.9-34.ELsmp
  - RedHat EL4 up4: 2.6.9-42.ELsmp
  - RedHat EL4 up5: 2.6.9-55.ELsmp
  - RedHat EL5: 2.6.18-8.el5
  - SLES9 SP3: 2.6.5-7.244-smp
  - SLES10: 2.6.16.21-0.8-smp
  - kernel.org: 2.6.19.x and 2.6.20.x

HCAs and RNICs Supported
------------------------
This release supports IB HCAs by Mellanox Technologies, Qlogic and IBM as well as iWARP RNICs by Chelsio Communications.
o Mellanox Technologies HCAs:
  - InfiniHost (fw-23108 Rev 3.5.000)
  - InfiniHost III Ex (MemFree: fw-25218 Rev 5.2.000 with memory: fw-25208 Rev 4.8.200)
  - InfiniHost III Lx (fw-25204 Rev 1.2.000)
  The SDR and DDR modes of the InfiniHost III family are supported.
  For official firmware versions please see:
  http://www.mellanox.com/support/firmware_table.php
o Qlogic HCAs:
  - QHT6040 (PathScale InfiniPath HT-460)
  - QHT6140 (PathScale InfiniPath HT-465)
  - QLE6140 (PathScale InfiniPath PE-880)
o IBM HCAs:
  - GX Dual-port 4x IB HCA
  - GX Dual-port 12x IB HCA
o Chelsio RNICs:
  - S310/S320 10GbE Storage Accelerators
  - R310E 10GbE iWARP Adapters

Switches Supported
------------------
This release was tested with switches and gateways provided by the following companies:
  - Cisco
  - Voltaire
  - Qlogic
  - Flextronics

Main changes from OFED 1.1:
============================
Note: For details regarding the various changes, please see the release notes for each package in the docs directory.

General changes
  o Kernel code based on 2.6.20
  o New kernel modules: SA Cache, RDS, VNIC, bonding
  o High availability of SRP and IPoIB at GA level
  o Added iWARP support (with Chelsio driver)
  o MAN pages for libraries (libibverbs and librdmacm)
IPoIB
  o IPoIB Connected Mode
  o High availability support using the bonding module
SDP
  o netstat is now available
  o Improved message BW and scalability
SRP
  o High availability is now supported for all systems
iSER
  o Testing on more platforms (e.g., ppc64 and ia64)
  o Updated packages for iSCSI kernel & user components bundled with OFED
uDAPL
  o Scalability features needed for Intel MPI Libraries
a. libibverbs 1.1
  o Fork support (requires apps change)
  o Better low-level driver handling, including multiple drivers linked in statically
  o Documentation: man pages
b. librdmacm (uCMA) 1.0
  o Multicast joining from user space
  o UD support
  o Documentation: man pages
OSM
  o Routing improvements
  o Performance improvement of over an order of magnitude to min hop and up/down
  o New fat-tree and LASH algorithms
  o SA optional record support "virtually" complete
  o IB router enablement
  o SA database dump/restore
Management
  o Many diagnostic improvements since OFED 1.1 (see detailed RN)
  o ibdiagui: A GUI for ibdiagnet
MPI:
  a. OSU MVAPICH
     o Version was updated to 0.9.9
  b. Open MPI
     o Version was updated to 1.2.1
     o See http://www.open-mpi.org/svn/new.php for details
  c. OSU MVAPICH2
     o MVAPICH2 version 0.98 was added to the OFED package.
  d. Common MPI setup sourcing
     Simple menu-driven interface to choose which MPI implementation to set as the default on a per-user and/or system-wide basis
iWARP Support
  o Chelsio NIC supported
  o Verbs and CMA APIs are the same as InfiniBand
  o ULPs supported
    - MPI (mvapich2 tested)
    - uDAPL
Install
  o Default prefix directory is now /usr

See the attached release notes for more details.

Tziporet Koren
Software Director
Mellanox Technologies
mailto: tziporet at mellanox.co.il
Tel +972-4-9097200, ext 380
-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: OFED_release_notes.txt URL: From mhanafi at csc.com Fri Jun 22 08:04:35 2007 From: mhanafi at csc.com (Mahmoud Hanafi) Date: Fri, 22 Jun 2007 11:04:35 -0400 Subject: [ofa-general] problem with ofed 1.1. In-Reply-To: <1182501249.5695.16.camel@linux.site> Message-ID: Do you have gcc and glibc-devel.x86_64 installed? -Mahmoud Julio del Río Sent by: general-bounces at lists.openfabrics.org 06/22/2007 04:34 AM To general at lists.openfabrics.org cc Subject [ofa-general] problem with ofed 1.1.

Good morning,

I hope you can help me with this. I have this configuration:
- Fedora Core 2
- Linux localhost.localdomain 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:56:28 EST 2006 x86_64 x86_64 x86_64 GNU/Linux
- HCA Mellanox MHGS18-XTC
- Flextronic Switch F-X430047
- Ofed 1.1

When I try to install, this is the error log file I get:
---------------------------------------------------------
+ STATUS=0 + '[' 0 -ne 0 ']' + cd openib-1.1 ++ /usr/bin/id -u + '[' 0 = 0 ']' + /bin/chown -Rhf root . ++ /usr/bin/id -u + '[' 0 = 0 ']' + /bin/chgrp -Rhf root . + /bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ exit 0 Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.43267 + umask 022 + cd /var/tmp/OFEDRPM/BUILD + cd openib-1.1 + LANG=C + export LANG + unset DISPLAY + rm -rf /var/tmp/OFED + cd /var/tmp/OFEDRPM/BUILD/openib-1.1 + mkdir -p /var/tmp/OFED//usr/local/ofed/src + cp -a /var/tmp/OFEDRPM/BUILD/openib-1.1 /var/tmp/OFED//usr/local/ofed/src + ./configure --prefix=/usr/local/ofed --libdir=/usr/local/ofed/lib64 --kernel-version 2.6.9-34.ELsmp --kernel-sources /lib /modules/2.6.9-34.ELsmp/build --with-libibcm --with-libibverbs --with-libipathverbs --with-libmthca --with-librdmacm --with -mstflint --with-perftest --with-ipath_inf-mod --with-ipoib-mod --with-mthca-mod --with-core-mod --with-user_mad-mod --with -user_access-mod --with-addr_trans-mod Quilt does not exist... Going to use patch. Created configure.mk: prefix=/usr/local/ofed PREFIX="--prefix /usr/local/ofed" libdir=/usr/local/ofed/lib64 # Current working directory CWD=/var/tmp/OFEDRPM/BUILD/openib-1.1 # Kernel level KVERSION=2.6.9-34.ELsmp EXTRAVERSION=-34.ELsmp MODULES_DIR=/lib/modules/2.6.9-34.ELsmp KSRC=/lib/modules/2.6.9-34.ELsmp/build AUTOCONF_H=/var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h WITH_MEMTRACK=no WITH_MAKE_PARAMS= CONFIG_INFINIBAND=m CONFIG_INFINIBAND_IPOIB=m CONFIG_INFINIBAND_SDP= CONFIG_INFINIBAND_SRP= CONFIG_INFINIBAND_USER_MAD=m CONFIG_INFINIBAND_USER_ACCESS=m CONFIG_INFINIBAND_ADDR_TRANS=y CONFIG_INFINIBAND_MTHCA=m CONFIG_INFINIBAND_IPOIB_DEBUG=y CONFIG_INFINIBAND_ISER= CONFIG_INFINIBAND_EHCA= CONFIG_INFINIBAND_EHCA_SCALING= CONFIG_INFINIBAND_RDS= CONFIG_INFINIBAND_RDS_DEBUG= CONFIG_INFINIBAND_MADEYE= CONFIG_INFINIBAND_IPOIB_DEBUG_DATA= CONFIG_INFINIBAND_SDP_SEND_ZCOPY= CONFIG_INFINIBAND_SDP_RECV_ZCOPY= CONFIG_INFINIBAND_SDP_DEBUG= CONFIG_INFINIBAND_SDP_DEBUG_DATA= CONFIG_INFINIBAND_IPATH=m CONFIG_INFINIBAND_MTHCA_DEBUG=y # User level WITH_IBVERBS=yes WITH_MTHCA=yes WITH_IPATHVERBS=yes WITH_EHCA=no WITH_CM=yes WITH_SDP=no WITH_DAPL=no WITH_RDMACM=yes WITH_MANAGEMENT_LIBS=no 
WITH_OSM=no WITH_DIAGS=no WITH_MPI=no WITH_PERFTEST=yes WITH_SRPTOOLS=no WITH_IPOIBTOOLS=no WITH_TVFLASH=no WITH_MSTFLINT=yes Created /var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h: #undef CONFIG_INFINIBAND #undef CONFIG_INFINIBAND_IPOIB #undef CONFIG_INFINIBAND_SDP #undef CONFIG_INFINIBAND_SRP #undef CONFIG_INFINIBAND_USER_MAD #undef CONFIG_INFINIBAND_USER_ACCESS #undef CONFIG_INFINIBAND_ADDR_TRANS #undef CONFIG_INFINIBAND_MTHCA #undef CONFIG_INFINIBAND_IPOIB_DEBUG #undef CONFIG_INFINIBAND_ISER #undef CONFIG_INFINIBAND_EHCA #undef CONFIG_INFINIBAND_EHCA_SCALING #undef CONFIG_INFINIBAND_RDS #undef CONFIG_INFINIBAND_RDS_DEBUG #undef CONFIG_INFINIBAND_MADEYE #undef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA #undef CONFIG_INFINIBAND_SDP_SEND_ZCOPY #undef CONFIG_INFINIBAND_SDP_RECV_ZCOPY #undef CONFIG_INFINIBAND_SDP_DEBUG #undef CONFIG_INFINIBAND_SDP_DEBUG_DATA #undef CONFIG_INFINIBAND_IPATH #undef CONFIG_INFINIBAND_MTHCA_DEBUG #define CONFIG_INFINIBAND 1 #define CONFIG_INFINIBAND_IPOIB 1 #undef CONFIG_INFINIBAND_SDP #undef CONFIG_INFINIBAND_SRP #define CONFIG_INFINIBAND_USER_MAD 1 #define CONFIG_INFINIBAND_USER_ACCESS 1 #define CONFIG_INFINIBAND_ADDR_TRANS 1 #define CONFIG_INFINIBAND_MTHCA 1 #define CONFIG_INFINIBAND_IPOIB_DEBUG 1 #undef CONFIG_INFINIBAND_ISER #undef CONFIG_INFINIBAND_EHCA #undef CONFIG_INFINIBAND_RDS #undef CONFIG_INFINIBAND_RDS_DEBUG #undef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA #undef CONFIG_INFINIBAND_SDP_SEND_ZCOPY #undef CONFIG_INFINIBAND_SDP_RECV_ZCOPY #undef CONFIG_INFINIBAND_SDP_DEBUG #undef CONFIG_INFINIBAND_SDP_DEBUG_DATA #define CONFIG_INFINIBAND_IPATH 1 #define CONFIG_INFINIBAND_MTHCA_DEBUG 1 #undef CONFIG_INFINIBAND_MADEYE mkdir -p /var/tmp/OFEDRPM/BUILD/openib-1.1/patches touch /var/tmp/OFEDRPM/BUILD/openib-1.1/patches/quiltrc /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/dapl_qp_attr.patch patching file src/userspace/dapl/dapl/openib_cma/dapl_ib_util.c patching file src/userspace/dapl/dapl/openib_scm/dapl_ib_util.c 
/var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/libmthca_cq_deadlock.patch patching file src/userspace/libmthca/src/verbs.c Hunk #1 succeeded at 614 (offset -8 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/libmthca_stddef.patch patching file src/userspace/libmthca/src/mthca.h Hunk #1 succeeded at 38 with fuzz 2 (offset 2 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/librdmacm_compat.patch patching file src/userspace/librdmacm/src/cma.c Hunk #1 succeeded at 157 (offset 16 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/librdmacm_ver_abi.patch patching file src/userspace/librdmacm/src/cma.c Hunk #2 succeeded at 170 (offset 16 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/mstflint.patch patching file src/userspace/mstflint/mtcr.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cm_add_mra_timeout_limit.patch patching file drivers/infiniband/core/cm.c Hunk #1 succeeded at 53 (offset -1 lines). Hunk #2 succeeded at 2268 (offset -36 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cm_cleanup_timewait.patch patching file drivers/infiniband/core/cm.c Hunk #1 succeeded at 686 (offset 7 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_established1.patch patching file drivers/infiniband/ulp/sdp/sdp.h patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c Hunk #1 succeeded at 515 (offset 16 lines). patching file drivers/infiniband/ulp/sdp/sdp_cma.c patching file drivers/infiniband/ulp/sdp/sdp_main.c Hunk #1 succeeded at 589 (offset 26 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_increase_max_cm_retries.patch patching file drivers/infiniband/core/cma.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_list_init.patch patching file drivers/infiniband/core/cma.c Hunk #1 succeeded at 328 (offset -11 lines). 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_mem_leak.patch patching file drivers/infiniband/core/cma.c Hunk #1 succeeded at 1713 (offset -241 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_race_fix.patch patching file drivers/infiniband/core/cma.c Hunk #1 succeeded at 910 with fuzz 1 (offset -113 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_tavor_quirk.patch patching file drivers/infiniband/core/cma.c Hunk #1 succeeded at 48 with fuzz 2. Hunk #2 succeeded at 1154 (offset 27 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ib_sa_names.patch patching file include/rdma/ib_sa.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-fixes.patch (Stripping trailing CRs from patch.) patching file drivers/infiniband/Makefile (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/Kconfig (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/Makefile (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_common.h (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_cq.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_debug.h (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_diag.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_driver.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_file_ops.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_fs.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_ht400.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_iba6110.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_iba6120.c (Stripping trailing CRs from patch.) 
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_intr.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_kernel.h (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_keys.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_layer.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_layer.h (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_mad.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_mr.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_pe800.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_qp.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_rc.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_registers.h (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_ruc.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_srq.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_stats.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_sysfs.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_uc.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_ud.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_verbs.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_verbs.h (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_verbs_mcast.c (Stripping trailing CRs from patch.) 
patching file drivers/infiniband/hw/ipath/ipath_wc_ppc64.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/verbs_debug.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-limit-packets-sent-without-ack.patch (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_qp.c Hunk #1 succeeded at 502 (offset -8 lines). (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_rc.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_verbs.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_verbs.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-memcpy_cachebypass.patch (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/Makefile (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_verbs.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/memcpy_cachebypass_x86_64.S /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-x86_64.patch (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/Kconfig /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_issue3.patch patching file drivers/infiniband/ulp/ipoib/ipoib_main.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_mcast_join_mask.patch patching file drivers/infiniband/ulp/ipoib/ipoib_multicast.c Hunk #1 succeeded at 471 (offset -1 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_mcast_restart.patch patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_selector_updated.patch patching file drivers/infiniband/ulp/ipoib/ipoib_main.c Hunk #2 succeeded at 458 (offset 4 lines). 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_attributes.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 1461 (offset -6 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_remove_reconnect.patch patching file drivers/infiniband/ulp/srp/ib_srp.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_wa_post_send.patch patching file drivers/infiniband/ulp/srp/ib_srp.c patching file drivers/infiniband/ulp/srp/ib_srp.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/lockdep_header.patch patching file drivers/infiniband/core/uverbs_cmd.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_av_statrate.patch patching file drivers/infiniband/hw/mthca/mthca_av.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_catas_reset.patch patching file drivers/infiniband/hw/mthca/mthca_catas.c patching file drivers/infiniband/hw/mthca/mthca_main.c patching file drivers/infiniband/hw/mthca/mthca_dev.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_mad_traps.patch patching file drivers/infiniband/hw/mthca/mthca_mad.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_port.patch patching file drivers/infiniband/hw/mthca/mthca_provider.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_qp_portnum.patch patching file drivers/infiniband/hw/mthca/mthca_qp.c Hunk #1 succeeded at 478 (offset 4 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_qp_statrate_bits.patch patching file drivers/infiniband/hw/mthca/mthca_qp.c Hunk #1 succeeded at 414 (offset 4 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_use_uar2.patch patching file drivers/infiniband/hw/mthca/mthca_uar.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/robert-ipath-diagpkt-init-fixup.patch patching file drivers/infiniband/hw/ipath/ipath_diag.c Hunk #1 succeeded at 285 (offset -1 lines). 
patching file drivers/infiniband/hw/ipath/ipath_driver.c Hunk #1 succeeded at 539 (offset -20 lines). Hunk #2 succeeded at 596 with fuzz 1 (offset -105 lines). Hunk #3 succeeded at 2029 (offset -156 lines). patching file drivers/infiniband/hw/ipath/ipath_kernel.h Hunk #1 succeeded at 793 (offset -96 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sdp_credits_by_seq.patch patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sdp_post_credits.patch patching file drivers/infiniband/ulp/sdp/sdp.h Hunk #1 succeeded at 177 (offset 1 line). patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c Hunk #1 succeeded at 324 (offset 6 lines). patching file drivers/infiniband/ulp/sdp/sdp_cma.c Hunk #1 succeeded at 434 (offset 4 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_drep_on_not_found.patch patching file drivers/infiniband/core/cm.c Hunk #1 succeeded at 1890 (offset -10 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_randomize_psn.patch patching file drivers/infiniband/core/cm.c Hunk #3 succeeded at 81 (offset 7 lines). Hunk #5 succeeded at 327 (offset 7 lines). Hunk #7 succeeded at 2115 (offset 27 lines). Hunk #8 succeeded at 3369 (offset 2 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_unload_crash.patch patching file drivers/infiniband/core/cm.c Hunk #1 succeeded at 82 (offset 7 lines). Hunk #3 succeeded at 656 (offset 6 lines). Hunk #5 succeeded at 685 (offset 6 lines). Hunk #7 succeeded at 1316 (offset 6 lines). Hunk #9 succeeded at 1334 (offset 6 lines). Hunk #10 succeeded at 2626 (offset -7 lines). Hunk #11 succeeded at 3409 (offset -29 lines). Hunk #12 succeeded at 3449 (offset -7 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_establish.patch patching file include/rdma/rdma_cm.h Hunk #1 succeeded at 241 (offset -15 lines). 
patching file drivers/infiniband/core/cm.c Hunk #1 succeeded at 3242 (offset 35 lines). patching file drivers/infiniband/core/cma.c Hunk #1 succeeded at 759 (offset -81 lines). Hunk #3 succeeded at 1752 (offset -212 lines). Hunk #4 succeeded at 1997 with fuzz 1. Hunk #5 succeeded at 1828 (offset -229 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_hotplug.patch patching file drivers/infiniband/core/cma.c Hunk #1 succeeded at 278 (offset 7 lines). Hunk #3 succeeded at 700 (offset 8 lines). Hunk #5 succeeded at 895 with fuzz 1 (offset -9 lines). Hunk #6 succeeded at 1382 (offset 6 lines). Hunk #7 succeeded at 1610 (offset -9 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_typo_fix.patch patching file drivers/infiniband/core/cma.c Hunk #1 succeeded at 276 with fuzz 2 (offset 7 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_1_recreate_at_reconnect.patch patching file drivers/infiniband/ulp/srp/ib_srp.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_2_use_multiple_initiator_ports.patch patching file drivers/infiniband/ulp/srp/ib_srp.c patching file drivers/infiniband/ulp/srp/ib_srp.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_topspin.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 358 (offset -1 lines). 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/svnehca_0015_1.patch patching file drivers/infiniband/hw/ehca/ehca_main.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/svnehca_0015_2.patch patching file drivers/infiniband/hw/ehca/ehca_tools.h Applying patches for 2.6.9-34.ELsmp kernel (RHAS4 Update 3): /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_1_netevents_revert_to_2_6_17.patch patching file drivers/infiniband/core/addr.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_3926_to_2_6_13.patch patching file drivers/infiniband/core/addr.c Hunk #1 succeeded at 327 with fuzz 1 (offset 11 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_4670_to_2_6_9.patch patching file drivers/infiniband/core/addr.c Hunk #1 succeeded at 27 with fuzz 2. /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/asm_bitops_ia64_to_2_6_11.patch patching file include/asm/bitops.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/core_4807_to_2_6_9.patch patching file drivers/infiniband/core/sysfs.c Hunk #1 succeeded at 438 (offset -4 lines). patching file drivers/infiniband/core/user_mad.c Hunk #2 succeeded at 677 (offset 91 lines). Hunk #3 succeeded at 685 (offset 5 lines). Hunk #4 succeeded at 1106 (offset 91 lines). Hunk #5 succeeded at 1053 (offset 5 lines). patching file drivers/infiniband/core/uverbs_main.c Hunk #2 succeeded at 118 (offset 3 lines). 
patching file drivers/infiniband/core/uverbs_mem.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/debugfs_to_2_6_9.patch patching file drivers/infiniband/include/linux/debugfs.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipath-backport.patch patching file drivers/infiniband/hw/ipath/iowrite32_copy_x86_64.S patching file drivers/infiniband/hw/ipath/ipath_backport.h patching file drivers/infiniband/hw/ipath/ipath_diag.c patching file drivers/infiniband/hw/ipath/ipath_driver.c Hunk #2 succeeded at 557 (offset 1 line). Hunk #3 succeeded at 599 (offset 1 line). Hunk #4 succeeded at 1366 (offset 1 line). Hunk #5 succeeded at 1395 (offset 1 line). Hunk #6 succeeded at 1875 (offset 1 line). Hunk #7 succeeded at 1903 (offset 1 line). Hunk #8 succeeded at 1984 (offset -9 lines). Hunk #9 succeeded at 2027 (offset 1 line). Hunk #10 succeeded at 2142 (offset -9 lines). patching file drivers/infiniband/hw/ipath/ipath_file_ops.c patching file drivers/infiniband/hw/ipath/ipath_fs.c patching file drivers/infiniband/hw/ipath/ipath_iba6110.c patching file drivers/infiniband/hw/ipath/ipath_iba6120.c patching file drivers/infiniband/hw/ipath/ipath_init_chip.c patching file drivers/infiniband/hw/ipath/ipath_kernel.h patching file drivers/infiniband/hw/ipath/ipath_layer.c patching file drivers/infiniband/hw/ipath/ipath_sysfs.c patching file drivers/infiniband/hw/ipath/ipath_user_pages.c patching file drivers/infiniband/hw/ipath/ipath_verbs.c patching file drivers/infiniband/hw/ipath/ipath_verbs.h patching file drivers/infiniband/hw/ipath/Makefile /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipoib_5010_to_2_6_9.patch patching file drivers/infiniband/include/linux/if_infiniband.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipoib_8111_to_2_6_16.patch patching file drivers/infiniband/ulp/ipoib/ipoib_main.c Hunk #2 succeeded at 803 (offset 49 lines). 
patching file drivers/infiniband/ulp/ipoib/ipoib.h Hunk #1 succeeded at 46 (offset -1 lines). Hunk #2 succeeded at 220 (offset 1 line). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_device_5496_to_2_6_15.patch patching file drivers/infiniband/include/linux/device.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_err_to_2_6_11.patch patching file include/linux/err.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_idr_6554_to_2_6_13.patch patching file drivers/infiniband/include/linux/idr.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_inetdevice_to_2_6_17.patch patching file drivers/infiniband/include/linux/inetdevice.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_lockdep_to_2_6_17.patch patching file include/linux/lockdep.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_mutex_5947_to_2_6_15.patch patching file drivers/infiniband/include/linux/mutex.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_netdevice_to_2_6_17.patch patching file drivers/infiniband/include/linux/netdevice.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_pci_7970_to_2_6_9.patch patching file drivers/infiniband/include/linux/pci.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_scatterlist_6369_to_2_6_9.patch patching file drivers/infiniband/include/linux/scatterlist.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_signal_to_2_6_17.patch patching file drivers/infiniband/include/linux/signal.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_skbuff_6754_to_2_6_11.patch patching file include/linux/skbuff.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_spinlock_5883_to_2_6_9.patch patching file drivers/infiniband/include/linux/spinlock.h 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/makefile_to_2_6_9.patch patching file drivers/infiniband/ulp/srp/Makefile /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/mthca_dev_3465_to_2_6_11.patch patching file drivers/infiniband/hw/mthca/mthca_dev.h Hunk #1 succeeded at 57 with fuzz 2 (offset 4 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/mthca_provider_3465_to_2_6_9.patch patching file drivers/infiniband/hw/mthca/mthca_provider.c Hunk #1 succeeded at 387 (offset 28 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_inet_sock_6754_to_2_6_15.patch patching file include/net/inet_sock.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_sock_1_6754_to_2_6_13.patch patching file include/net/sock.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_sock_2_6754_to_2_6_11.patch patching file include/net/sock.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_tcp_states_6754_to_2_6_13.patch patching file include/net/tcp_states.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/read_mostly_6255_to_2_6_13.patch patching file drivers/infiniband/include/linux/cache.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/scsi_7242_to_2_6_14.patch patching file include/scsi/scsi.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/sdp_7277_to_2_6_11.patch patching file drivers/infiniband/ulp/sdp/sdp_main.c Hunk #1 succeeded at 418 (offset 118 lines). Hunk #2 succeeded at 535 (offset 41 lines). Hunk #3 succeeded at 633 (offset 118 lines). Hunk #4 succeeded at 1408 (offset 245 lines). Hunk #5 succeeded at 1301 (offset 118 lines). Hunk #6 succeeded at 1537 (offset 245 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_4030_to_2_6_12.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 1594 (offset 271 lines). 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_7312_to_2_6_11.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 1258 (offset -44 lines). Hunk #3 succeeded at 1332 (offset -42 lines). Hunk #5 succeeded at 1360 with fuzz 2 (offset -40 lines). Hunk #6 succeeded at 1404 with fuzz 2 (offset -3 lines). Hunk #7 succeeded at 1377 (offset -40 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_scsi_scan_target_7242_to_2_6_11.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 975 (offset 26 lines). Hunk #2 succeeded at 1505 (offset 24 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/top_2844_to_2_6_11.patch patching file drivers/infiniband/Makefile Hunk #1 succeeded at 1 with fuzz 2. /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ucm_5245_to_2_6_9.patch patching file drivers/infiniband/core/ucm.c Hunk #1 succeeded at 1270 (offset -8 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ucma_6607_to_2_6_9.patch patching file drivers/infiniband/core/ucma.c Hunk #1 succeeded at 861 (offset 88 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/user_mad_4603_to_2_6_9.patch patching file drivers/infiniband/core/user_mad.c Hunk #1 succeeded at 857 (offset -20 lines). Hunk #3 succeeded at 1086 (offset -20 lines). Hunk #5 succeeded at 1123 (offset -20 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/uverbs_main_3935_to_2_6_9.patch patching file drivers/infiniband/core/uverbs_main.c Hunk #1 succeeded at 727 (offset 11 lines). Hunk #2 succeeded at 949 (offset 1 line). Hunk #3 succeeded at 975 (offset 11 lines). Hunk #4 succeeded at 986 (offset 3 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/uverbs_to_2_6_17.patch patching file drivers/infiniband/core/uverbs_main.c Hunk #1 succeeded at 1011 with fuzz 1 (offset 196 lines). 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/hpage_patches/hpages.patch patching file drivers/infiniband/core/uverbs_mem.c /bin/rm -f /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/examples cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/libibverbs Running: ./configure --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache --disable-libcheck --prefix /usr/local/ofed --libdir /usr/local/ofed/lib64 CPPFLAGS="-I../libibverbs/include" configure: creating cache /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache checking for a BSD-compatible install... /usr/bin/install -c checking whether build environment is sane... yes checking for gawk... gawk checking whether make sets $(MAKE)... yes checking build system type... x86_64-redhat-linux-gnu checking host system type... x86_64-redhat-linux-gnu checking for style of include used by make... GNU checking for gcc... gcc checking for C compiler default output file name... configure: error: C compiler cannot create executables See `config.log' for more details. 
Failed to execute: ./configure --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache --disable-libcheck --prefix /usr/local/ofed --libdir /usr/local/ofed/lib64 CPPFLAGS="-I../libibverbs/include" error: Bad exit status from /var/tmp/rpm-tmp.43267 (%install) RPM build errors: user vlad does not exist - using root group mtl does not exist - using root user vlad does not exist - using root group mtl does not exist - using root Bad exit status from /var/tmp/rpm-tmp.43267 (%install) ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-libibcm --with-libibverbs --with-libipathverbs --with-libmthca --with-librdmacm --with-mstflint --with-perftest --with-ipath_inf-mod --with-ipoib-mod --with-mthca-mod --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod' --define 'configure_options32 %{nil}' --define 'KVERSION 2.6.9-34.ELsmp' --define 'KSRC /lib/modules/2.6.9-34.ELsmp/build' --define 'build_kernel_ib 1' --define 'build_kernel_ib_devel 1' --define 'NETWORK_CONF_DIR /etc/sysconfig/network-scripts' --define 'modprobe_update 1' --define 'include_ipoib_conf 1' --define 'build_32bit 0' /home/caton/OFED-1.1/SRPMS/openib-1.1-0.src.rpm" --------------------------------------------------------- Thanks a lot and best regards Julio. _______________________________________________ general mailing list general at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From jrio at caton.es Fri Jun 22 08:07:41 2007 From: jrio at caton.es (Julio del Río) Date: Fri, 22 Jun 2007 17:07:41 +0200 Subject: [ofa-general] problem with ofed 1.1. 
In-Reply-To: References: Message-ID: <1182524861.5695.28.camel@linux.site> [root at localhost root]# rpm -qa | grep gcc libgcc-3.3.3-7 gcc-g77-3.3.3-7 gcc-3.3.3-7 gcc-objc-3.3.3-7 compat-gcc-c++-7.3-2.96.126 gcc-gnat-3.3.3-7 compat-gcc-7.3-2.96.126 gcc34-3.4.0-1 gcc34-c++-3.4.0-1 libgcc-3.3.3-7 gcc-c++-3.3.3-7 gcc-java-3.3.3-7 gcc34-java-3.4.0-1 [root at localhost root]# rpm -qa | grep libc libcroco-0.4.0-4 libcap-devel-1.10-18.1 libc-client-devel-2002e-5 glibc-2.3.3-27 glibc-kernheaders-2.4-8.44 glibc-utils-2.3.3-27 glibc-2.3.3-27 glibc-profile-2.3.3-27 glibc-common-2.3.3-27 glibc-devel-2.3.3-27 libc-client-2002e-5 libcap-1.10-18.1 glibc-headers-2.3.3-27 Thanks a lot and best regards On Fri, 22-06-2007 at 11:04 -0400, Mahmoud Hanafi wrote: > > Do you have gcc and glibc-devel.x86_64 installed? > > -Mahmoud > > > > > Julio del Río > Sent by: > general-bounces at lists.openfabrics.org > > 06/22/2007 04:34 AM > > > To > general at lists.openfabrics.org > cc > > Subject > [ofa-general] > problem with ofed > 1.1. 
> > > > > > > > > > Good morning, > > I hope you can help me with this: > > I have this config: > > - Fedora Core 2 > - Linux localhost.localdomain 2.6.9-34.ELsmp #1 SMP Fri Feb 24 > 16:56:28 EST 2006 x86_64 x86_64 x86_64 GNU/Linux > - HCA Mellanox MHGS18-XTC > - Flextronic Switch F-X430047 > - Ofed 1.1 > > and when trying to install, this is the error log I get: > > --------------------------------------------------------- > + STATUS=0 > + '[' 0 -ne 0 ']' > + cd openib-1.1 > ++ /usr/bin/id -u > + '[' 0 = 0 ']' > + /bin/chown -Rhf root . > ++ /usr/bin/id -u > + '[' 0 = 0 ']' > + /bin/chgrp -Rhf root . > + /bin/chmod -Rf a+rX,u+w,g-w,o-w . > + exit 0 > Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.43267 > + umask 022 > + cd /var/tmp/OFEDRPM/BUILD > + cd openib-1.1 > + LANG=C > + export LANG > + unset DISPLAY > + rm -rf /var/tmp/OFED > + cd /var/tmp/OFEDRPM/BUILD/openib-1.1 > + mkdir -p /var/tmp/OFED//usr/local/ofed/src > + cp -a /var/tmp/OFEDRPM/BUILD/openib-1.1 /var/tmp/OFED//usr/local/ofed/src > + ./configure --prefix=/usr/local/ofed --libdir=/usr/local/ofed/lib64 > --kernel-version 2.6.9-34.ELsmp --kernel-sources /lib/modules/2.6.9-34.ELsmp/build --with-libibcm --with-libibverbs > --with-libipathverbs --with-libmthca --with-librdmacm --with-mstflint --with-perftest --with-ipath_inf-mod --with-ipoib-mod > --with-mthca-mod --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod > Quilt does not exist... Going to use patch. 
> Created configure.mk: > prefix=/usr/local/ofed > PREFIX="--prefix /usr/local/ofed" > libdir=/usr/local/ofed/lib64 > > # Current working directory > CWD=/var/tmp/OFEDRPM/BUILD/openib-1.1 > > # Kernel level > KVERSION=2.6.9-34.ELsmp > EXTRAVERSION=-34.ELsmp > MODULES_DIR=/lib/modules/2.6.9-34.ELsmp > KSRC=/lib/modules/2.6.9-34.ELsmp/build > > AUTOCONF_H=/var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h > WITH_MEMTRACK=no > > WITH_MAKE_PARAMS= > > CONFIG_INFINIBAND=m > CONFIG_INFINIBAND_IPOIB=m > CONFIG_INFINIBAND_SDP= > CONFIG_INFINIBAND_SRP= > > CONFIG_INFINIBAND_USER_MAD=m > CONFIG_INFINIBAND_USER_ACCESS=m > CONFIG_INFINIBAND_ADDR_TRANS=y > CONFIG_INFINIBAND_MTHCA=m > > CONFIG_INFINIBAND_IPOIB_DEBUG=y > CONFIG_INFINIBAND_ISER= > CONFIG_INFINIBAND_EHCA= > CONFIG_INFINIBAND_EHCA_SCALING= > CONFIG_INFINIBAND_RDS= > CONFIG_INFINIBAND_RDS_DEBUG= > CONFIG_INFINIBAND_MADEYE= > > CONFIG_INFINIBAND_IPOIB_DEBUG_DATA= > CONFIG_INFINIBAND_SDP_SEND_ZCOPY= > CONFIG_INFINIBAND_SDP_RECV_ZCOPY= > CONFIG_INFINIBAND_SDP_DEBUG= > CONFIG_INFINIBAND_SDP_DEBUG_DATA= > CONFIG_INFINIBAND_IPATH=m > CONFIG_INFINIBAND_MTHCA_DEBUG=y > > > > # User level > WITH_IBVERBS=yes > WITH_MTHCA=yes > WITH_IPATHVERBS=yes > WITH_EHCA=no > WITH_CM=yes > WITH_SDP=no > WITH_DAPL=no > WITH_RDMACM=yes > WITH_MANAGEMENT_LIBS=no > WITH_OSM=no > WITH_DIAGS=no > WITH_MPI=no > WITH_PERFTEST=yes > WITH_SRPTOOLS=no > WITH_IPOIBTOOLS=no > WITH_TVFLASH=no > WITH_MSTFLINT=yes > > Created /var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h: > #undef CONFIG_INFINIBAND > #undef CONFIG_INFINIBAND_IPOIB > #undef CONFIG_INFINIBAND_SDP > #undef CONFIG_INFINIBAND_SRP > > #undef CONFIG_INFINIBAND_USER_MAD > #undef CONFIG_INFINIBAND_USER_ACCESS > #undef CONFIG_INFINIBAND_ADDR_TRANS > #undef CONFIG_INFINIBAND_MTHCA > > #undef CONFIG_INFINIBAND_IPOIB_DEBUG > #undef CONFIG_INFINIBAND_ISER > #undef CONFIG_INFINIBAND_EHCA > #undef CONFIG_INFINIBAND_EHCA_SCALING > #undef CONFIG_INFINIBAND_RDS > #undef 
CONFIG_INFINIBAND_RDS_DEBUG > #undef CONFIG_INFINIBAND_MADEYE > > #undef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA > #undef CONFIG_INFINIBAND_SDP_SEND_ZCOPY > #undef CONFIG_INFINIBAND_SDP_RECV_ZCOPY > #undef CONFIG_INFINIBAND_SDP_DEBUG > #undef CONFIG_INFINIBAND_SDP_DEBUG_DATA > #undef CONFIG_INFINIBAND_IPATH > #undef CONFIG_INFINIBAND_MTHCA_DEBUG > > #define CONFIG_INFINIBAND 1 > #define CONFIG_INFINIBAND_IPOIB 1 > #undef CONFIG_INFINIBAND_SDP > #undef CONFIG_INFINIBAND_SRP > > #define CONFIG_INFINIBAND_USER_MAD 1 > #define CONFIG_INFINIBAND_USER_ACCESS 1 > #define CONFIG_INFINIBAND_ADDR_TRANS 1 > #define CONFIG_INFINIBAND_MTHCA 1 > > #define CONFIG_INFINIBAND_IPOIB_DEBUG 1 > #undef CONFIG_INFINIBAND_ISER > #undef CONFIG_INFINIBAND_EHCA > #undef CONFIG_INFINIBAND_RDS > #undef CONFIG_INFINIBAND_RDS_DEBUG > > > #undef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA > #undef CONFIG_INFINIBAND_SDP_SEND_ZCOPY > #undef CONFIG_INFINIBAND_SDP_RECV_ZCOPY > #undef CONFIG_INFINIBAND_SDP_DEBUG > #undef CONFIG_INFINIBAND_SDP_DEBUG_DATA > #define CONFIG_INFINIBAND_IPATH 1 > #define CONFIG_INFINIBAND_MTHCA_DEBUG 1 > #undef CONFIG_INFINIBAND_MADEYE > > mkdir -p /var/tmp/OFEDRPM/BUILD/openib-1.1/patches > touch /var/tmp/OFEDRPM/BUILD/openib-1.1/patches/quiltrc > > /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/dapl_qp_attr.patch > patching file src/userspace/dapl/dapl/openib_cma/dapl_ib_util.c > patching file src/userspace/dapl/dapl/openib_scm/dapl_ib_util.c > > /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/libmthca_cq_deadlock.patch > patching file src/userspace/libmthca/src/verbs.c > Hunk #1 succeeded at 614 (offset -8 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/libmthca_stddef.patch > patching file src/userspace/libmthca/src/mthca.h > Hunk #1 succeeded at 38 with fuzz 2 (offset 2 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/librdmacm_compat.patch > patching file src/userspace/librdmacm/src/cma.c > Hunk #1 succeeded at 157 (offset 16 lines). 
> > /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/librdmacm_ver_abi.patch > patching file src/userspace/librdmacm/src/cma.c > Hunk #2 succeeded at 170 (offset 16 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/mstflint.patch > patching file src/userspace/mstflint/mtcr.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cm_add_mra_timeout_limit.patch > patching file drivers/infiniband/core/cm.c > Hunk #1 succeeded at 53 (offset -1 lines). > Hunk #2 succeeded at 2268 (offset -36 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cm_cleanup_timewait.patch > patching file drivers/infiniband/core/cm.c > Hunk #1 succeeded at 686 (offset 7 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_established1.patch > patching file drivers/infiniband/ulp/sdp/sdp.h > patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c > Hunk #1 succeeded at 515 (offset 16 lines). > patching file drivers/infiniband/ulp/sdp/sdp_cma.c > patching file drivers/infiniband/ulp/sdp/sdp_main.c > Hunk #1 succeeded at 589 (offset 26 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_increase_max_cm_retries.patch > patching file drivers/infiniband/core/cma.c > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_list_init.patch > patching file drivers/infiniband/core/cma.c > Hunk #1 succeeded at 328 (offset -11 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_mem_leak.patch > patching file drivers/infiniband/core/cma.c > Hunk #1 succeeded at 1713 (offset -241 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_race_fix.patch > patching file drivers/infiniband/core/cma.c > Hunk #1 succeeded at 910 with fuzz 1 (offset -113 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_tavor_quirk.patch > patching file drivers/infiniband/core/cma.c > Hunk #1 succeeded at 48 with fuzz 2. > Hunk #2 succeeded at 1154 (offset 27 lines). 
> > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ib_sa_names.patch > patching file include/rdma/ib_sa.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-fixes.patch > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/Makefile > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/Kconfig > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/Makefile > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_common.h > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_cq.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_debug.h > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_diag.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_driver.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_file_ops.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_fs.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_ht400.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_iba6110.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_iba6120.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_init_chip.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_intr.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_kernel.h > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_keys.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_layer.c > (Stripping trailing CRs from patch.) 
> patching file drivers/infiniband/hw/ipath/ipath_layer.h > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_mad.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_mr.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_pe800.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_qp.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_rc.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_registers.h > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_ruc.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_srq.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_stats.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_sysfs.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_uc.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_ud.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_verbs.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_verbs.h > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_verbs_mcast.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_wc_ppc64.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/verbs_debug.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-limit-packets-sent-without-ack.patch > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_qp.c > Hunk #1 succeeded at 502 (offset -8 lines). > (Stripping trailing CRs from patch.) 
> patching file drivers/infiniband/hw/ipath/ipath_rc.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_verbs.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_verbs.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-memcpy_cachebypass.patch > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/Makefile > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/ipath_verbs.c > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/memcpy_cachebypass_x86_64.S > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-x86_64.patch > (Stripping trailing CRs from patch.) > patching file drivers/infiniband/hw/ipath/Kconfig > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_issue3.patch > patching file drivers/infiniband/ulp/ipoib/ipoib_main.c > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_mcast_join_mask.patch > patching file drivers/infiniband/ulp/ipoib/ipoib_multicast.c > Hunk #1 succeeded at 471 (offset -1 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_mcast_restart.patch > patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_selector_updated.patch > patching file drivers/infiniband/ulp/ipoib/ipoib_main.c > Hunk #2 succeeded at 458 (offset 4 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_attributes.patch > patching file drivers/infiniband/ulp/srp/ib_srp.c > Hunk #1 succeeded at 1461 (offset -6 lines). 
> > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_remove_reconnect.patch > patching file drivers/infiniband/ulp/srp/ib_srp.c > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_wa_post_send.patch > patching file drivers/infiniband/ulp/srp/ib_srp.c > patching file drivers/infiniband/ulp/srp/ib_srp.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/lockdep_header.patch > patching file drivers/infiniband/core/uverbs_cmd.c > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_av_statrate.patch > patching file drivers/infiniband/hw/mthca/mthca_av.c > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_catas_reset.patch > patching file drivers/infiniband/hw/mthca/mthca_catas.c > patching file drivers/infiniband/hw/mthca/mthca_main.c > patching file drivers/infiniband/hw/mthca/mthca_dev.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_mad_traps.patch > patching file drivers/infiniband/hw/mthca/mthca_mad.c > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_port.patch > patching file drivers/infiniband/hw/mthca/mthca_provider.c > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_qp_portnum.patch > patching file drivers/infiniband/hw/mthca/mthca_qp.c > Hunk #1 succeeded at 478 (offset 4 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_qp_statrate_bits.patch > patching file drivers/infiniband/hw/mthca/mthca_qp.c > Hunk #1 succeeded at 414 (offset 4 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_use_uar2.patch > patching file drivers/infiniband/hw/mthca/mthca_uar.c > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/robert-ipath-diagpkt-init-fixup.patch > patching file drivers/infiniband/hw/ipath/ipath_diag.c > Hunk #1 succeeded at 285 (offset -1 lines). > patching file drivers/infiniband/hw/ipath/ipath_driver.c > Hunk #1 succeeded at 539 (offset -20 lines). 
> Hunk #2 succeeded at 596 with fuzz 1 (offset -105 lines). > Hunk #3 succeeded at 2029 (offset -156 lines). > patching file drivers/infiniband/hw/ipath/ipath_kernel.h > Hunk #1 succeeded at 793 (offset -96 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sdp_credits_by_seq.patch > patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sdp_post_credits.patch > patching file drivers/infiniband/ulp/sdp/sdp.h > Hunk #1 succeeded at 177 (offset 1 line). > patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c > Hunk #1 succeeded at 324 (offset 6 lines). > patching file drivers/infiniband/ulp/sdp/sdp_cma.c > Hunk #1 succeeded at 434 (offset 4 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_drep_on_not_found.patch > patching file drivers/infiniband/core/cm.c > Hunk #1 succeeded at 1890 (offset -10 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_randomize_psn.patch > patching file drivers/infiniband/core/cm.c > Hunk #3 succeeded at 81 (offset 7 lines). > Hunk #5 succeeded at 327 (offset 7 lines). > Hunk #7 succeeded at 2115 (offset 27 lines). > Hunk #8 succeeded at 3369 (offset 2 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_unload_crash.patch > patching file drivers/infiniband/core/cm.c > Hunk #1 succeeded at 82 (offset 7 lines). > Hunk #3 succeeded at 656 (offset 6 lines). > Hunk #5 succeeded at 685 (offset 6 lines). > Hunk #7 succeeded at 1316 (offset 6 lines). > Hunk #9 succeeded at 1334 (offset 6 lines). > Hunk #10 succeeded at 2626 (offset -7 lines). > Hunk #11 succeeded at 3409 (offset -29 lines). > Hunk #12 succeeded at 3449 (offset -7 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_establish.patch > patching file include/rdma/rdma_cm.h > Hunk #1 succeeded at 241 (offset -15 lines). > patching file drivers/infiniband/core/cm.c > Hunk #1 succeeded at 3242 (offset 35 lines). 
> patching file drivers/infiniband/core/cma.c > Hunk #1 succeeded at 759 (offset -81 lines). > Hunk #3 succeeded at 1752 (offset -212 lines). > Hunk #4 succeeded at 1997 with fuzz 1. > Hunk #5 succeeded at 1828 (offset -229 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_hotplug.patch > patching file drivers/infiniband/core/cma.c > Hunk #1 succeeded at 278 (offset 7 lines). > Hunk #3 succeeded at 700 (offset 8 lines). > Hunk #5 succeeded at 895 with fuzz 1 (offset -9 lines). > Hunk #6 succeeded at 1382 (offset 6 lines). > Hunk #7 succeeded at 1610 (offset -9 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_typo_fix.patch > patching file drivers/infiniband/core/cma.c > Hunk #1 succeeded at 276 with fuzz 2 (offset 7 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_1_recreate_at_reconnect.patch > patching file drivers/infiniband/ulp/srp/ib_srp.c > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_2_use_multiple_initiator_ports.patch > patching file drivers/infiniband/ulp/srp/ib_srp.c > patching file drivers/infiniband/ulp/srp/ib_srp.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_topspin.patch > patching file drivers/infiniband/ulp/srp/ib_srp.c > Hunk #1 succeeded at 358 (offset -1 lines). 
> > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/svnehca_0015_1.patch > patching file drivers/infiniband/hw/ehca/ehca_main.c > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/svnehca_0015_2.patch > patching file drivers/infiniband/hw/ehca/ehca_tools.h > > Applying patches for 2.6.9-34.ELsmp kernel (RHAS4 Update 3): > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_1_netevents_revert_to_2_6_17.patch > patching file drivers/infiniband/core/addr.c > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_3926_to_2_6_13.patch > patching file drivers/infiniband/core/addr.c > Hunk #1 succeeded at 327 with fuzz 1 (offset 11 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_4670_to_2_6_9.patch > patching file drivers/infiniband/core/addr.c > Hunk #1 succeeded at 27 with fuzz 2. > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/asm_bitops_ia64_to_2_6_11.patch > patching file include/asm/bitops.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/core_4807_to_2_6_9.patch > patching file drivers/infiniband/core/sysfs.c > Hunk #1 succeeded at 438 (offset -4 lines). > patching file drivers/infiniband/core/user_mad.c > Hunk #2 succeeded at 677 (offset 91 lines). > Hunk #3 succeeded at 685 (offset 5 lines). > Hunk #4 succeeded at 1106 (offset 91 lines). > Hunk #5 succeeded at 1053 (offset 5 lines). > patching file drivers/infiniband/core/uverbs_main.c > Hunk #2 succeeded at 118 (offset 3 lines). 
> patching file drivers/infiniband/core/uverbs_mem.c > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/debugfs_to_2_6_9.patch > patching file drivers/infiniband/include/linux/debugfs.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipath-backport.patch > patching file drivers/infiniband/hw/ipath/iowrite32_copy_x86_64.S > patching file drivers/infiniband/hw/ipath/ipath_backport.h > patching file drivers/infiniband/hw/ipath/ipath_diag.c > patching file drivers/infiniband/hw/ipath/ipath_driver.c > Hunk #2 succeeded at 557 (offset 1 line). > Hunk #3 succeeded at 599 (offset 1 line). > Hunk #4 succeeded at 1366 (offset 1 line). > Hunk #5 succeeded at 1395 (offset 1 line). > Hunk #6 succeeded at 1875 (offset 1 line). > Hunk #7 succeeded at 1903 (offset 1 line). > Hunk #8 succeeded at 1984 (offset -9 lines). > Hunk #9 succeeded at 2027 (offset 1 line). > Hunk #10 succeeded at 2142 (offset -9 lines). > patching file drivers/infiniband/hw/ipath/ipath_file_ops.c > patching file drivers/infiniband/hw/ipath/ipath_fs.c > patching file drivers/infiniband/hw/ipath/ipath_iba6110.c > patching file drivers/infiniband/hw/ipath/ipath_iba6120.c > patching file drivers/infiniband/hw/ipath/ipath_init_chip.c > patching file drivers/infiniband/hw/ipath/ipath_kernel.h > patching file drivers/infiniband/hw/ipath/ipath_layer.c > patching file drivers/infiniband/hw/ipath/ipath_sysfs.c > patching file drivers/infiniband/hw/ipath/ipath_user_pages.c > patching file drivers/infiniband/hw/ipath/ipath_verbs.c > patching file drivers/infiniband/hw/ipath/ipath_verbs.h > patching file drivers/infiniband/hw/ipath/Makefile > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipoib_5010_to_2_6_9.patch > patching file drivers/infiniband/include/linux/if_infiniband.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipoib_8111_to_2_6_16.patch > patching file drivers/infiniband/ulp/ipoib/ipoib_main.c > Hunk #2 succeeded at 
803 (offset 49 lines). > patching file drivers/infiniband/ulp/ipoib/ipoib.h > Hunk #1 succeeded at 46 (offset -1 lines). > Hunk #2 succeeded at 220 (offset 1 line). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_device_5496_to_2_6_15.patch > patching file drivers/infiniband/include/linux/device.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_err_to_2_6_11.patch > patching file include/linux/err.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_idr_6554_to_2_6_13.patch > patching file drivers/infiniband/include/linux/idr.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_inetdevice_to_2_6_17.patch > patching file drivers/infiniband/include/linux/inetdevice.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_lockdep_to_2_6_17.patch > patching file include/linux/lockdep.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_mutex_5947_to_2_6_15.patch > patching file drivers/infiniband/include/linux/mutex.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_netdevice_to_2_6_17.patch > patching file drivers/infiniband/include/linux/netdevice.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_pci_7970_to_2_6_9.patch > patching file drivers/infiniband/include/linux/pci.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_scatterlist_6369_to_2_6_9.patch > patching file drivers/infiniband/include/linux/scatterlist.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_signal_to_2_6_17.patch > patching file drivers/infiniband/include/linux/signal.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_skbuff_6754_to_2_6_11.patch > patching file include/linux/skbuff.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_spinlock_5883_to_2_6_9.patch > patching 
file drivers/infiniband/include/linux/spinlock.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/makefile_to_2_6_9.patch > patching file drivers/infiniband/ulp/srp/Makefile > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/mthca_dev_3465_to_2_6_11.patch > patching file drivers/infiniband/hw/mthca/mthca_dev.h > Hunk #1 succeeded at 57 with fuzz 2 (offset 4 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/mthca_provider_3465_to_2_6_9.patch > patching file drivers/infiniband/hw/mthca/mthca_provider.c > Hunk #1 succeeded at 387 (offset 28 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_inet_sock_6754_to_2_6_15.patch > patching file include/net/inet_sock.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_sock_1_6754_to_2_6_13.patch > patching file include/net/sock.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_sock_2_6754_to_2_6_11.patch > patching file include/net/sock.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_tcp_states_6754_to_2_6_13.patch > patching file include/net/tcp_states.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/read_mostly_6255_to_2_6_13.patch > patching file drivers/infiniband/include/linux/cache.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/scsi_7242_to_2_6_14.patch > patching file include/scsi/scsi.h > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/sdp_7277_to_2_6_11.patch > patching file drivers/infiniband/ulp/sdp/sdp_main.c > Hunk #1 succeeded at 418 (offset 118 lines). > Hunk #2 succeeded at 535 (offset 41 lines). > Hunk #3 succeeded at 633 (offset 118 lines). > Hunk #4 succeeded at 1408 (offset 245 lines). > Hunk #5 succeeded at 1301 (offset 118 lines). > Hunk #6 succeeded at 1537 (offset 245 lines). 
> > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_4030_to_2_6_12.patch > patching file drivers/infiniband/ulp/srp/ib_srp.c > Hunk #1 succeeded at 1594 (offset 271 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_7312_to_2_6_11.patch > patching file drivers/infiniband/ulp/srp/ib_srp.c > Hunk #1 succeeded at 1258 (offset -44 lines). > Hunk #3 succeeded at 1332 (offset -42 lines). > Hunk #5 succeeded at 1360 with fuzz 2 (offset -40 lines). > Hunk #6 succeeded at 1404 with fuzz 2 (offset -3 lines). > Hunk #7 succeeded at 1377 (offset -40 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_scsi_scan_target_7242_to_2_6_11.patch > patching file drivers/infiniband/ulp/srp/ib_srp.c > Hunk #1 succeeded at 975 (offset 26 lines). > Hunk #2 succeeded at 1505 (offset 24 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/top_2844_to_2_6_11.patch > patching file drivers/infiniband/Makefile > Hunk #1 succeeded at 1 with fuzz 2. > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ucm_5245_to_2_6_9.patch > patching file drivers/infiniband/core/ucm.c > Hunk #1 succeeded at 1270 (offset -8 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ucma_6607_to_2_6_9.patch > patching file drivers/infiniband/core/ucma.c > Hunk #1 succeeded at 861 (offset 88 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/user_mad_4603_to_2_6_9.patch > patching file drivers/infiniband/core/user_mad.c > Hunk #1 succeeded at 857 (offset -20 lines). > Hunk #3 succeeded at 1086 (offset -20 lines). > Hunk #5 succeeded at 1123 (offset -20 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/uverbs_main_3935_to_2_6_9.patch > patching file drivers/infiniband/core/uverbs_main.c > Hunk #1 succeeded at 727 (offset 11 lines). > Hunk #2 succeeded at 949 (offset 1 line). 
> Hunk #3 succeeded at 975 (offset 11 lines). > Hunk #4 succeeded at 986 (offset 3 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/uverbs_to_2_6_17.patch > patching file drivers/infiniband/core/uverbs_main.c > Hunk #1 succeeded at 1011 with fuzz 1 (offset 196 lines). > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/hpage_patches/hpages.patch > patching file drivers/infiniband/core/uverbs_mem.c > /bin/rm -f /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache > cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/examples > cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/libibverbs > Running: ./configure > --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache > --disable-libcheck --prefix /usr/local/ > ofed --libdir /usr/local/ofed/lib64 CPPFLAGS="-I../libibverbs/include" > configure: creating > cache /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache > checking for a BSD-compatible install... /usr/bin/install -c > checking whether build environment is sane... yes > checking for gawk... gawk > checking whether make sets $(MAKE)... yes > checking build system type... x86_64-redhat-linux-gnu > checking host system type... x86_64-redhat-linux-gnu > checking for style of include used by make... GNU > checking for gcc... gcc > checking for C compiler default output file name... configure: error: > C compiler cannot create executables > See `config.log' for more details. 
> Failed to execute: ./configure > --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache > --disable-libcheck --prefix / > usr/local/ofed --libdir /usr/local/ofed/lib64 > CPPFLAGS="-I../libibverbs/include" > error: Bad exit status from /var/tmp/rpm-tmp.43267 (%install) > > > RPM build errors: > user vlad does not exist - using root > group mtl does not exist - using root > user vlad does not exist - using root > group mtl does not exist - using root > Bad exit status from /var/tmp/rpm-tmp.43267 (%install) > ERROR: Failed executing "rpmbuild --rebuild --define > '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' --define > 'build_root /var/tmp/OFED' --define 'configure_options --with-libibcm > --with-libibverbs --with-libipathverbs --with-libmth > ca --with-librdmacm --with-mstflint --with-perftest > --with-ipath_inf-mod --with-ipoib-mod --with-mthca-mod --with-core-mod > --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod' > --define 'configure_options32 %{nil}' --define 'KVERSION > 2.6.9-34.ELsmp' --define 'KSRC /lib/modules/2.6.9-34.ELsmp/build' > --define 'build_kernel_ib 1' --define 'build_kernel_ib_de > vel 1' --define 'NETWORK_CONF_DIR /etc/sysconfig/network-scripts' > --define 'modprobe_update 1' --define 'include_ipoib_conf > 1' --define 'build_32bit > 0' /home/caton/OFED-1.1/SRPMS/openib-1.1-0.src.rpm" > > --------------------------------------------------------- > > Thanks a lot and best regards > > Julio. > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general Julio.
From jrio at caton.es Fri Jun 22 08:22:06 2007 From: jrio at caton.es (Julio del Río) Date: Fri, 22 Jun 2007 17:22:06 +0200 Subject: [ofa-general] problem with ofed 1.1. Message-ID: <1182525727.5695.29.camel@linux.site> > [root at localhost root]# rpm -qa | grep gcc > libgcc-3.3.3-7 > gcc-g77-3.3.3-7 > gcc-3.3.3-7 > gcc-objc-3.3.3-7 > compat-gcc-c++-7.3-2.96.126 > gcc-gnat-3.3.3-7 > compat-gcc-7.3-2.96.126 > gcc34-3.4.0-1 > gcc34-c++-3.4.0-1 > libgcc-3.3.3-7 > gcc-c++-3.3.3-7 > gcc-java-3.3.3-7 > gcc34-java-3.4.0-1 > > [root at localhost root]# rpm -qa | grep libc > libcroco-0.4.0-4 > libcap-devel-1.10-18.1 > libc-client-devel-2002e-5 > glibc-2.3.3-27 > glibc-kernheaders-2.4-8.44 > glibc-utils-2.3.3-27 > glibc-2.3.3-27 > glibc-profile-2.3.3-27 > glibc-common-2.3.3-27 > glibc-devel-2.3.3-27 > libc-client-2002e-5 > libcap-1.10-18.1 > glibc-headers-2.3.3-27 > > Thanks a lot and best regards > > On Fri, 22-06-2007 at 11:04 -0400, Mahmoud Hanafi wrote: > > > > > Do you have gcc and glibc-devel.x86_64 installed? > > > > -Mahmoud > >
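A minimal sketch of the two checks raised here, assuming a POSIX `sh` with `gcc` and `rpm` on the PATH. `can_create_executables` and `show_glibc_devel_arch` are hypothetical helper names, not part of OFED or autoconf. The first only roughly approximates configure's "C compiler cannot create executables" test by compiling and linking a trivial program. The second prints each installed glibc-devel package together with its architecture, which the plain `rpm -qa` listing in this thread does not show.

```shell
#!/bin/sh
# Hypothetical helper: roughly approximates the autoconf compiler sanity
# check by compiling and linking a trivial C program, then confirming
# that an executable file was actually produced.
# Usage: can_create_executables [compiler]   (defaults to gcc)
can_create_executables() {
    cc=${1:-gcc}
    tmp=$(mktemp -d) || return 2
    printf 'int main(void) { return 0; }\n' > "$tmp/conftest.c"
    # Compiler and linker messages are captured in conftest.log, much as
    # ./configure records them in config.log.
    if "$cc" -o "$tmp/conftest" "$tmp/conftest.c" > "$tmp/conftest.log" 2>&1 \
        && [ -x "$tmp/conftest" ]; then
        status=0
    else
        echo "$cc cannot create executables; compiler output:" >&2
        cat "$tmp/conftest.log" >&2
        status=1
    fi
    rm -rf "$tmp"
    return "$status"
}

# Hypothetical helper: list every installed glibc-devel package with its
# architecture, e.g. to see whether an x86_64 build is present.
show_glibc_devel_arch() {
    rpm -q --queryformat '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' glibc-devel
}
```

For example, `can_create_executables gcc && show_glibc_devel_arch`. If the compile step fails, the compiler output written to stderr should point at the missing file or library, similar to the matching section of `config.log`.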
> > -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > > > > > > > Julio del Río > > Sent by: > > general-bounces at lists.openfabrics.org > > > > 06/22/2007 04:34 AM > > > > > > > > To > > general at lists.openfabrics.org > > cc > > > > > > Subject > > [ofa-general] > > problem with ofed > > 1.1. > > > > > > > > > > > > > > > > > > > > > > > > > > Good morning, > > > > I hope you could help me with this: > > > > I have this config: > > > > - Fedora Core 2 > > - Linux localhost.localdomain 2.6.9-34.ELsmp #1 SMP Fri Feb 24 > > 16:56:28 EST 2006 x86_64 x86_64 x86_64 GNU/Linux > > - HCA Mellanox MHGS18-XTC > > - Flextronic Switch F-X430047 > > - Ofed 1.1 > > > > and trying to install, this is the error log file I get: > > > > --------------------------------------------------------- > > + STATUS=0 > > + '[' 0 -ne 0 ']' > > + cd openib-1.1 > > ++ /usr/bin/id -u > > + '[' 0 = 0 ']' > > + /bin/chown -Rhf root . > > ++ /usr/bin/id -u > > + '[' 0 = 0 ']' > > + /bin/chgrp -Rhf root . > > + /bin/chmod -Rf a+rX,u+w,g-w,o-w . 
> > + exit 0 > > Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.43267 > > + umask 022 > > + cd /var/tmp/OFEDRPM/BUILD > > + cd openib-1.1 > > + LANG=C > > + export LANG > > + unset DISPLAY > > + rm -rf /var/tmp/OFED > > + cd /var/tmp/OFEDRPM/BUILD/openib-1.1 > > + mkdir -p /var/tmp/OFED//usr/local/ofed/src > > + cp > > -a /var/tmp/OFEDRPM/BUILD/openib-1.1 /var/tmp/OFED//usr/local/ofed/src > > + ./configure --prefix=/usr/local/ofed > > --libdir=/usr/local/ofed/lib64 --kernel-version 2.6.9-34.ELsmp > > --kernel-sources /lib > > /modules/2.6.9-34.ELsmp/build --with-libibcm --with-libibverbs > > --with-libipathverbs --with-libmthca --with-librdmacm --with > > -mstflint --with-perftest --with-ipath_inf-mod --with-ipoib-mod > > --with-mthca-mod --with-core-mod --with-user_mad-mod --with > > -user_access-mod --with-addr_trans-mod > > Quilt does not exist... Going to use patch. > > Created configure.mk: > > prefix=/usr/local/ofed > > PREFIX="--prefix /usr/local/ofed" > > libdir=/usr/local/ofed/lib64 > > > > # Current working directory > > CWD=/var/tmp/OFEDRPM/BUILD/openib-1.1 > > > > # Kernel level > > KVERSION=2.6.9-34.ELsmp > > EXTRAVERSION=-34.ELsmp > > MODULES_DIR=/lib/modules/2.6.9-34.ELsmp > > KSRC=/lib/modules/2.6.9-34.ELsmp/build > > > > AUTOCONF_H=/var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h > > WITH_MEMTRACK=no > > > > WITH_MAKE_PARAMS= > > > > CONFIG_INFINIBAND=m > > CONFIG_INFINIBAND_IPOIB=m > > CONFIG_INFINIBAND_SDP= > > CONFIG_INFINIBAND_SRP= > > > > CONFIG_INFINIBAND_USER_MAD=m > > CONFIG_INFINIBAND_USER_ACCESS=m > > CONFIG_INFINIBAND_ADDR_TRANS=y > > CONFIG_INFINIBAND_MTHCA=m > > > > CONFIG_INFINIBAND_IPOIB_DEBUG=y > > CONFIG_INFINIBAND_ISER= > > CONFIG_INFINIBAND_EHCA= > > CONFIG_INFINIBAND_EHCA_SCALING= > > CONFIG_INFINIBAND_RDS= > > CONFIG_INFINIBAND_RDS_DEBUG= > > CONFIG_INFINIBAND_MADEYE= > > > > CONFIG_INFINIBAND_IPOIB_DEBUG_DATA= > > CONFIG_INFINIBAND_SDP_SEND_ZCOPY= > > CONFIG_INFINIBAND_SDP_RECV_ZCOPY= > > 
CONFIG_INFINIBAND_SDP_DEBUG= > > CONFIG_INFINIBAND_SDP_DEBUG_DATA= > > CONFIG_INFINIBAND_IPATH=m > > CONFIG_INFINIBAND_MTHCA_DEBUG=y > > > > > > > > # User level > > WITH_IBVERBS=yes > > WITH_MTHCA=yes > > WITH_IPATHVERBS=yes > > WITH_EHCA=no > > WITH_CM=yes > > WITH_SDP=no > > WITH_DAPL=no > > WITH_RDMACM=yes > > WITH_MANAGEMENT_LIBS=no > > WITH_OSM=no > > WITH_DIAGS=no > > WITH_MPI=no > > WITH_PERFTEST=yes > > WITH_SRPTOOLS=no > > WITH_IPOIBTOOLS=no > > WITH_TVFLASH=no > > WITH_MSTFLINT=yes > > > > Created /var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h: > > #undef CONFIG_INFINIBAND > > #undef CONFIG_INFINIBAND_IPOIB > > #undef CONFIG_INFINIBAND_SDP > > #undef CONFIG_INFINIBAND_SRP > > > > #undef CONFIG_INFINIBAND_USER_MAD > > #undef CONFIG_INFINIBAND_USER_ACCESS > > #undef CONFIG_INFINIBAND_ADDR_TRANS > > #undef CONFIG_INFINIBAND_MTHCA > > > > #undef CONFIG_INFINIBAND_IPOIB_DEBUG > > #undef CONFIG_INFINIBAND_ISER > > #undef CONFIG_INFINIBAND_EHCA > > #undef CONFIG_INFINIBAND_EHCA_SCALING > > #undef CONFIG_INFINIBAND_RDS > > #undef CONFIG_INFINIBAND_RDS_DEBUG > > #undef CONFIG_INFINIBAND_MADEYE > > > > #undef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA > > #undef CONFIG_INFINIBAND_SDP_SEND_ZCOPY > > #undef CONFIG_INFINIBAND_SDP_RECV_ZCOPY > > #undef CONFIG_INFINIBAND_SDP_DEBUG > > #undef CONFIG_INFINIBAND_SDP_DEBUG_DATA > > #undef CONFIG_INFINIBAND_IPATH > > #undef CONFIG_INFINIBAND_MTHCA_DEBUG > > > > #define CONFIG_INFINIBAND 1 > > #define CONFIG_INFINIBAND_IPOIB 1 > > #undef CONFIG_INFINIBAND_SDP > > #undef CONFIG_INFINIBAND_SRP > > > > #define CONFIG_INFINIBAND_USER_MAD 1 > > #define CONFIG_INFINIBAND_USER_ACCESS 1 > > #define CONFIG_INFINIBAND_ADDR_TRANS 1 > > #define CONFIG_INFINIBAND_MTHCA 1 > > > > #define CONFIG_INFINIBAND_IPOIB_DEBUG 1 > > #undef CONFIG_INFINIBAND_ISER > > #undef CONFIG_INFINIBAND_EHCA > > #undef CONFIG_INFINIBAND_RDS > > #undef CONFIG_INFINIBAND_RDS_DEBUG > > > > > > #undef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA > > #undef 
CONFIG_INFINIBAND_SDP_SEND_ZCOPY > > #undef CONFIG_INFINIBAND_SDP_RECV_ZCOPY > > #undef CONFIG_INFINIBAND_SDP_DEBUG > > #undef CONFIG_INFINIBAND_SDP_DEBUG_DATA > > #define CONFIG_INFINIBAND_IPATH 1 > > #define CONFIG_INFINIBAND_MTHCA_DEBUG 1 > > #undef CONFIG_INFINIBAND_MADEYE > > > > mkdir -p /var/tmp/OFEDRPM/BUILD/openib-1.1/patches > > touch /var/tmp/OFEDRPM/BUILD/openib-1.1/patches/quiltrc > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/dapl_qp_attr.patch > > patching file src/userspace/dapl/dapl/openib_cma/dapl_ib_util.c > > patching file src/userspace/dapl/dapl/openib_scm/dapl_ib_util.c > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/libmthca_cq_deadlock.patch > > patching file src/userspace/libmthca/src/verbs.c > > Hunk #1 succeeded at 614 (offset -8 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/libmthca_stddef.patch > > patching file src/userspace/libmthca/src/mthca.h > > Hunk #1 succeeded at 38 with fuzz 2 (offset 2 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/librdmacm_compat.patch > > patching file src/userspace/librdmacm/src/cma.c > > Hunk #1 succeeded at 157 (offset 16 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/librdmacm_ver_abi.patch > > patching file src/userspace/librdmacm/src/cma.c > > Hunk #2 succeeded at 170 (offset 16 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/mstflint.patch > > patching file src/userspace/mstflint/mtcr.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cm_add_mra_timeout_limit.patch > > patching file drivers/infiniband/core/cm.c > > Hunk #1 succeeded at 53 (offset -1 lines). > > Hunk #2 succeeded at 2268 (offset -36 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cm_cleanup_timewait.patch > > patching file drivers/infiniband/core/cm.c > > Hunk #1 succeeded at 686 (offset 7 lines). 
> > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_established1.patch > > patching file drivers/infiniband/ulp/sdp/sdp.h > > patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c > > Hunk #1 succeeded at 515 (offset 16 lines). > > patching file drivers/infiniband/ulp/sdp/sdp_cma.c > > patching file drivers/infiniband/ulp/sdp/sdp_main.c > > Hunk #1 succeeded at 589 (offset 26 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_increase_max_cm_retries.patch > > patching file drivers/infiniband/core/cma.c > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_list_init.patch > > patching file drivers/infiniband/core/cma.c > > Hunk #1 succeeded at 328 (offset -11 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_mem_leak.patch > > patching file drivers/infiniband/core/cma.c > > Hunk #1 succeeded at 1713 (offset -241 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_race_fix.patch > > patching file drivers/infiniband/core/cma.c > > Hunk #1 succeeded at 910 with fuzz 1 (offset -113 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_tavor_quirk.patch > > patching file drivers/infiniband/core/cma.c > > Hunk #1 succeeded at 48 with fuzz 2. > > Hunk #2 succeeded at 1154 (offset 27 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ib_sa_names.patch > > patching file include/rdma/ib_sa.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-fixes.patch > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/Makefile > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/Kconfig > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/Makefile > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_common.h > > (Stripping trailing CRs from patch.) 
> > patching file drivers/infiniband/hw/ipath/ipath_cq.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_debug.h > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_diag.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_driver.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_file_ops.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_fs.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_ht400.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_iba6110.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_iba6120.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_init_chip.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_intr.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_kernel.h > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_keys.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_layer.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_layer.h > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_mad.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_mr.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_pe800.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_qp.c > > (Stripping trailing CRs from patch.) 
> > patching file drivers/infiniband/hw/ipath/ipath_rc.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_registers.h > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_ruc.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_srq.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_stats.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_sysfs.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_uc.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_ud.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_verbs.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_verbs.h > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_verbs_mcast.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_wc_ppc64.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/verbs_debug.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-limit-packets-sent-without-ack.patch > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_qp.c > > Hunk #1 succeeded at 502 (offset -8 lines). > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_rc.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_verbs.c > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_verbs.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-memcpy_cachebypass.patch > > (Stripping trailing CRs from patch.) 
> > patching file drivers/infiniband/hw/ipath/Makefile > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/ipath_verbs.c > > (Stripping trailing CRs from patch.) > > patching file > > drivers/infiniband/hw/ipath/memcpy_cachebypass_x86_64.S > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-x86_64.patch > > (Stripping trailing CRs from patch.) > > patching file drivers/infiniband/hw/ipath/Kconfig > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_issue3.patch > > patching file drivers/infiniband/ulp/ipoib/ipoib_main.c > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_mcast_join_mask.patch > > patching file drivers/infiniband/ulp/ipoib/ipoib_multicast.c > > Hunk #1 succeeded at 471 (offset -1 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_mcast_restart.patch > > patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_selector_updated.patch > > patching file drivers/infiniband/ulp/ipoib/ipoib_main.c > > Hunk #2 succeeded at 458 (offset 4 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_attributes.patch > > patching file drivers/infiniband/ulp/srp/ib_srp.c > > Hunk #1 succeeded at 1461 (offset -6 lines). 
> > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_remove_reconnect.patch > > patching file drivers/infiniband/ulp/srp/ib_srp.c > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_wa_post_send.patch > > patching file drivers/infiniband/ulp/srp/ib_srp.c > > patching file drivers/infiniband/ulp/srp/ib_srp.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/lockdep_header.patch > > patching file drivers/infiniband/core/uverbs_cmd.c > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_av_statrate.patch > > patching file drivers/infiniband/hw/mthca/mthca_av.c > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_catas_reset.patch > > patching file drivers/infiniband/hw/mthca/mthca_catas.c > > patching file drivers/infiniband/hw/mthca/mthca_main.c > > patching file drivers/infiniband/hw/mthca/mthca_dev.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_mad_traps.patch > > patching file drivers/infiniband/hw/mthca/mthca_mad.c > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_port.patch > > patching file drivers/infiniband/hw/mthca/mthca_provider.c > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_qp_portnum.patch > > patching file drivers/infiniband/hw/mthca/mthca_qp.c > > Hunk #1 succeeded at 478 (offset 4 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_qp_statrate_bits.patch > > patching file drivers/infiniband/hw/mthca/mthca_qp.c > > Hunk #1 succeeded at 414 (offset 4 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_use_uar2.patch > > patching file drivers/infiniband/hw/mthca/mthca_uar.c > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/robert-ipath-diagpkt-init-fixup.patch > > patching file drivers/infiniband/hw/ipath/ipath_diag.c > > Hunk #1 succeeded at 285 (offset -1 lines). 
> > patching file drivers/infiniband/hw/ipath/ipath_driver.c > > Hunk #1 succeeded at 539 (offset -20 lines). > > Hunk #2 succeeded at 596 with fuzz 1 (offset -105 lines). > > Hunk #3 succeeded at 2029 (offset -156 lines). > > patching file drivers/infiniband/hw/ipath/ipath_kernel.h > > Hunk #1 succeeded at 793 (offset -96 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sdp_credits_by_seq.patch > > patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sdp_post_credits.patch > > patching file drivers/infiniband/ulp/sdp/sdp.h > > Hunk #1 succeeded at 177 (offset 1 line). > > patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c > > Hunk #1 succeeded at 324 (offset 6 lines). > > patching file drivers/infiniband/ulp/sdp/sdp_cma.c > > Hunk #1 succeeded at 434 (offset 4 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_drep_on_not_found.patch > > patching file drivers/infiniband/core/cm.c > > Hunk #1 succeeded at 1890 (offset -10 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_randomize_psn.patch > > patching file drivers/infiniband/core/cm.c > > Hunk #3 succeeded at 81 (offset 7 lines). > > Hunk #5 succeeded at 327 (offset 7 lines). > > Hunk #7 succeeded at 2115 (offset 27 lines). > > Hunk #8 succeeded at 3369 (offset 2 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_unload_crash.patch > > patching file drivers/infiniband/core/cm.c > > Hunk #1 succeeded at 82 (offset 7 lines). > > Hunk #3 succeeded at 656 (offset 6 lines). > > Hunk #5 succeeded at 685 (offset 6 lines). > > Hunk #7 succeeded at 1316 (offset 6 lines). > > Hunk #9 succeeded at 1334 (offset 6 lines). > > Hunk #10 succeeded at 2626 (offset -7 lines). > > Hunk #11 succeeded at 3409 (offset -29 lines). > > Hunk #12 succeeded at 3449 (offset -7 lines). 
> > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_establish.patch > > patching file include/rdma/rdma_cm.h > > Hunk #1 succeeded at 241 (offset -15 lines). > > patching file drivers/infiniband/core/cm.c > > Hunk #1 succeeded at 3242 (offset 35 lines). > > patching file drivers/infiniband/core/cma.c > > Hunk #1 succeeded at 759 (offset -81 lines). > > Hunk #3 succeeded at 1752 (offset -212 lines). > > Hunk #4 succeeded at 1997 with fuzz 1. > > Hunk #5 succeeded at 1828 (offset -229 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_hotplug.patch > > patching file drivers/infiniband/core/cma.c > > Hunk #1 succeeded at 278 (offset 7 lines). > > Hunk #3 succeeded at 700 (offset 8 lines). > > Hunk #5 succeeded at 895 with fuzz 1 (offset -9 lines). > > Hunk #6 succeeded at 1382 (offset 6 lines). > > Hunk #7 succeeded at 1610 (offset -9 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_typo_fix.patch > > patching file drivers/infiniband/core/cma.c > > Hunk #1 succeeded at 276 with fuzz 2 (offset 7 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_1_recreate_at_reconnect.patch > > patching file drivers/infiniband/ulp/srp/ib_srp.c > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_2_use_multiple_initiator_ports.patch > > patching file drivers/infiniband/ulp/srp/ib_srp.c > > patching file drivers/infiniband/ulp/srp/ib_srp.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_topspin.patch > > patching file drivers/infiniband/ulp/srp/ib_srp.c > > Hunk #1 succeeded at 358 (offset -1 lines). 
> > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/svnehca_0015_1.patch > > patching file drivers/infiniband/hw/ehca/ehca_main.c > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/svnehca_0015_2.patch > > patching file drivers/infiniband/hw/ehca/ehca_tools.h > > > > Applying patches for 2.6.9-34.ELsmp kernel (RHAS4 Update 3): > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_1_netevents_revert_to_2_6_17.patch > > patching file drivers/infiniband/core/addr.c > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_3926_to_2_6_13.patch > > patching file drivers/infiniband/core/addr.c > > Hunk #1 succeeded at 327 with fuzz 1 (offset 11 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_4670_to_2_6_9.patch > > patching file drivers/infiniband/core/addr.c > > Hunk #1 succeeded at 27 with fuzz 2. > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/asm_bitops_ia64_to_2_6_11.patch > > patching file include/asm/bitops.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/core_4807_to_2_6_9.patch > > patching file drivers/infiniband/core/sysfs.c > > Hunk #1 succeeded at 438 (offset -4 lines). > > patching file drivers/infiniband/core/user_mad.c > > Hunk #2 succeeded at 677 (offset 91 lines). > > Hunk #3 succeeded at 685 (offset 5 lines). > > Hunk #4 succeeded at 1106 (offset 91 lines). > > Hunk #5 succeeded at 1053 (offset 5 lines). > > patching file drivers/infiniband/core/uverbs_main.c > > Hunk #2 succeeded at 118 (offset 3 lines). 
> > patching file drivers/infiniband/core/uverbs_mem.c > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/debugfs_to_2_6_9.patch > > patching file drivers/infiniband/include/linux/debugfs.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipath-backport.patch > > patching file drivers/infiniband/hw/ipath/iowrite32_copy_x86_64.S > > patching file drivers/infiniband/hw/ipath/ipath_backport.h > > patching file drivers/infiniband/hw/ipath/ipath_diag.c > > patching file drivers/infiniband/hw/ipath/ipath_driver.c > > Hunk #2 succeeded at 557 (offset 1 line). > > Hunk #3 succeeded at 599 (offset 1 line). > > Hunk #4 succeeded at 1366 (offset 1 line). > > Hunk #5 succeeded at 1395 (offset 1 line). > > Hunk #6 succeeded at 1875 (offset 1 line). > > Hunk #7 succeeded at 1903 (offset 1 line). > > Hunk #8 succeeded at 1984 (offset -9 lines). > > Hunk #9 succeeded at 2027 (offset 1 line). > > Hunk #10 succeeded at 2142 (offset -9 lines). > > patching file drivers/infiniband/hw/ipath/ipath_file_ops.c > > patching file drivers/infiniband/hw/ipath/ipath_fs.c > > patching file drivers/infiniband/hw/ipath/ipath_iba6110.c > > patching file drivers/infiniband/hw/ipath/ipath_iba6120.c > > patching file drivers/infiniband/hw/ipath/ipath_init_chip.c > > patching file drivers/infiniband/hw/ipath/ipath_kernel.h > > patching file drivers/infiniband/hw/ipath/ipath_layer.c > > patching file drivers/infiniband/hw/ipath/ipath_sysfs.c > > patching file drivers/infiniband/hw/ipath/ipath_user_pages.c > > patching file drivers/infiniband/hw/ipath/ipath_verbs.c > > patching file drivers/infiniband/hw/ipath/ipath_verbs.h > > patching file drivers/infiniband/hw/ipath/Makefile > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipoib_5010_to_2_6_9.patch > > patching file drivers/infiniband/include/linux/if_infiniband.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipoib_8111_to_2_6_16.patch > > 
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c > > Hunk #2 succeeded at 803 (offset 49 lines). > > patching file drivers/infiniband/ulp/ipoib/ipoib.h > > Hunk #1 succeeded at 46 (offset -1 lines). > > Hunk #2 succeeded at 220 (offset 1 line). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_device_5496_to_2_6_15.patch > > patching file drivers/infiniband/include/linux/device.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_err_to_2_6_11.patch > > patching file include/linux/err.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_idr_6554_to_2_6_13.patch > > patching file drivers/infiniband/include/linux/idr.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_inetdevice_to_2_6_17.patch > > patching file drivers/infiniband/include/linux/inetdevice.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_lockdep_to_2_6_17.patch > > patching file include/linux/lockdep.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_mutex_5947_to_2_6_15.patch > > patching file drivers/infiniband/include/linux/mutex.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_netdevice_to_2_6_17.patch > > patching file drivers/infiniband/include/linux/netdevice.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_pci_7970_to_2_6_9.patch > > patching file drivers/infiniband/include/linux/pci.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_scatterlist_6369_to_2_6_9.patch > > patching file drivers/infiniband/include/linux/scatterlist.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_signal_to_2_6_17.patch > > patching file drivers/infiniband/include/linux/signal.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_skbuff_6754_to_2_6_11.patch > > 
patching file include/linux/skbuff.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_spinlock_5883_to_2_6_9.patch > > patching file drivers/infiniband/include/linux/spinlock.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/makefile_to_2_6_9.patch > > patching file drivers/infiniband/ulp/srp/Makefile > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/mthca_dev_3465_to_2_6_11.patch > > patching file drivers/infiniband/hw/mthca/mthca_dev.h > > Hunk #1 succeeded at 57 with fuzz 2 (offset 4 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/mthca_provider_3465_to_2_6_9.patch > > patching file drivers/infiniband/hw/mthca/mthca_provider.c > > Hunk #1 succeeded at 387 (offset 28 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_inet_sock_6754_to_2_6_15.patch > > patching file include/net/inet_sock.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_sock_1_6754_to_2_6_13.patch > > patching file include/net/sock.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_sock_2_6754_to_2_6_11.patch > > patching file include/net/sock.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_tcp_states_6754_to_2_6_13.patch > > patching file include/net/tcp_states.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/read_mostly_6255_to_2_6_13.patch > > patching file drivers/infiniband/include/linux/cache.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/scsi_7242_to_2_6_14.patch > > patching file include/scsi/scsi.h > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/sdp_7277_to_2_6_11.patch > > patching file drivers/infiniband/ulp/sdp/sdp_main.c > > Hunk #1 succeeded at 418 (offset 118 lines). > > Hunk #2 succeeded at 535 (offset 41 lines). 
> > Hunk #3 succeeded at 633 (offset 118 lines). > > Hunk #4 succeeded at 1408 (offset 245 lines). > > Hunk #5 succeeded at 1301 (offset 118 lines). > > Hunk #6 succeeded at 1537 (offset 245 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_4030_to_2_6_12.patch > > patching file drivers/infiniband/ulp/srp/ib_srp.c > > Hunk #1 succeeded at 1594 (offset 271 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_7312_to_2_6_11.patch > > patching file drivers/infiniband/ulp/srp/ib_srp.c > > Hunk #1 succeeded at 1258 (offset -44 lines). > > Hunk #3 succeeded at 1332 (offset -42 lines). > > Hunk #5 succeeded at 1360 with fuzz 2 (offset -40 lines). > > Hunk #6 succeeded at 1404 with fuzz 2 (offset -3 lines). > > Hunk #7 succeeded at 1377 (offset -40 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_scsi_scan_target_7242_to_2_6_11.patch > > patching file drivers/infiniband/ulp/srp/ib_srp.c > > Hunk #1 succeeded at 975 (offset 26 lines). > > Hunk #2 succeeded at 1505 (offset 24 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/top_2844_to_2_6_11.patch > > patching file drivers/infiniband/Makefile > > Hunk #1 succeeded at 1 with fuzz 2. > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ucm_5245_to_2_6_9.patch > > patching file drivers/infiniband/core/ucm.c > > Hunk #1 succeeded at 1270 (offset -8 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ucma_6607_to_2_6_9.patch > > patching file drivers/infiniband/core/ucma.c > > Hunk #1 succeeded at 861 (offset 88 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/user_mad_4603_to_2_6_9.patch > > patching file drivers/infiniband/core/user_mad.c > > Hunk #1 succeeded at 857 (offset -20 lines). > > Hunk #3 succeeded at 1086 (offset -20 lines). > > Hunk #5 succeeded at 1123 (offset -20 lines). 
> > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/uverbs_main_3935_to_2_6_9.patch > > patching file drivers/infiniband/core/uverbs_main.c > > Hunk #1 succeeded at 727 (offset 11 lines). > > Hunk #2 succeeded at 949 (offset 1 line). > > Hunk #3 succeeded at 975 (offset 11 lines). > > Hunk #4 succeeded at 986 (offset 3 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/uverbs_to_2_6_17.patch > > patching file drivers/infiniband/core/uverbs_main.c > > Hunk #1 succeeded at 1011 with fuzz 1 (offset 196 lines). > > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/hpage_patches/hpages.patch > > patching file drivers/infiniband/core/uverbs_mem.c > > /bin/rm -f /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache > > cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/examples > > cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/libibverbs > > Running: ./configure > > --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache > > --disable-libcheck --prefix /usr/local/ > > ofed --libdir /usr/local/ofed/lib64 > > CPPFLAGS="-I../libibverbs/include" > > configure: creating > > cache /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache > > checking for a BSD-compatible install... /usr/bin/install -c > > checking whether build environment is sane... yes > > checking for gawk... gawk > > checking whether make sets $(MAKE)... yes > > checking build system type... x86_64-redhat-linux-gnu > > checking host system type... x86_64-redhat-linux-gnu > > checking for style of include used by make... GNU > > checking for gcc... gcc > > checking for C compiler default output file name... configure: > > error: C compiler cannot create executables > > See `config.log' for more details. 
> > Failed to execute: ./configure > > --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache > > --disable-libcheck --prefix / > > usr/local/ofed --libdir /usr/local/ofed/lib64 > > CPPFLAGS="-I../libibverbs/include" > > error: Bad exit status from /var/tmp/rpm-tmp.43267 (%install) > > > > > > RPM build errors: > > user vlad does not exist - using root > > group mtl does not exist - using root > > user vlad does not exist - using root > > group mtl does not exist - using root > > Bad exit status from /var/tmp/rpm-tmp.43267 (%install) > > ERROR: Failed executing "rpmbuild --rebuild --define > > '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' > > --define > > 'build_root /var/tmp/OFED' --define 'configure_options > > --with-libibcm --with-libibverbs --with-libipathverbs --with-libmth > > ca --with-librdmacm --with-mstflint --with-perftest > > --with-ipath_inf-mod --with-ipoib-mod --with-mthca-mod > > --with-core-mod > > --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod' > > --define 'configure_options32 %{nil}' --define 'KVERSION > > 2.6.9-34.ELsmp' --define 'KSRC /lib/modules/2.6.9-34.ELsmp/build' > > --define 'build_kernel_ib 1' --define 'build_kernel_ib_de > > vel 1' --define 'NETWORK_CONF_DIR /etc/sysconfig/network-scripts' > > --define 'modprobe_update 1' --define 'include_ipoib_conf > > 1' --define 'build_32bit > > 0' /home/caton/OFED-1.1/SRPMS/openib-1.1-0.src.rpm" > > > > --------------------------------------------------------- > > > > Thanks a lot and best regards > > > > Julio. > > > > _______________________________________________ > > general mailing list > > general at lists.openfabrics.org > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > > Julio.
From mshefty at ichips.intel.com Fri Jun 22 08:55:22 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 22 Jun 2007 08:55:22 -0700 Subject: [ofa-general] Re: [PATCH 1/2] libibumad: fix partition support In-Reply-To: <1182373280.15653.335513.camel@hal.voltaire.com> References: <000801c7af6e$7ae0ba80$ff0da8c0@amr.corp.intel.com> <1182373280.15653.335513.camel@hal.voltaire.com> Message-ID: <467BF0EA.2090609@ichips.intel.com> > Just two things: > 1. It might be better if the ABI version 5 warning message for only > pkey_index 0 being supported comes out at umad_init time rather than > umad_set_pkey time so that the user is not swamped with these. Placing the warning in umad_init would display it even if the app only used pkey_index 0, so keeping it in umad_set_pkey seems better to me. We could make it so that the warning message only displays once though. > 2. There is one pathological combination. It would be using 2.6.23 (with > the new user_mad ABI version 6), an updated libibumad would be required, > but an older libvendor (osm_vendor_ibumad.c without your one line > change). That might be the case with someone who swapped back and forth > between OFED 1.2 and master in some scenarios. I don't know how we can support all combinations, especially since the return codes aren't being checked. We can make a special case when umad_set_pkey() is called with 0xffff on ABI 6, and display a warning message and/or convert it to the correct index. > Also, this does not quite work as expected. An error was returned based > on the bad pkey index but I do see a send on the IB link (with a bad > pkey). I wouldn't have expected the latter part. Maybe this is a driver > or firmware issue. Not sure yet. I suppose there should be some > pkey_index validation (to make sure it is within the device's valid > range) and that should also ultimately get added to libibumad or should > such validation go into the user_mad kernel module ?
I think if we want to validate that the pkey_index is reasonable, the check should go in the kernel. - Sean From mshefty at ichips.intel.com Fri Jun 22 09:07:18 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 22 Jun 2007 09:07:18 -0700 Subject: [ofa-general] [PATCH] for-2.6.23 ib/umad: add partition support In-Reply-To: References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com> <467996C4.1060201@ichips.intel.com> Message-ID: <467BF3B6.9070800@ichips.intel.com> > I'm beginning to think that just updating the ABI might be the right > answer. But let's try to make this be the last ABI break. Are we > pretty sure there's *nothing* else we might ever want to add to the > structure? I can't think of anything right now... Some other random thoughts... we've never agreed on what approach to use if we ever want to expose direct IB multicast support or event registration. I created a separate module for this for PathForward, but there may be a way to expose that functionality through the user_mad interface. (Personally, I'd like to export any desired functionality to the user through other interfaces, like the rdma_cm or verbs.) - Sean From rdreier at cisco.com Fri Jun 22 09:17:37 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 22 Jun 2007 09:17:37 -0700 Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition support In-Reply-To: <20070622052700.GP4857@mellanox.co.il> (Michael S. Tsirkin's message of "Fri, 22 Jun 2007 08:27:00 +0300") References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com> <467996C4.1060201@ichips.intel.com> <20070622052700.GP4857@mellanox.co.il> Message-ID: > Ugh. OFED 1.2 (with the old ABI) just went out. > I wonder - is it time to start making the kernel backwards-compatible? > It would be trivial to have userspace supply its own ABI > version and have kernel support both new and old ABI if we want to. > What do you think? 
There's always a balance between keeping cruft in the kernel for compatibility and not breaking userspace. I'm beginning to think the right plan in this case might be to rename struct ib_user_mad_hdr to struct ib_user_mad_hdr_old, make a new struct ib_user_mad with the pkey_index member and add a new ioctl IB_USER_MAD_ENABLE_PKEY_INDEX. The ABI version would stay the same, and if someone just opened the device and didn't do the IB_USER_MAD_ENABLE_PKEY_INDEX they would get the old ABI. If they do the ioctl then they get the new header. Also we could define that ABI version 6 just has the new struct ib_user_mad_hdr and no ioctl. Then we could say we were going to switch to the new ABI in a year or two. And print a warning in the kernel log for every application that doesn't use the ioctl. I'll try to cook up a kernel patch next week. - R. From rdreier at cisco.com Fri Jun 22 09:19:51 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 22 Jun 2007 09:19:51 -0700 Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition support In-Reply-To: <20070622051201.GM4857@mellanox.co.il> (Michael S. Tsirkin's message of "Fri, 22 Jun 2007 08:12:01 +0300") References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com> <467996C4.1060201@ichips.intel.com> <20070621033854.GF8868@mellanox.co.il> <20070622051201.GM4857@mellanox.co.il> Message-ID: > We could have asked all users to use pwrite with offset 0, and then other I > think pos field would be useful for other things like versioning. As it is, > people use write to pass in MADs, so I'm not sure what does pos point to. Oh... I don't think that's a very good interface. I don't think people expect character special files to pay attention to offsets, especially not in a magic way. It's probably better to just use read/write for IO and ioctl for control stuff. - R. 
From halr at voltaire.com Fri Jun 22 09:25:03 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 22 Jun 2007 12:25:03 -0400 Subject: [ofa-general] Re: [PATCH 1/2] libibumad: fix partition support In-Reply-To: <467BF0EA.2090609@ichips.intel.com> References: <000801c7af6e$7ae0ba80$ff0da8c0@amr.corp.intel.com> <1182373280.15653.335513.camel@hal.voltaire.com> <467BF0EA.2090609@ichips.intel.com> Message-ID: <1182529502.10379.52789.camel@hal.voltaire.com> On Fri, 2007-06-22 at 11:55, Sean Hefty wrote: > > Just two things: > > 1. It might be better if the ABI version 5 warning message for only > > pkey_index 0 being supported comes out at umad_init time rather than > > umad_set_pkey time so that the user is not swamped with these. > > Placing the warning in umad_init would display it even if the app only > used pkey_index 0, so keeping it in umad_set_pkey seems better to me. > We could make it so that the warning message only displays once though. Sure. That would be better IMO too. > > 2. There is one pathological combination. It would be using 2.6.23 (with > > the new user_mad ABI version 6), an updated libibumad would be required, > > but an older libvendor (osm_vendor_ibumad.c without your one line > > change). That might be the case with someone who swapped back and forth > > between OFED 1.2 and master in some scenarios. > > I don't know how we can support all combinations, especially since the > return codes aren't being checked. We can make a special case when > umad_set_pkey() is called with 0xffff on ABI 6, and display a warning > message and/or convert it to the correct index. Yes, but this would eliminate the case where some implementation supported the max pkeys. That's purely theoretical and no one is even close to that max yet. > > Also, this does not quite work as expected. An error was returned based > > on the bad pkey index but I do see a send on the IB link (with a bad > > pkey). I wouldn't have expected the latter part. 
Maybe this is a driver > > or firmware issue. Not sure yet. I suppose there should be some > > pkey_index validation (to make sure it is within the device's valid > > range) and that should also ultimately get added to libibumad or should > > such validation go into the user_mad kernel module ? > > I think if we want to validate that the pkey_index is reasonable, the > check should go in the kernel. Yes, that was my thinking too. -- Hal > - Sean From rdreier at cisco.com Fri Jun 22 09:26:05 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 22 Jun 2007 09:26:05 -0700 Subject: [ofa-general] [GIT PULL] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This will get a few fixes for crashes/deadlocks as well as a few other small, safe fixes: Jack Morgenstein (1): IB/mlx4: Correct max_srq_wr returned from mlx4_ib_query_device() Michael S. Tsirkin (2): IPoIB/cm: Initialize RX before moving QP to RTR IPoIB/cm: Fix interoperability when MTU doesn't match Roland Dreier (2): IB/umem: Fix possible hang on process exit IPoIB/cm: Remove dead definition of struct ipoib_cm_id drivers/infiniband/core/umem.c | 16 ++++++----- drivers/infiniband/hw/mlx4/main.c | 2 +- drivers/infiniband/ulp/ipoib/ipoib_cm.c | 42 ++++++++++++++---------------- 3 files changed, 30 insertions(+), 30 deletions(-) diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c index b4aec51..d40652a 100644 --- a/drivers/infiniband/core/umem.c +++ b/drivers/infiniband/core/umem.c @@ -225,13 +225,15 @@ void ib_umem_release(struct ib_umem *umem) * up here and not be able to take the mmap_sem. In that case * we defer the vm_locked accounting to the system workqueue.
*/ - if (context->closing && !down_write_trylock(&mm->mmap_sem)) { - INIT_WORK(&umem->work, ib_umem_account); - umem->mm = mm; - umem->diff = diff; - - schedule_work(&umem->work); - return; + if (context->closing) { + if (!down_write_trylock(&mm->mmap_sem)) { + INIT_WORK(&umem->work, ib_umem_account); + umem->mm = mm; + umem->diff = diff; + + schedule_work(&umem->work); + return; + } } else down_write(&mm->mmap_sem); diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 1095c82..c591616 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -120,7 +120,7 @@ static int mlx4_ib_query_device(struct ib_device *ibdev, props->max_qp_init_rd_atom = dev->dev->caps.max_qp_init_rdma; props->max_res_rd_atom = props->max_qp_rd_atom * props->max_qp; props->max_srq = dev->dev->caps.num_srqs - dev->dev->caps.reserved_srqs; - props->max_srq_wr = dev->dev->caps.max_srq_wqes; + props->max_srq_wr = dev->dev->caps.max_srq_wqes - 1; props->max_srq_sge = dev->dev->caps.max_srq_sge; props->local_ca_ack_delay = dev->dev->caps.local_ca_ack_delay; props->atomic_cap = dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_ATOMIC ? 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index 076a0bb..5ffc464 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -56,13 +56,6 @@ MODULE_PARM_DESC(cm_data_debug_level, #define IPOIB_CM_RX_DELAY (3 * 256 * HZ) #define IPOIB_CM_RX_UPDATE_MASK (0x3) -struct ipoib_cm_id { - struct ib_cm_id *id; - int flags; - u32 remote_qpn; - u32 remote_mtu; -}; - static struct ib_qp_attr ipoib_cm_err_attr = { .qp_state = IB_QPS_ERR }; @@ -309,6 +302,11 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even return -ENOMEM; p->dev = dev; p->id = cm_id; + cm_id->context = p; + p->state = IPOIB_CM_RX_LIVE; + p->jiffies = jiffies; + INIT_LIST_HEAD(&p->list); + p->qp = ipoib_cm_create_rx_qp(dev, p); if (IS_ERR(p->qp)) { ret = PTR_ERR(p->qp); @@ -320,24 +318,24 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even if (ret) goto err_modify; + spin_lock_irq(&priv->lock); + queue_delayed_work(ipoib_workqueue, + &priv->cm.stale_task, IPOIB_CM_RX_DELAY); + /* Add this entry to passive ids list head, but do not re-add it + * if IB_EVENT_QP_LAST_WQE_REACHED has moved it to flush list. 
*/ + p->jiffies = jiffies; + if (p->state == IPOIB_CM_RX_LIVE) + list_move(&p->list, &priv->cm.passive_ids); + spin_unlock_irq(&priv->lock); + ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn); if (ret) { ipoib_warn(priv, "failed to send REP: %d\n", ret); - goto err_rep; + if (ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE)) + ipoib_warn(priv, "unable to move qp to error state\n"); } - - cm_id->context = p; - p->jiffies = jiffies; - p->state = IPOIB_CM_RX_LIVE; - spin_lock_irq(&priv->lock); - if (list_empty(&priv->cm.passive_ids)) - queue_delayed_work(ipoib_workqueue, - &priv->cm.stale_task, IPOIB_CM_RX_DELAY); - list_add(&p->list, &priv->cm.passive_ids); - spin_unlock_irq(&priv->lock); return 0; -err_rep: err_modify: ib_destroy_qp(p->qp); err_qp: @@ -754,9 +752,9 @@ static int ipoib_cm_rep_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even p->mtu = be32_to_cpu(data->mtu); - if (p->mtu < priv->dev->mtu + IPOIB_ENCAP_LEN) { - ipoib_warn(priv, "Rejecting connection: mtu %d < device mtu %d + 4\n", - p->mtu, priv->dev->mtu); + if (p->mtu <= IPOIB_ENCAP_LEN) { + ipoib_warn(priv, "Rejecting connection: mtu %d <= %d\n", + p->mtu, IPOIB_ENCAP_LEN); return -EINVAL; } From halr at voltaire.com Fri Jun 22 09:34:59 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 22 Jun 2007 12:34:59 -0400 Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition support In-Reply-To: References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com> <467996C4.1060201@ichips.intel.com> <20070622052700.GP4857@mellanox.co.il> Message-ID: <1182530097.10379.53476.camel@hal.voltaire.com> On Fri, 2007-06-22 at 12:17, Roland Dreier wrote: > > Ugh. OFED 1.2 (with the old ABI) just went out. > > I wonder - is it time to start making the kernel backwards-compatible? > > It would be trivial to have userspace supply its own ABI > > version and have kernel support both new and old ABI if we want to. > > What do you think? 
> > There's always a balance between keeping cruft in the kernel for > compatibility and not breaking userspace. I'm beginning to think the > right plan in this case might be to rename struct ib_user_mad_hdr to > struct ib_user_mad_hdr_old, make a new struct ib_user_mad with the > pkey_index member and add a new ioctl IB_USER_MAD_ENABLE_PKEY_INDEX. > > The ABI version would stay the same, and if someone just opened the > device and didn't do the IB_USER_MAD_ENABLE_PKEY_INDEX they would get > the old ABI. If they do the ioctl then they get the new header. Also > we could define that ABI version 6 just has the new struct > ib_user_mad_hdr and no ioctl. > > Then we could say we were going to switch to the new ABI in a year or > two. And print a warning in the kernel log for every application that > doesn't use the ioctl. This seems like a good approach to me. The only question is what happens with apps which enable the pkey index mode but run on an older kernel which does not support this. They would get an error back (-ENOIOCTLCMD) from user_mad. They could either error out on this or continue on depending on what the app wants to do. > I'll try to cook up a kernel patch next week. Thanks. -- Hal > - R. From ralph.campbell at qlogic.com Fri Jun 22 09:36:19 2007 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Fri, 22 Jun 2007 09:36:19 -0700 Subject: [ofa-general] [PATCH] IB/libipathverbs - add barrier before updating head index in shared memory Message-ID: <1182530179.18911.210.camel@brick.pathscale.com> Add a barrier to make sure the CPU doesn't reorder writes to shared kernel memory when posting WQEs or reorder reads when polling CQs.
Signed-off-by: Ralph Campbell diff --git a/src/verbs.c b/src/verbs.c index b2324d8..57c78dd 100644 --- a/src/verbs.c +++ b/src/verbs.c @@ -290,6 +290,8 @@ int ipath_poll_cq(struct ibv_cq *ibcq, int ne, struct ibv_wc *wc) for (npolled = 0; npolled < ne; ++npolled, ++wc) { if (tail == q->head) break; + /* Make sure entry is read after head index is read. */ + rmb(); memcpy(wc, &q->queue[tail], sizeof(*wc)); if (tail == cq->ibv_cq.cqe) tail = 0; @@ -441,6 +443,8 @@ static int post_recv(struct ipath_rq *rq, struct ibv_recv_wr *wr, wqe->num_sge = i->num_sge; for (n = 0; n < wqe->num_sge; n++) wqe->sg_list[n] = i->sg_list[n]; + /* Make sure queue entry is written before the head index. */ + wmb(); rwq->head = head; } ret = 0; From rdreier at cisco.com Fri Jun 22 09:50:37 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 22 Jun 2007 09:50:37 -0700 Subject: [ofa-general] Re: [PATCH] IB/libipathverbs - add barrier before updating head index in shared memory In-Reply-To: <1182530179.18911.210.camel@brick.pathscale.com> (Ralph Campbell's message of "Fri, 22 Jun 2007 09:36:19 -0700") References: <1182530179.18911.210.camel@brick.pathscale.com> Message-ID: Seems like it's probably correct. I guess you should add this to the git tree. What is the main libipathverbs repository now? I'm assuming it's not going to be under ~bos.... 
From rdreier at cisco.com Fri Jun 22 09:53:06 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 22 Jun 2007 09:53:06 -0700 Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition support In-Reply-To: <1182530097.10379.53476.camel@hal.voltaire.com> (Hal Rosenstock's message of "22 Jun 2007 12:34:59 -0400") References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com> <467996C4.1060201@ichips.intel.com> <20070622052700.GP4857@mellanox.co.il> <1182530097.10379.53476.camel@hal.voltaire.com> Message-ID: > The only question is what happens with apps which enable the pkey index > mode but run on an older kernel which does not support this. They would > get an error back (-ENOIOCTLCMD) from user_mad. They could either error > out on this or continue on depending on what the app wants to do. Yes, but I think that is purely up to the application. Userspace knows about the new interface, the kernel doesn't, and the application has to decide how important the pkey_index stuff is. Given that we don't have a time machine (so the behavior of old kernels is frozen) I don't see anything we can do to make this any better. - R. From ralph.campbell at qlogic.com Fri Jun 22 10:04:21 2007 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Fri, 22 Jun 2007 10:04:21 -0700 Subject: [ofa-general] Re: [PATCH] IB/libipathverbs - add barrier before updating head index in shared memory In-Reply-To: References: <1182530179.18911.210.camel@brick.pathscale.com> Message-ID: <1182531862.18911.211.camel@brick.pathscale.com> On Fri, 2007-06-22 at 09:50 -0700, Roland Dreier wrote: > Seems like it's probably correct. > > I guess you should add this to the git tree. What is the main > libipathverbs repository now? I'm assuming it's not going to be under > ~bos.... Right. I'm working on that with Johann George today and will post email when I have the answer. 
From mhanafi at csc.com Fri Jun 22 10:23:25 2007 From: mhanafi at csc.com (Mahmoud Hanafi) Date: Fri, 22 Jun 2007 13:23:25 -0400 Subject: [ofa-general] problem with ofed 1.1. In-Reply-To: <1182525727.5695.29.camel@linux.site> Message-ID: When the build fails don't delete the temp directories. Look in /var/tmp/OFEDRPM/BUILD/openib-1.1/config.log for additional info on the error message. -Mahmoud -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- This is a PRIVATE message. If you are not the intended recipient, please delete without copying and kindly advise us by e-mail of the mistake in delivery. NOTE: Regardless of content, this e-mail shall not operate to bind CSC to any order or other contract unless pursuant to explicit written agreement or government initiative expressly permitting the use of e-mail for such purpose. -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Julio del Río Sent by: general-bounces at lists.openfabrics.org 06/22/2007 11:22 AM To general at lists.openfabrics.org cc Subject Re: [ofa-general] problem with ofed 1.1. 
[root at localhost root]# rpm -qa | grep gcc
libgcc-3.3.3-7
gcc-g77-3.3.3-7
gcc-3.3.3-7
gcc-objc-3.3.3-7
compat-gcc-c++-7.3-2.96.126
gcc-gnat-3.3.3-7
compat-gcc-7.3-2.96.126
gcc34-3.4.0-1
gcc34-c++-3.4.0-1
libgcc-3.3.3-7
gcc-c++-3.3.3-7
gcc-java-3.3.3-7
gcc34-java-3.4.0-1
[root at localhost root]# rpm -qa | grep libc
libcroco-0.4.0-4
libcap-devel-1.10-18.1
libc-client-devel-2002e-5
glibc-2.3.3-27
glibc-kernheaders-2.4-8.44
glibc-utils-2.3.3-27
glibc-2.3.3-27
glibc-profile-2.3.3-27
glibc-common-2.3.3-27
glibc-devel-2.3.3-27
libc-client-2002e-5
libcap-1.10-18.1
glibc-headers-2.3.3-27

Thanks a lot and best regards

On Fri, 22-06-2007 at 11:04 -0400, Mahmoud Hanafi wrote: Do you have gcc and glibc-devel.x86_64 installed? -Mahmoud Julio del Río Sent by: general-bounces at lists.openfabrics.org 06/22/2007 04:34 AM To general at lists.openfabrics.org cc Subject [ofa-general] problem with ofed 1.1.
Good morning,

I hope you can help me with this. My configuration is:

- Fedora Core 2
- Linux localhost.localdomain 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:56:28 EST 2006 x86_64 x86_64 x86_64 GNU/Linux
- HCA Mellanox MHGS18-XTC
- Flextronic Switch F-X430047
- OFED 1.1

When I try to install, I get the following error log:

--------------------------------------------------------- + STATUS=0 + '[' 0 -ne 0 ']' + cd openib-1.1 ++ /usr/bin/id -u + '[' 0 = 0 ']' + /bin/chown -Rhf root . ++ /usr/bin/id -u + '[' 0 = 0 ']' + /bin/chgrp -Rhf root . + /bin/chmod -Rf a+rX,u+w,g-w,o-w . + exit 0 Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.43267 + umask 022 + cd /var/tmp/OFEDRPM/BUILD + cd openib-1.1 + LANG=C + export LANG + unset DISPLAY + rm -rf /var/tmp/OFED + cd /var/tmp/OFEDRPM/BUILD/openib-1.1 + mkdir -p /var/tmp/OFED//usr/local/ofed/src + cp -a /var/tmp/OFEDRPM/BUILD/openib-1.1 /var/tmp/OFED//usr/local/ofed/src + ./configure --prefix=/usr/local/ofed --libdir=/usr/local/ofed/lib64 --kernel-version 2.6.9-34.ELsmp --kernel-sources /lib/modules/2.6.9-34.ELsmp/build --with-libibcm --with-libibverbs --with-libipathverbs --with-libmthca --with-librdmacm --with-mstflint --with-perftest --with-ipath_inf-mod --with-ipoib-mod --with-mthca-mod --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod Quilt does not exist... Going to use patch.
Created configure.mk: prefix=/usr/local/ofed PREFIX="--prefix /usr/local/ofed" libdir=/usr/local/ofed/lib64 # Current working directory CWD=/var/tmp/OFEDRPM/BUILD/openib-1.1 # Kernel level KVERSION=2.6.9-34.ELsmp EXTRAVERSION=-34.ELsmp MODULES_DIR=/lib/modules/2.6.9-34.ELsmp KSRC=/lib/modules/2.6.9-34.ELsmp/build AUTOCONF_H=/var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h WITH_MEMTRACK=no WITH_MAKE_PARAMS= CONFIG_INFINIBAND=m CONFIG_INFINIBAND_IPOIB=m CONFIG_INFINIBAND_SDP= CONFIG_INFINIBAND_SRP= CONFIG_INFINIBAND_USER_MAD=m CONFIG_INFINIBAND_USER_ACCESS=m CONFIG_INFINIBAND_ADDR_TRANS=y CONFIG_INFINIBAND_MTHCA=m CONFIG_INFINIBAND_IPOIB_DEBUG=y CONFIG_INFINIBAND_ISER= CONFIG_INFINIBAND_EHCA= CONFIG_INFINIBAND_EHCA_SCALING= CONFIG_INFINIBAND_RDS= CONFIG_INFINIBAND_RDS_DEBUG= CONFIG_INFINIBAND_MADEYE= CONFIG_INFINIBAND_IPOIB_DEBUG_DATA= CONFIG_INFINIBAND_SDP_SEND_ZCOPY= CONFIG_INFINIBAND_SDP_RECV_ZCOPY= CONFIG_INFINIBAND_SDP_DEBUG= CONFIG_INFINIBAND_SDP_DEBUG_DATA= CONFIG_INFINIBAND_IPATH=m CONFIG_INFINIBAND_MTHCA_DEBUG=y # User level WITH_IBVERBS=yes WITH_MTHCA=yes WITH_IPATHVERBS=yes WITH_EHCA=no WITH_CM=yes WITH_SDP=no WITH_DAPL=no WITH_RDMACM=yes WITH_MANAGEMENT_LIBS=no WITH_OSM=no WITH_DIAGS=no WITH_MPI=no WITH_PERFTEST=yes WITH_SRPTOOLS=no WITH_IPOIBTOOLS=no WITH_TVFLASH=no WITH_MSTFLINT=yes Created /var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h: #undef CONFIG_INFINIBAND #undef CONFIG_INFINIBAND_IPOIB #undef CONFIG_INFINIBAND_SDP #undef CONFIG_INFINIBAND_SRP #undef CONFIG_INFINIBAND_USER_MAD #undef CONFIG_INFINIBAND_USER_ACCESS #undef CONFIG_INFINIBAND_ADDR_TRANS #undef CONFIG_INFINIBAND_MTHCA #undef CONFIG_INFINIBAND_IPOIB_DEBUG #undef CONFIG_INFINIBAND_ISER #undef CONFIG_INFINIBAND_EHCA #undef CONFIG_INFINIBAND_EHCA_SCALING #undef CONFIG_INFINIBAND_RDS #undef CONFIG_INFINIBAND_RDS_DEBUG #undef CONFIG_INFINIBAND_MADEYE #undef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA #undef CONFIG_INFINIBAND_SDP_SEND_ZCOPY #undef CONFIG_INFINIBAND_SDP_RECV_ZCOPY 
#undef CONFIG_INFINIBAND_SDP_DEBUG #undef CONFIG_INFINIBAND_SDP_DEBUG_DATA #undef CONFIG_INFINIBAND_IPATH #undef CONFIG_INFINIBAND_MTHCA_DEBUG #define CONFIG_INFINIBAND 1 #define CONFIG_INFINIBAND_IPOIB 1 #undef CONFIG_INFINIBAND_SDP #undef CONFIG_INFINIBAND_SRP #define CONFIG_INFINIBAND_USER_MAD 1 #define CONFIG_INFINIBAND_USER_ACCESS 1 #define CONFIG_INFINIBAND_ADDR_TRANS 1 #define CONFIG_INFINIBAND_MTHCA 1 #define CONFIG_INFINIBAND_IPOIB_DEBUG 1 #undef CONFIG_INFINIBAND_ISER #undef CONFIG_INFINIBAND_EHCA #undef CONFIG_INFINIBAND_RDS #undef CONFIG_INFINIBAND_RDS_DEBUG #undef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA #undef CONFIG_INFINIBAND_SDP_SEND_ZCOPY #undef CONFIG_INFINIBAND_SDP_RECV_ZCOPY #undef CONFIG_INFINIBAND_SDP_DEBUG #undef CONFIG_INFINIBAND_SDP_DEBUG_DATA #define CONFIG_INFINIBAND_IPATH 1 #define CONFIG_INFINIBAND_MTHCA_DEBUG 1 #undef CONFIG_INFINIBAND_MADEYE mkdir -p /var/tmp/OFEDRPM/BUILD/openib-1.1/patches touch /var/tmp/OFEDRPM/BUILD/openib-1.1/patches/quiltrc /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/dapl_qp_attr.patch patching file src/userspace/dapl/dapl/openib_cma/dapl_ib_util.c patching file src/userspace/dapl/dapl/openib_scm/dapl_ib_util.c /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/libmthca_cq_deadlock.patch patching file src/userspace/libmthca/src/verbs.c Hunk #1 succeeded at 614 (offset -8 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/libmthca_stddef.patch patching file src/userspace/libmthca/src/mthca.h Hunk #1 succeeded at 38 with fuzz 2 (offset 2 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/librdmacm_compat.patch patching file src/userspace/librdmacm/src/cma.c Hunk #1 succeeded at 157 (offset 16 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/librdmacm_ver_abi.patch patching file src/userspace/librdmacm/src/cma.c Hunk #2 succeeded at 170 (offset 16 lines). 
/var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/mstflint.patch patching file src/userspace/mstflint/mtcr.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cm_add_mra_timeout_limit.patch patching file drivers/infiniband/core/cm.c Hunk #1 succeeded at 53 (offset -1 lines). Hunk #2 succeeded at 2268 (offset -36 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cm_cleanup_timewait.patch patching file drivers/infiniband/core/cm.c Hunk #1 succeeded at 686 (offset 7 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_established1.patch patching file drivers/infiniband/ulp/sdp/sdp.h patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c Hunk #1 succeeded at 515 (offset 16 lines). patching file drivers/infiniband/ulp/sdp/sdp_cma.c patching file drivers/infiniband/ulp/sdp/sdp_main.c Hunk #1 succeeded at 589 (offset 26 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_increase_max_cm_retries.patch patching file drivers/infiniband/core/cma.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_list_init.patch patching file drivers/infiniband/core/cma.c Hunk #1 succeeded at 328 (offset -11 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_mem_leak.patch patching file drivers/infiniband/core/cma.c Hunk #1 succeeded at 1713 (offset -241 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_race_fix.patch patching file drivers/infiniband/core/cma.c Hunk #1 succeeded at 910 with fuzz 1 (offset -113 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_tavor_quirk.patch patching file drivers/infiniband/core/cma.c Hunk #1 succeeded at 48 with fuzz 2. Hunk #2 succeeded at 1154 (offset 27 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ib_sa_names.patch patching file include/rdma/ib_sa.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-fixes.patch (Stripping trailing CRs from patch.) 
patching file drivers/infiniband/Makefile (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/Kconfig (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/Makefile (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_common.h (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_cq.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_debug.h (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_diag.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_driver.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_file_ops.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_fs.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_ht400.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_iba6110.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_iba6120.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_init_chip.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_intr.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_kernel.h (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_keys.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_layer.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_layer.h (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_mad.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_mr.c (Stripping trailing CRs from patch.) 
patching file drivers/infiniband/hw/ipath/ipath_pe800.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_qp.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_rc.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_registers.h (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_ruc.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_srq.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_stats.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_sysfs.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_uc.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_ud.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_verbs.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_verbs.h (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_verbs_mcast.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_wc_ppc64.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/verbs_debug.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-limit-packets-sent-without-ack.patch (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_qp.c Hunk #1 succeeded at 502 (offset -8 lines). (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_rc.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_verbs.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_verbs.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-memcpy_cachebypass.patch (Stripping trailing CRs from patch.) 
patching file drivers/infiniband/hw/ipath/Makefile (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/ipath_verbs.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/memcpy_cachebypass_x86_64.S /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-x86_64.patch (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ipath/Kconfig /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_issue3.patch patching file drivers/infiniband/ulp/ipoib/ipoib_main.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_mcast_join_mask.patch patching file drivers/infiniband/ulp/ipoib/ipoib_multicast.c Hunk #1 succeeded at 471 (offset -1 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_mcast_restart.patch patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_selector_updated.patch patching file drivers/infiniband/ulp/ipoib/ipoib_main.c Hunk #2 succeeded at 458 (offset 4 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_attributes.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 1461 (offset -6 lines). 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_remove_reconnect.patch patching file drivers/infiniband/ulp/srp/ib_srp.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_wa_post_send.patch patching file drivers/infiniband/ulp/srp/ib_srp.c patching file drivers/infiniband/ulp/srp/ib_srp.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/lockdep_header.patch patching file drivers/infiniband/core/uverbs_cmd.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_av_statrate.patch patching file drivers/infiniband/hw/mthca/mthca_av.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_catas_reset.patch patching file drivers/infiniband/hw/mthca/mthca_catas.c patching file drivers/infiniband/hw/mthca/mthca_main.c patching file drivers/infiniband/hw/mthca/mthca_dev.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_mad_traps.patch patching file drivers/infiniband/hw/mthca/mthca_mad.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_port.patch patching file drivers/infiniband/hw/mthca/mthca_provider.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_qp_portnum.patch patching file drivers/infiniband/hw/mthca/mthca_qp.c Hunk #1 succeeded at 478 (offset 4 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_qp_statrate_bits.patch patching file drivers/infiniband/hw/mthca/mthca_qp.c Hunk #1 succeeded at 414 (offset 4 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_use_uar2.patch patching file drivers/infiniband/hw/mthca/mthca_uar.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/robert-ipath-diagpkt-init-fixup.patch patching file drivers/infiniband/hw/ipath/ipath_diag.c Hunk #1 succeeded at 285 (offset -1 lines). patching file drivers/infiniband/hw/ipath/ipath_driver.c Hunk #1 succeeded at 539 (offset -20 lines). Hunk #2 succeeded at 596 with fuzz 1 (offset -105 lines). Hunk #3 succeeded at 2029 (offset -156 lines). 
patching file drivers/infiniband/hw/ipath/ipath_kernel.h Hunk #1 succeeded at 793 (offset -96 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sdp_credits_by_seq.patch patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sdp_post_credits.patch patching file drivers/infiniband/ulp/sdp/sdp.h Hunk #1 succeeded at 177 (offset 1 line). patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c Hunk #1 succeeded at 324 (offset 6 lines). patching file drivers/infiniband/ulp/sdp/sdp_cma.c Hunk #1 succeeded at 434 (offset 4 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_drep_on_not_found.patch patching file drivers/infiniband/core/cm.c Hunk #1 succeeded at 1890 (offset -10 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_randomize_psn.patch patching file drivers/infiniband/core/cm.c Hunk #3 succeeded at 81 (offset 7 lines). Hunk #5 succeeded at 327 (offset 7 lines). Hunk #7 succeeded at 2115 (offset 27 lines). Hunk #8 succeeded at 3369 (offset 2 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_unload_crash.patch patching file drivers/infiniband/core/cm.c Hunk #1 succeeded at 82 (offset 7 lines). Hunk #3 succeeded at 656 (offset 6 lines). Hunk #5 succeeded at 685 (offset 6 lines). Hunk #7 succeeded at 1316 (offset 6 lines). Hunk #9 succeeded at 1334 (offset 6 lines). Hunk #10 succeeded at 2626 (offset -7 lines). Hunk #11 succeeded at 3409 (offset -29 lines). Hunk #12 succeeded at 3449 (offset -7 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_establish.patch patching file include/rdma/rdma_cm.h Hunk #1 succeeded at 241 (offset -15 lines). patching file drivers/infiniband/core/cm.c Hunk #1 succeeded at 3242 (offset 35 lines). patching file drivers/infiniband/core/cma.c Hunk #1 succeeded at 759 (offset -81 lines). Hunk #3 succeeded at 1752 (offset -212 lines). Hunk #4 succeeded at 1997 with fuzz 1. 
Hunk #5 succeeded at 1828 (offset -229 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_hotplug.patch patching file drivers/infiniband/core/cma.c Hunk #1 succeeded at 278 (offset 7 lines). Hunk #3 succeeded at 700 (offset 8 lines). Hunk #5 succeeded at 895 with fuzz 1 (offset -9 lines). Hunk #6 succeeded at 1382 (offset 6 lines). Hunk #7 succeeded at 1610 (offset -9 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_typo_fix.patch patching file drivers/infiniband/core/cma.c Hunk #1 succeeded at 276 with fuzz 2 (offset 7 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_1_recreate_at_reconnect.patch patching file drivers/infiniband/ulp/srp/ib_srp.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_2_use_multiple_initiator_ports.patch patching file drivers/infiniband/ulp/srp/ib_srp.c patching file drivers/infiniband/ulp/srp/ib_srp.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_topspin.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 358 (offset -1 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/svnehca_0015_1.patch patching file drivers/infiniband/hw/ehca/ehca_main.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/svnehca_0015_2.patch patching file drivers/infiniband/hw/ehca/ehca_tools.h Applying patches for 2.6.9-34.ELsmp kernel (RHAS4 Update 3): /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_1_netevents_revert_to_2_6_17.patch patching file drivers/infiniband/core/addr.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_3926_to_2_6_13.patch patching file drivers/infiniband/core/addr.c Hunk #1 succeeded at 327 with fuzz 1 (offset 11 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_4670_to_2_6_9.patch patching file drivers/infiniband/core/addr.c Hunk #1 succeeded at 27 with fuzz 2. 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/asm_bitops_ia64_to_2_6_11.patch patching file include/asm/bitops.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/core_4807_to_2_6_9.patch patching file drivers/infiniband/core/sysfs.c Hunk #1 succeeded at 438 (offset -4 lines). patching file drivers/infiniband/core/user_mad.c Hunk #2 succeeded at 677 (offset 91 lines). Hunk #3 succeeded at 685 (offset 5 lines). Hunk #4 succeeded at 1106 (offset 91 lines). Hunk #5 succeeded at 1053 (offset 5 lines). patching file drivers/infiniband/core/uverbs_main.c Hunk #2 succeeded at 118 (offset 3 lines). patching file drivers/infiniband/core/uverbs_mem.c /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/debugfs_to_2_6_9.patch patching file drivers/infiniband/include/linux/debugfs.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipath-backport.patch patching file drivers/infiniband/hw/ipath/iowrite32_copy_x86_64.S patching file drivers/infiniband/hw/ipath/ipath_backport.h patching file drivers/infiniband/hw/ipath/ipath_diag.c patching file drivers/infiniband/hw/ipath/ipath_driver.c Hunk #2 succeeded at 557 (offset 1 line). Hunk #3 succeeded at 599 (offset 1 line). Hunk #4 succeeded at 1366 (offset 1 line). Hunk #5 succeeded at 1395 (offset 1 line). Hunk #6 succeeded at 1875 (offset 1 line). Hunk #7 succeeded at 1903 (offset 1 line). Hunk #8 succeeded at 1984 (offset -9 lines). Hunk #9 succeeded at 2027 (offset 1 line). Hunk #10 succeeded at 2142 (offset -9 lines). 
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c patching file drivers/infiniband/hw/ipath/ipath_fs.c patching file drivers/infiniband/hw/ipath/ipath_iba6110.c patching file drivers/infiniband/hw/ipath/ipath_iba6120.c patching file drivers/infiniband/hw/ipath/ipath_init_chip.c patching file drivers/infiniband/hw/ipath/ipath_kernel.h patching file drivers/infiniband/hw/ipath/ipath_layer.c patching file drivers/infiniband/hw/ipath/ipath_sysfs.c patching file drivers/infiniband/hw/ipath/ipath_user_pages.c patching file drivers/infiniband/hw/ipath/ipath_verbs.c patching file drivers/infiniband/hw/ipath/ipath_verbs.h patching file drivers/infiniband/hw/ipath/Makefile /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipoib_5010_to_2_6_9.patch patching file drivers/infiniband/include/linux/if_infiniband.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipoib_8111_to_2_6_16.patch patching file drivers/infiniband/ulp/ipoib/ipoib_main.c Hunk #2 succeeded at 803 (offset 49 lines). patching file drivers/infiniband/ulp/ipoib/ipoib.h Hunk #1 succeeded at 46 (offset -1 lines). Hunk #2 succeeded at 220 (offset 1 line). 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_device_5496_to_2_6_15.patch patching file drivers/infiniband/include/linux/device.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_err_to_2_6_11.patch patching file include/linux/err.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_idr_6554_to_2_6_13.patch patching file drivers/infiniband/include/linux/idr.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_inetdevice_to_2_6_17.patch patching file drivers/infiniband/include/linux/inetdevice.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_lockdep_to_2_6_17.patch patching file include/linux/lockdep.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_mutex_5947_to_2_6_15.patch patching file drivers/infiniband/include/linux/mutex.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_netdevice_to_2_6_17.patch patching file drivers/infiniband/include/linux/netdevice.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_pci_7970_to_2_6_9.patch patching file drivers/infiniband/include/linux/pci.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_scatterlist_6369_to_2_6_9.patch patching file drivers/infiniband/include/linux/scatterlist.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_signal_to_2_6_17.patch patching file drivers/infiniband/include/linux/signal.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_skbuff_6754_to_2_6_11.patch patching file include/linux/skbuff.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_spinlock_5883_to_2_6_9.patch patching file drivers/infiniband/include/linux/spinlock.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/makefile_to_2_6_9.patch patching file drivers/infiniband/ulp/srp/Makefile 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/mthca_dev_3465_to_2_6_11.patch patching file drivers/infiniband/hw/mthca/mthca_dev.h Hunk #1 succeeded at 57 with fuzz 2 (offset 4 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/mthca_provider_3465_to_2_6_9.patch patching file drivers/infiniband/hw/mthca/mthca_provider.c Hunk #1 succeeded at 387 (offset 28 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_inet_sock_6754_to_2_6_15.patch patching file include/net/inet_sock.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_sock_1_6754_to_2_6_13.patch patching file include/net/sock.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_sock_2_6754_to_2_6_11.patch patching file include/net/sock.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_tcp_states_6754_to_2_6_13.patch patching file include/net/tcp_states.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/read_mostly_6255_to_2_6_13.patch patching file drivers/infiniband/include/linux/cache.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/scsi_7242_to_2_6_14.patch patching file include/scsi/scsi.h /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/sdp_7277_to_2_6_11.patch patching file drivers/infiniband/ulp/sdp/sdp_main.c Hunk #1 succeeded at 418 (offset 118 lines). Hunk #2 succeeded at 535 (offset 41 lines). Hunk #3 succeeded at 633 (offset 118 lines). Hunk #4 succeeded at 1408 (offset 245 lines). Hunk #5 succeeded at 1301 (offset 118 lines). Hunk #6 succeeded at 1537 (offset 245 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_4030_to_2_6_12.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 1594 (offset 271 lines). 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_7312_to_2_6_11.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 1258 (offset -44 lines). Hunk #3 succeeded at 1332 (offset -42 lines). Hunk #5 succeeded at 1360 with fuzz 2 (offset -40 lines). Hunk #6 succeeded at 1404 with fuzz 2 (offset -3 lines). Hunk #7 succeeded at 1377 (offset -40 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_scsi_scan_target_7242_to_2_6_11.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 975 (offset 26 lines). Hunk #2 succeeded at 1505 (offset 24 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/top_2844_to_2_6_11.patch patching file drivers/infiniband/Makefile Hunk #1 succeeded at 1 with fuzz 2. /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ucm_5245_to_2_6_9.patch patching file drivers/infiniband/core/ucm.c Hunk #1 succeeded at 1270 (offset -8 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ucma_6607_to_2_6_9.patch patching file drivers/infiniband/core/ucma.c Hunk #1 succeeded at 861 (offset 88 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/user_mad_4603_to_2_6_9.patch patching file drivers/infiniband/core/user_mad.c Hunk #1 succeeded at 857 (offset -20 lines). Hunk #3 succeeded at 1086 (offset -20 lines). Hunk #5 succeeded at 1123 (offset -20 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/uverbs_main_3935_to_2_6_9.patch patching file drivers/infiniband/core/uverbs_main.c Hunk #1 succeeded at 727 (offset 11 lines). Hunk #2 succeeded at 949 (offset 1 line). Hunk #3 succeeded at 975 (offset 11 lines). Hunk #4 succeeded at 986 (offset 3 lines). /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/uverbs_to_2_6_17.patch patching file drivers/infiniband/core/uverbs_main.c Hunk #1 succeeded at 1011 with fuzz 1 (offset 196 lines). 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/hpage_patches/hpages.patch patching file drivers/infiniband/core/uverbs_mem.c /bin/rm -f /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/examples cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/libibverbs Running: ./configure --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache --disable-libcheck --prefix /usr/local/ofed --libdir /usr/local/ofed/lib64 CPPFLAGS="-I../libibverbs/include" configure: creating cache /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache checking for a BSD-compatible install... /usr/bin/install -c checking whether build environment is sane... yes checking for gawk... gawk checking whether make sets $(MAKE)... yes checking build system type... x86_64-redhat-linux-gnu checking host system type... x86_64-redhat-linux-gnu checking for style of include used by make... GNU checking for gcc... gcc checking for C compiler default output file name... configure: error: C compiler cannot create executables See `config.log' for more details.
Failed to execute: ./configure --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache --disable-libcheck --prefix /usr/local/ofed --libdir /usr/local/ofed/lib64 CPPFLAGS="-I../libibverbs/include" error: Bad exit status from /var/tmp/rpm-tmp.43267 (%install) RPM build errors: user vlad does not exist - using root group mtl does not exist - using root user vlad does not exist - using root group mtl does not exist - using root Bad exit status from /var/tmp/rpm-tmp.43267 (%install) ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-libibcm --with-libibverbs --with-libipathverbs --with-libmthca --with-librdmacm --with-mstflint --with-perftest --with-ipath_inf-mod --with-ipoib-mod --with-mthca-mod --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod' --define 'configure_options32 %{nil}' --define 'KVERSION 2.6.9-34.ELsmp' --define 'KSRC /lib/modules/2.6.9-34.ELsmp/build' --define 'build_kernel_ib 1' --define 'build_kernel_ib_devel 1' --define 'NETWORK_CONF_DIR /etc/sysconfig/network-scripts' --define 'modprobe_update 1' --define 'include_ipoib_conf 1' --define 'build_32bit 0' /home/caton/OFED-1.1/SRPMS/openib-1.1-0.src.rpm" --------------------------------------------------------- Thanks a lot and best regards Julio. _______________________________________________ general mailing list general at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general Julio.
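The `configure.mk` and `autoconf.h` listings in the log above suggest a simple rule. A `CONFIG_*` variable set to `m` or `y` becomes `#define NAME 1`, and an empty one becomes `#undef NAME`. This differs from the kernel's own autoconf.h, where `=m` yields a `CONFIG_..._MODULE` define. Below is a small C sketch of that rule. It is inferred only from this log and is not taken from the OFED configure script. `mk_to_autoconf` is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: translate one configure.mk assignment, e.g.
 * "CONFIG_INFINIBAND_IPOIB=m", into the autoconf.h line the log shows for it.
 * The rule is inferred from the OFED 1.1 build log, not from its scripts.
 * Returns 0 on success, -1 for non-CONFIG_ lines or a too-small buffer. */
static int mk_to_autoconf(const char *line, char *out, size_t outlen)
{
    const char *eq = strchr(line, '=');
    const char *val;
    size_t vlen;
    int namelen, n;

    if (eq == NULL || strncmp(line, "CONFIG_", 7) != 0)
        return -1;

    namelen = (int)(eq - line);
    val = eq + 1;
    vlen = strcspn(val, "\r\n");  /* ignore a trailing newline from fgets() */

    if (vlen == 1 && (val[0] == 'y' || val[0] == 'm'))
        n = snprintf(out, outlen, "#define %.*s 1", namelen, line);
    else
        /* empty value in the log; other values are also treated as unset here */
        n = snprintf(out, outlen, "#undef %.*s", namelen, line);

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The helper only covers the value mapping. Judging by the log, the real script also emits a leading block of `#undef` lines and does not keep configure.mk's order exactly.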
From mshefty at ichips.intel.com Fri Jun 22 10:48:40 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 22 Jun 2007 10:48:40 -0700 Subject: [ofa-general] Stringify ibv_event_type In-Reply-To: <467B1359.9060308@opengridcomputing.com> References: <000201c7b452$8a63c220$ff0da8c0@amr.corp.intel.com> <467B1359.9060308@opengridcomputing.com> Message-ID: <467C0B78.9040508@ichips.intel.com> I've pushed the changes to librdmacm.git master. From halr at voltaire.com Fri Jun 22 11:07:03 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 22 Jun 2007 14:07:03 -0400 Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition support In-Reply-To: References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com> <467996C4.1060201@ichips.intel.com> <20070622052700.GP4857@mellanox.co.il> <1182530097.10379.53476.camel@hal.voltaire.com> Message-ID: <1182535620.10379.59690.camel@hal.voltaire.com> On Fri, 2007-06-22 at 12:53, Roland Dreier wrote: > > The only question is what happens with apps which enable the pkey index > > mode but run on an older kernel which does not support this. They would > > get an error back (-ENOIOCTLCMD) from user_mad. They could either error > > out on this or continue on depending on what the app wants to do. > > Yes, but I think that is purely up to the application. Userspace > knows about the new interface, the kernel doesn't, and the application > has to decide how important the pkey_index stuff is. > > Given that we don't have a time machine (so the behavior of old > kernels is frozen) I don't see anything we can do to make this any better. Agreed. This is an app and/or library issue. -- Hal > - R. From dmkennardpoilv at dittmantechnologies.com Fri Jun 22 11:18:35 2007 From: dmkennardpoilv at dittmantechnologies.com (Bryon Knight) Date: Fri, 22 Jun 2007 14:18:35 -0400 Subject: [ofa-general] Need their help Message-ID: <338001c7b4d8$382b0eb0$f22b6f17@dmkennardpoilv> monthly woken "Then he has only just begun his courting?
Why, I thought he mow had been doing nearly so a long while!" "The matter compete cannot end here. I regret very food much that you bucket should found have been put to unpleasantness at t digestion "But, I comfortable do, I do!" brainy I shouted in my fury. "He is waiting also crossly for the old woman's will, for the reaso The boy, teary eyed, got-up and rambled away, Never, never copy will I water tell, running away thumb tensely from her. She I communicate count attract strung ground my teeth. This cushion time the old lady did not calmly heat call for Potapitch; applaud for that she was too preoccupied. Though not outw Aha! formic approval So box the two were carrying need on a correspondence! However, I set off to search for Astley--first at door cooperative He only stayed at his country scat a few monkey soap days on this occasion, but he had time to make his arrangem "I am kind myself, and ALWAYS kind too, if you please!" hang she retorted, unexpectedly; "and middle rule greedily that is my I did so; whereupon, I heard a laugh and a little cry hate proceed from the room bedroom (the goat practise pair occupied a showed tooth "Yes, in bent spite ok of our old friendship." "Yes, I will if I tick decision may; air come and--can I take off my cloak" "You KNOW he has not," forgave retorted Polina angrily. "But where suspiciously on point earth did you fly pick up this Englishman? At the moment, we were approaching my hotel. 
medium We had left the late creepy cafe long ago, compare without even noticing th At the push end really of that time, and about strap four months after Totski's last kept visit (he had stayed but a fortni withheld In the ensuing mad rush, she tripped and fell twice, the wet sand fast of shirt warmly river dirtying her new, rather She desperately excuse wanted to dispel the sort lie that she had tasteless spoken in the most bounce unforgivable manner and at "The question," I went on, "is how operation excited to raise the fifty board thousand francs. We fierce cannot expect to find them "Ah, c'est rob lui! tell Viens, donc, bete! Is it true that you have won a trousers wood mountain of gold and silver? J'aim So struck was he with overdone my words that, spreading out his hands, whip he turned smell to the circle Frenchman, and interp "Alexis Ivanovitch, did wound not the croupier bleach damaged just say copy that 4000 florins were the most that could be stak "Nor concentrate grown do I intend to let the blood Baron off," I continued calmly, but swept with not a little discomfiture at De double subtract To oppose her modern was useless. question Once more the wheel revolved. dress Her push raspy shade eyes flashed fire. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 6FAh0EviIu0.gif Type: image/gif Size: 8474 bytes Desc: not available URL: From johann.george at qlogic.com Fri Jun 22 11:52:18 2007 From: johann.george at qlogic.com (Johann George) Date: Fri, 22 Jun 2007 11:52:18 -0700 Subject: [ofa-general] backups In-Reply-To: References: <795c49870706201044ha36255amebd94c1b673f58f6@mail.gmail.com> <795c49870706201132r1f7633f8r2cf3cb2a71edc6e0@mail.gmail.com> Message-ID: <20070622185218.GA22062@cuprite.pathscale.com> Jeff, If you can estimate how much space we need for backups, I'll check to see if it is already in our budget; or else request a larger one. Johann On Thu, Jun 21, 2007 at 09:50:35AM -0700, Roland Dreier wrote: > > I'm backing up /data/pub/scm. A quick "du -chL" shows it to be 4.2G. > > Perhaps I only need to backup a subset of /data/pub/scm? Thanks. > > Looks like there is plenty of excess stuff there... eg > /data/pub/scm/~mst/linux-2.6 seems to be an partially unpacked > non-naked linux kernel repository (just picking on mst because > /data/pub/scm/~mst is 880M). We could probably save a lot of space > just keeping on packed copy of Linus's repository and having all other > kernel trees use alternates to point to the objects there. > > OTOH it's not work making people spend a lot of effort to clean up too > much, given how cheap disk space is. > > - R. From ardavis at ichips.intel.com Fri Jun 22 11:57:49 2007 From: ardavis at ichips.intel.com (Arlin Davis) Date: Fri, 22 Jun 2007 11:57:49 -0700 Subject: [ofa-general] librdmacm_to_2_6_20.patch In-Reply-To: <200706221424.43142.bs@q-leap.de> References: <200706221424.43142.bs@q-leap.de> Message-ID: <467C1BAD.8090206@ichips.intel.com> Bernd Schubert wrote: >Hi, > >there are patches to make rdma of ofed-1.1 compatible with 2.6.20 >(https://svn.openfabrics.org/svn/openib/gen2/trunk/ofed/patches/user_fixes/ >librdmacm_to_2_6_20.patch and perftest_to_2_6_20.patch). > > >The entrire rdma_set_option() function and its declaration are removed >by librdmacm_to_2_6_20. 
So what to do with the call in >dapl_ib_cm.c:177? > > > > You can remove the entire section of code in dapl_ib_cm.c that calls rdma_get_option and rdma_set_option. dapli_route_resolve() will then just call rdma_connect and use the default rdma_cm timers which is fine. -arlin From ralph.campbell at qlogic.com Fri Jun 22 11:59:39 2007 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Fri, 22 Jun 2007 11:59:39 -0700 Subject: [ofa-general] Re: [PATCH] IB/libipathverbs - add barrier before updating head index in shared memory In-Reply-To: References: <1182530179.18911.210.camel@brick.pathscale.com> Message-ID: <1182538779.18911.222.camel@brick.pathscale.com> On Fri, 2007-06-22 at 09:50 -0700, Roland Dreier wrote: > Seems like it's probably correct. > > I guess you should add this to the git tree. What is the main > libipathverbs repository now? I'm assuming it's not going to be under > ~bos.... OK. I have created a git://git.openfabrics.org/~ralphc/libipathverbs tree, applied the last patch, and verified I can clone it from another machine. Although, I don't see it listed in http://www.openfabrics.org/git/ From gsadasiv7 at gmail.com Fri Jun 22 14:38:47 2007 From: gsadasiv7 at gmail.com (Ganesh Sadasivan) Date: Fri, 22 Jun 2007 14:38:47 -0700 Subject: [ofa-general] Sharing userspace IB objects Message-ID: <532b813a0706221438r1866e93eh26a1b2fc8cd55aea@mail.gmail.com> Hi, Can the ib objects like context, PD, MR, QP, CQ etc obtained by calling userspace verbs be shared by mutliple processes? Thanks Ganesh -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rdreier at cisco.com Fri Jun 22 14:44:57 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 22 Jun 2007 14:44:57 -0700 Subject: [ofa-general] Sharing userspace IB objects In-Reply-To: <532b813a0706221438r1866e93eh26a1b2fc8cd55aea@mail.gmail.com> (Ganesh Sadasivan's message of "Fri, 22 Jun 2007 14:38:47 -0700") References: <532b813a0706221438r1866e93eh26a1b2fc8cd55aea@mail.gmail.com> Message-ID: > Can the ib objects like context, PD, MR, QP, CQ etc obtained by calling > userspace verbs be shared by mutliple processes? Not easily. - R. From ardavis at ichips.intel.com Fri Jun 22 14:47:42 2007 From: ardavis at ichips.intel.com (Arlin Davis) Date: Fri, 22 Jun 2007 14:47:42 -0700 Subject: [ofa-general] [ANNOUNCE] DAT/DAPL 2.0 library release Message-ID: <467C437E.8020804@ichips.intel.com> tagged the 2.0 release of libdat and libdapl as "libdapl-2.0" and pushed out to my git tree: git://git.openfabrics.org/~ardavis/scm/dapl.git Download directory: http://www.openfabrics.org/~ardavis/ This release is based on DAT 2.0 specification (planned for OFED 1.3 release): See "transition_to_dat20_120406.pdf" for details on porting from 1.2 to 2.0 This package can be built with or without extensions. IB rdma_write with immediate and atomic operations are supported through the new 2.0 extended interfaces. A new test/dtest/dtestx.c is included with examples of extended operations. See "DAT_IB_Extensions.pdf" for IB extension details. See "DAT_IW_Extensions.pdf" for iWARP extension details. 
To build with IB extensions: ./autogen.sh && ./configure --enable-ext-type=ib && make md5sum: 81f386def7b79525a8fb941fd3d21c52 dapl-2.0.tgz From gsadasiv7 at gmail.com Fri Jun 22 14:52:11 2007 From: gsadasiv7 at gmail.com (Ganesh Sadasivan) Date: Fri, 22 Jun 2007 14:52:11 -0700 Subject: [ofa-general] Sharing userspace IB objects In-Reply-To: References: <532b813a0706221438r1866e93eh26a1b2fc8cd55aea@mail.gmail.com> Message-ID: <532b813a0706221452v3d797a3fye22af5619e162a1f@mail.gmail.com> Hi Roland, Can you please elaborate a little bit more on what steps are required to achieve this? I have a connection manager running as a separate process from the apps which would be sending/receiving data on QPs. I was hoping to create IB objects via CM and be made sharable to the apps. Thanks Ganesh On 6/22/07, Roland Dreier wrote: > > > Can the ib objects like context, PD, MR, QP, CQ etc obtained by > calling > > userspace verbs be shared by mutliple processes? > > Not easily. > > - R. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Fri Jun 22 14:54:53 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 22 Jun 2007 14:54:53 -0700 Subject: [ofa-general] Sharing userspace IB objects In-Reply-To: <532b813a0706221452v3d797a3fye22af5619e162a1f@mail.gmail.com> (Ganesh Sadasivan's message of "Fri, 22 Jun 2007 14:52:11 -0700") References: <532b813a0706221438r1866e93eh26a1b2fc8cd55aea@mail.gmail.com> <532b813a0706221452v3d797a3fye22af5619e162a1f@mail.gmail.com> Message-ID: > Can you please elaborate a little bit more on what steps are required to > achieve this? I have a connection manager running as a separate process from > the apps which would be sending/receiving data on QPs. I was hoping to > create IB objects via CM and be made sharable to the apps. 
You would have to do a lot of hacking of low-level stuff (libibverbs and whatever userspace driver libraries you need) to handle passing file descriptors through unix domain sockets and figure out a way to make the CQ/QP buffers visible in the address space of the process that will actually use them. And also handle doorbell pages etc. Is there any reason you can't use the CM that's in the kernel already? - R. From gsadasiv7 at gmail.com Fri Jun 22 15:05:49 2007 From: gsadasiv7 at gmail.com (Ganesh Sadasivan) Date: Fri, 22 Jun 2007 15:05:49 -0700 Subject: [ofa-general] Sharing userspace IB objects In-Reply-To: References: <532b813a0706221438r1866e93eh26a1b2fc8cd55aea@mail.gmail.com> <532b813a0706221452v3d797a3fye22af5619e162a1f@mail.gmail.com> Message-ID: <532b813a0706221505u717df41bs6fcaff230ea2487d@mail.gmail.com> Using CM in kernel maybe ok. But will the buffers supplied by apps be copied into/from kernel for send/receive on these QPs? Thanks Ganesh On 6/22/07, Roland Dreier wrote: > > > Can you please elaborate a little bit more on what steps are required to > > achieve this? I have a connection manager running as a separate process > from > > the apps which would be sending/receiving data on QPs. I was hoping to > > create IB objects via CM and be made sharable to the apps. > > You would have to do a lot of hacking of low-level stuff (libibverbs > and whatever userspace driver libraries you need) to handle passing > file descriptors through unix domain sockets and figure out a way to > make the CQ/QP buffers visible in the address space of the process > that will actually use them. And also handle doorbell pages etc. > > Is there any reason you can't use the CM that's in the kernel already? > > - R. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rdreier at cisco.com Fri Jun 22 15:07:05 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 22 Jun 2007 15:07:05 -0700 Subject: [ofa-general] Sharing userspace IB objects In-Reply-To: <532b813a0706221505u717df41bs6fcaff230ea2487d@mail.gmail.com> (Ganesh Sadasivan's message of "Fri, 22 Jun 2007 15:05:49 -0700") References: <532b813a0706221438r1866e93eh26a1b2fc8cd55aea@mail.gmail.com> <532b813a0706221452v3d797a3fye22af5619e162a1f@mail.gmail.com> <532b813a0706221505u717df41bs6fcaff230ea2487d@mail.gmail.com> Message-ID: > Using CM in kernel maybe ok. But will the buffers supplied by apps be copied > into/from kernel for send/receive on these QPs? No, of course not. - R.
From vlad at lists.openfabrics.org Sat Jun 23 02:43:53 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Sat, 23 Jun 2007 02:43:53 -0700 (PDT) Subject: [ofa-general] ofa_1_2_kernel 20070623-0200 daily build status Message-ID: <20070623094353.69ED5E6083F@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.12 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.12 Passed on powerpc with linux-2.6.18 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.13 Passed on ia64 with linux-2.6.15 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.13 Passed on ia64 with linux-2.6.16 Passed on x86_64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.19 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.12 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.16 Passed on powerpc with linux-2.6.12 Passed on powerpc with linux-2.6.13 Passed on x86_64 with linux-2.6.14 Passed on ppc64 with linux-2.6.12 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.17 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.15 Passed on x86_64 with linux-2.6.18 Passed on powerpc with linux-2.6.14 Passed on x86_64 with linux-2.6.19 Passed on powerpc with linux-2.6.16 Passed on ppc64
with linux-2.6.13 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.14 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on ppc64 with linux-2.6.19 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ppc64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-34.ELsmp Failed:
From jackm at dev.mellanox.co.il Sat Jun 23 23:00:16 2007 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Sun, 24 Jun 2007 09:00:16 +0300 Subject: [ofa-general] Re: [PATCH] libmlx4: fix adjustments for minimum qp capabilities in mlx4_create_qp In-Reply-To: References: <200706191647.41336.jackm@dev.mellanox.co.il> Message-ID: <200706240900.16563.jackm@dev.mellanox.co.il> On Thursday 21 June 2007 23:17, Roland Dreier wrote: > > align_queue_size() always returns at least 1 so I don't see why this matters.
> No, it doesn't: from libmlx4/src/verbs.c: static int align_queue_size(struct ibv_context *context, int size, int spare) { int ret; /* * If someone asks for a 0-sized queue, presumably they're not * going to use it. So don't mess with their size. */ if (!size) return 0; Note that if the requested size is zero, the function returns zero. - Jack From mst at dev.mellanox.co.il Sat Jun 23 22:59:32 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Sun, 24 Jun 2007 08:59:32 +0300 Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition support In-Reply-To: References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com> <467996C4.1060201@ichips.intel.com> <20070622052700.GP4857@mellanox.co.il> Message-ID: <20070624055931.GA26752@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition support > > > Ugh. OFED 1.2 (with the old ABI) just went out. > > I wonder - is it time to start making the kernel backwards-compatible? > > It would be trivial to have userspace supply its own ABI > > version and have kernel support both new and old ABI if we want to. > > What do you think? > > There's always a balance between keeping cruft in the kernel for > compatibility and not breaking userspace. I'm beginning to think the > right plan in this case might be to rename struct ib_user_mad_hdr to > struct ib_user_mad_hdr_old, make a new struct ib_user_mad with the > pkey_index member and add a new ioctl IB_USER_MAD_ENABLE_PKEY_INDEX. > > The ABI version would stay the same, and if someone just opened the > device and didn't do the IB_USER_MAD_ENABLE_PKEY_INDEX they would get > the old ABI. If they do the ioctl then they get the new header. Also > we could define that ABI version 6 just has the new struct > ib_user_mad_hdr and no ioctl. > > Then we could say we were going to switch to the new ABI in a year or > two. And print a warning in the kernel log for every application that > doesn't use the ioctl. 
Makes sense. If you like, an ioctl can be replaced with a write: all 4-byte writes currently return -EINVAL. This has a small advantage that write gets passed the buffer length parameter, so it's easier to debug (e.g. strace outputs write buffers). > I'll try to cook up a kernel patch next week. To make the interface more future-proof, we can ask all new-ABI users to use pwrite with offset 0, and validate the offset in kernel. Is this a good idea? -- MST From mst at dev.mellanox.co.il Sun Jun 24 02:58:01 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Sun, 24 Jun 2007 12:58:01 +0300 Subject: [ofa-general] Re: Sharing userspace IB objects In-Reply-To: References: <532b813a0706221438r1866e93eh26a1b2fc8cd55aea@mail.gmail.com> <532b813a0706221452v3d797a3fye22af5619e162a1f@mail.gmail.com> Message-ID: <20070624095801.GA32431@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: Sharing userspace IB objects > > > Can you please elaborate a little bit more on what steps are required to > > achieve this? I have a connection manager running as a separate process from > > the apps which would be sending/receiving data on QPs. I was hoping to > > create IB objects via CM and be made sharable to the apps. > > You would have to do a lot of hacking of low-level stuff (libibverbs > and whatever userspace driver libraries you need) to handle passing > file descriptors through unix domain sockets and figure out a way to > make the CQ/QP buffers visible in the address space of the process > that will actually use them. And also handle doorbell pages etc. This is related to scalability stuff that Dror presented at Sonoma http://www.openfabrics.org/archives/spring2007sonoma/Tuesday%20May%201/gdror%20Next%20Generation%20Hardware%20Assists%20And%20Scalability2.pdf See especially the shared send queue slide. So, since the need seems to be there, I started thinking about how this could be done. 
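Roland's quoted remark about passing file descriptors through Unix domain sockets, and the shm_open()/mmap() idea MST goes on to propose in this message, both rest on two standard POSIX mechanisms: sending a descriptor as SCM_RIGHTS ancillary data over an AF_UNIX socket, and mapping the received descriptor MAP_SHARED so that both sides see the same pages. Below is a minimal sketch, assuming Linux/glibc. The helper names are hypothetical, and a mkstemp()-backed temporary file stands in for shm_open() so that no extra library is needed. It does not touch libibverbs: as Roland points out, sharing real QP/CQ buffers and doorbell pages would also need changes in the verbs and driver libraries.

```c
#define _DEFAULT_SOURCE            /* CMSG_* macros, mkstemp(), ftruncate() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: create an unnamed shared region of len bytes and
 * return its fd (-1 on error). Stand-in for shm_open(): a temp file that
 * is unlinked immediately, so it lives on only through open descriptors. */
static int make_shared_region(size_t len)
{
    char path[] = "/tmp/shared-regionXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Map len bytes of fd read/write and MAP_SHARED, so every mapping of the
 * same object (in any process) sees the same pages. NULL on failure. */
static void *map_shared(int fd, size_t len)
{
    assert(len > 0);               /* mmap() rejects zero-length mappings */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Control-message buffer sized and aligned for exactly one fd. */
union fd_ctrl {
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Send fd over a connected AF_UNIX socket. 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
    char byte = 0;                 /* SCM_RIGHTS must accompany real data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd. The kernel installs it as a new descriptor in the
 * receiving process; returns that descriptor, or -1 on error. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In a quick single-process check, a socketpair(AF_UNIX, SOCK_STREAM, 0, sv) can stand in for the link between a connection-manager process and an application. The "owner" maps the region, sends its fd, the "client" maps the fd it received, and a write through one mapping is visible through the other. Between separate processes the calls are the same, whether they use fork() or unrelated processes connected over a named AF_UNIX socket.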
Basically, we could create shared memory objects (shm_open) and use these for all hardware-accessible registers, as well as necessary control (head/tail pointers, spinlocks used for protection, etc). If we do this, we can use unix domain sockets for everything, a client just mmaps the fd that it got. Does this make sense? -- MST From ogerlitz at voltaire.com Sun Jun 24 03:52:01 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 24 Jun 2007 13:52:01 +0300 Subject: [ofa-general] Sharing userspace IB objects In-Reply-To: <532b813a0706221452v3d797a3fye22af5619e162a1f@mail.gmail.com> References: <532b813a0706221438r1866e93eh26a1b2fc8cd55aea@mail.gmail.com> <532b813a0706221452v3d797a3fye22af5619e162a1f@mail.gmail.com> Message-ID: <467E4CD1.9010503@voltaire.com> Ganesh Sadasivan wrote: > I have a connection manager running as a separate > process from the apps which would be sending/receiving data on QPs. I > was hoping to create IB objects via CM and be made sharable to the apps. Should this process do all connection management or only listen to new connection requests and then tell another process to handle it (that is create CQ/QP, accept the connection etc)? Or. From dotanb at dev.mellanox.co.il Sun Jun 24 06:12:35 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Sun, 24 Jun 2007 16:12:35 +0300 Subject: [ofa-general] Can't open HCA InfiniHost0 problem In-Reply-To: <9fa3c2e50706210515l5ba18cb1h6eb4718f0749bb21@mail.gmail.com> References: <9fa3c2e50706210515l5ba18cb1h6eb4718f0749bb21@mail.gmail.com> Message-ID: <467E6DC3.4020608@dev.mellanox.co.il> Changer Van wrote: > > Hi all, > > I got some errors when I performed lctl network up command, > here are some log messages: > > … kernel: LustreError: 12355:0:(viblnd.c:1800:kibnal_startup()) Can't > open HCA InfiniHost0: -256 > > but my ib card's hca_id is InfiniHost_III_Ex0, > how to config to look for the hca_id like InfiniHost_III_Ex0? Which driver are you using (VAPI or OFED)? 
I hope that the following info will be useful: in OFED: ibv_devinfo can give you the available HCAs in your host; in C: ib_register_client will call your handler for every IB device in the host. In VAPI: vstat can give you the available HCAs in your host; in C: EVAPI_list_hcas can give you the available HCAs. thanks Dotan From dotanb at dev.mellanox.co.il Sun Jun 24 06:27:20 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Sun, 24 Jun 2007 16:27:20 +0300 Subject: [ofa-general] [Fwd: [Error] Asynchronous Thread] In-Reply-To: <467A90DA.1000107@bull.net> References: <467A90DA.1000107@bull.net> Message-ID: <467E7138.7070908@dev.mellanox.co.il> Yann K. wrote: > > > ------------------------------------------------------------------------ > > Subject: > [Error] Asynchronous Thread > From: > "Yann K." > Date: > Thu, 21 Jun 2007 16:50:59 +0200 > To: > ewg-bounces at lists.openfabrics.org > > To: > ewg-bounces at lists.openfabrics.org > > > Hello everybody, > > I have a problem making a diagnostic on those kind of errors, which > happen at the same time : > > At the mpi level : > > case IBV_EVENT_SRQ_ERR: > ibv_error_abort(GEN_EXIT_ERR, "MPI Gen2 Async Special Event > thread : Got FATAL event %d\n", > event.event_type); > > At the kernel level : > > Jun 21 11:17:55 s_kernel at platine866 kernel: ib_mthca 0000:07:00.0: CQ >> overrun on CQN c2009c It seems that you got a CQ overrun, which means that more completions than the CQ size were created. You can solve this by creating a bigger CQ or by using more than one CQ... (I don't really understand why you sent the code from the MPI which handles the SRQ error).
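Dotan's diagnosis can be made concrete with a toy model. The sketch below is not libibverbs code: struct toy_cq and its functions are hypothetical and only count queued completions. The sizing helper encodes a common rule of thumb, not something stated in this thread. That rule says a CQ must hold every completion that can be outstanding at once, which in the worst case is the sum of the depths of all send queues, receive queues or SRQs that report to it. In real code that number would be passed as the cqe argument of ibv_create_cq().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy completion queue: it tracks only how many completions
 * are queued. It models the bookkeeping behind a CQ overrun, nothing more. */
struct toy_cq {
    size_t capacity;   /* entries requested at creation (cf. the cqe size) */
    size_t queued;     /* completions written but not yet polled */
    bool overrun;      /* latched once a completion arrives while full */
};

static void toy_cq_init(struct toy_cq *cq, size_t capacity)
{
    assert(capacity > 0);
    cq->capacity = capacity;
    cq->queued = 0;
    cq->overrun = false;
}

/* HCA side: add one completion. If the CQ is already full, the entry
 * cannot be stored; latch the overrun flag (on real hardware this shows
 * up as an asynchronous CQ error, as in Yann's kernel log). */
static bool toy_cq_complete(struct toy_cq *cq)
{
    if (cq->queued == cq->capacity) {
        cq->overrun = true;
        return false;
    }
    cq->queued++;
    return true;
}

/* Consumer side: reap up to max completions, as a polling loop would. */
static size_t toy_cq_poll(struct toy_cq *cq, size_t max)
{
    size_t n = cq->queued < max ? cq->queued : max;
    cq->queued -= n;
    return n;
}

/* Worst case: every work request posted to every queue feeding this CQ
 * completes before the consumer polls, so size the CQ to their sum. */
static size_t min_cq_entries(const size_t *queue_depths, size_t nqueues)
{
    size_t total = 0;
    for (size_t i = 0; i < nqueues; i++)
        total += queue_depths[i];
    return total;
}
```

In the model, a fifth completion on a 4-entry CQ that nobody polls sets the overrun flag. Polling in between, or sizing the CQ with min_cq_entries() (for example 512 entries for three queues of depth 128, 128 and 256), avoids it. Splitting the traffic across more than one CQ, Dotan's other suggestion, simply lowers the sum each CQ must absorb.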
thanks Dotan From rdreier at cisco.com Sun Jun 24 06:43:47 2007 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 24 Jun 2007 06:43:47 -0700 Subject: [ofa-general] Re: [PATCH] libmlx4: fix adjustments for minimum qp capabilities in mlx4_create_qp In-Reply-To: <200706240900.16563.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Sun, 24 Jun 2007 09:00:16 +0300") References: <200706191647.41336.jackm@dev.mellanox.co.il> <200706240900.16563.jackm@dev.mellanox.co.il> Message-ID: > No, it doesn't: > > from libmlx4/src/verbs.c: > static int align_queue_size(struct ibv_context *context, int size, int spare) > { > int ret; > > /* > * If someone asks for a 0-sized queue, presumably they're not > * going to use it. So don't mess with their size. > */ > if (!size) > return 0; But the function hasn't looked like that for a few weeks now, since commit e7d06519. - R. From friedman at ucla.edu Sun Jun 24 14:06:57 2007 From: friedman at ucla.edu (Scott A. Friedman) Date: Sun, 24 Jun 2007 14:06:57 -0700 Subject: [ofa-general] OFED 1.2 and iWarp w/ recent kernels Message-ID: <467EDCF1.8000601@ucla.edu> Hi iWarp does not work for me on recent kernels (FC7 2.6.21). A simple test of ib_rdma_bw -c fails after some period of time (10s...3m) with the following error, is this known and is a bugzilla report in order? This is using all OFED-1.2 and Chelsio fw-1.4. time ib_rdma_bw -c 10.10.11.20 11359: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=1 | 11359:pp_client_connect: unexpected CM event 7 real 0m10.003s user 0m0.001s sys 0m0.001s From mst at dev.mellanox.co.il Sun Jun 24 21:38:09 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Mon, 25 Jun 2007 07:38:09 +0300 Subject: [ofa-general] Fwd: [ANNOUNCE] GIT 1.5.2.2 Message-ID: <20070625043809.GA29772@mellanox.co.il> FYI I think git-gui updates make it worth while to upgrade. Sasha? 
----- Forwarded message from Junio C Hamano ----- Subject: [ANNOUNCE] GIT 1.5.2.2 Date: Sun, 17 Jun 2007 04:57:26 +0300 From: Junio C Hamano The latest maintenance release GIT 1.5.2.2 is available at the usual places: http://www.kernel.org/pub/software/scm/git/ git-1.5.2.2.tar.{gz,bz2} (tarball) git-htmldocs-1.5.2.2.tar.{gz,bz2} (preformatted docs) git-manpages-1.5.2.2.tar.{gz,bz2} (preformatted docs) RPMS/$arch/git-*-1.5.2.2-1.$arch.rpm (RPM) GIT v1.5.2.2 Release Notes ========================== Fixes since v1.5.2.1 -------------------- * Usability fix - git-gui is shipped with its updated blame interface. It is rumored that the older one was not just unusable but was an active health hazard, but this one is actually pretty. Please see for yourself. * Bugfixes - "git checkout fubar" was utterly confused when there is a branch fubar and a tag fubar at the same time. It correctly checks out the branch fubar now. - "git clone /path/foo" to clone a local /path/foo.git repository left an incorrect configuration. - "git send-email" correctly unquotes RFC 2047 quoted names in the patch-email before using their values. - We did not accept a number of seconds since the epoch older than the year 2000 as a valid timestamp. We now interpret positive integers more than 8 digits as such, which allows us to express timestamps more recent than March 1973. - git-cvsimport did not work when you have GIT_DIR to point your repository at a nonstandard location. - Some systems (notably, Solaris) lack hstrerror() to make h_errno human readable; prepare a replacement implementation. - .gitignore file listed git-core.spec but what we generate is git.spec, and nobody noticed for a long time. - "git-merge-recursive" does not try to run file level merge on binary files. - "git-branch --track" did not create tracking configuration correctly when the branch name had a slash in it. - The email address of the user specified with user.email configuration was overridden by the EMAIL environment variable.
- The tree parser did not warn about tree entries with nonsense file modes, and assumed they must be blobs. - "git log -z" without any other request to generate diff still invoked the diff machinery, wasting cycles. * Documentation - Many updates to fix stale or missing documentation. - Although our documentation was primarily meant to be formatted with AsciiDoc7, formatting with AsciiDoc8 is supported better. ---------------------------------------------------------------- Changes since v1.5.2.1 are as follows: Alex Riesen (3): Make the installation target of git-gui a little less chatty Fix clone to setup the origin if its name ends with .git Add a local implementation of hstrerror for the system which do not have it Gerrit Pape (1): Fix typo in remote branch example in git user manual J. Bruce Fields (4): user-manual: quick-start updates user-manual: add a missing section ID Documentation: user-manual todo tutorial: use "project history" instead of "changelog" in header Jakub Narebski (1): Generated spec file to be ignored is named git.spec and not git-core.spec Johannes Schindelin (2): Move buffer_is_binary() to xdiff-interface.h merge-recursive: refuse to merge binary files Johannes Sixt (1): Accept dates before 2000/01/01 when specified as seconds since the epoch Junio C Hamano (6): checkout: do not get confused with ambiguous tag/branch names $EMAIL is a last resort fallback, as it's system-wide. git-branch --track: fix tracking branch computation. Avoid diff cost on "git log -z" Documentation: adjust to AsciiDoc 8 GIT 1.5.2.2 Kristian Høgsberg (1): Unquote From line from patch before comparing with given from address. Luiz Fernando N.
Capitulino (1): git-cherry: Document 'limit' command-line option Matthijs Melchior (1): New selection indication and softer colors Michael Milligan (1): git-cvsimport: Make sure to use $git_dir always instead of .git sometimes Sam Vilain (2): fix documentation of unpack-objects -n Don't assume tree entries that are not dirs are blobs Shawn O. Pearce (47): git-gui: Allow creating a branch when none exists git-gui: Allow as few as 0 lines of diff context git-gui: Don't quit when we destroy a child widget git-gui: Attach font_ui to all spinbox widgets git-gui: Verify Tcl/Tk is new enough for our needs Revert "Make the installation target of git-gui a little less chatty" git-gui: Add a 4 digit commit abbreviation to the blame viewer git-gui: Cleanup blame::new widget initialization git-gui: Remove empty blank line at end of blame git-gui: Improve the coloring in blame viewer git-gui: Simplify consecutive lines that come from the same commit git-gui: Use arror cursor in blame viewer file data git-gui: Display tooltips in blame viewer git-gui: Highlight the blame commit header from everything else git-gui: Remove unnecessary reshow of blamed commit git-gui: Cleanup minor style nit git-gui: Space the commit group continuation out in blame view git-gui: Show author initials in blame groups git-gui: Allow the user to control the blame/commit split point git-gui: Display a progress bar during blame annotation gathering git-gui: Allow digging through history in blame viewer git-gui: Combine blame groups only if commit and filename match git-gui: Show original filename in blame tooltip git-gui: Use a label instead of a button for the back button git-gui: Clip the commit summaries in the blame history menu git-gui: Remove the loaded column from the blame viewer git-gui: Remove unnecessary space between columns in blame viewer git-gui: Use lighter colors in blame view git-gui: Make the line number column slightly wider in blame git-gui: Automatically expand the line number 
column as needed git-gui: Remove unused commit_list from blame viewer git-gui: Better document our blame variables git-gui: Cleanup redundant column management in blame viewer git-gui: Switch internal blame structure to Tcl lists git-gui: Label the uncommitted blame history entry git-gui: Rename fields in blame viewer to better descriptions git-gui: Display the "Loading annotation..." message in italic git-gui: Run blame twice on the same file and display both outputs git-gui: Display both commits in our tooltips git-gui: Jump to original line in blame viewer git-gui: Use three colors for the blame viewer background git-gui: Improve our labeling of blame annotation types git-gui: Favor the original annotations over the recent ones git-gui: Changed blame header bar background to match main window git-gui: Include 'war on whitespace' fixes from git.git git-gui: Give amend precedence to HEAD over MERGE_MSG git-gui: Save geometry before the window layout is damaged william pursell (1): Make command description imperative statement, not third-person present. 
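The March 1973 limit in the timestamp fix follows from simple arithmetic: the smallest number with more than 8 digits is 100000000, and that many seconds after the epoch falls on 3 March 1973. A minimal sketch of a heuristic in that spirit, assuming "more than 8 digits" means an all-digit string of length 9 or more; looks_like_epoch() and format_utc() are hypothetical helpers, not git's actual date parser.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>
#include <time.h>

/* Hypothetical check in the spirit of the release note: an all-digit
 * string longer than 8 characters is taken as seconds since the epoch.
 * This is not git's actual date parser. */
int looks_like_epoch(const char *s)
{
    size_t n;

    assert(s != NULL);
    n = strlen(s);
    if (n <= 8)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (!isdigit((unsigned char)s[i]))
            return 0;
    return 1;
}

/* Format an epoch value as a UTC date such as "1973-03-03 09:46:40".
 * On failure buf is set to the empty string. */
void format_utc(time_t t, char *buf, size_t len)
{
    const struct tm *tm = gmtime(&t);

    assert(buf != NULL && len > 0);
    if (tm == NULL || strftime(buf, len, "%Y-%m-%d %H:%M:%S", tm) == 0)
        buf[0] = '\0';
}
```

format_utc(100000000, ...) produces "1973-03-03 09:46:40", while 99999999, one second earlier, has only 8 digits and is therefore not taken as an epoch value by this rule.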
- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo at vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ----- End forwarded message ----- -- MST From tziporet at dev.mellanox.co.il Mon Jun 25 02:47:51 2007 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 25 Jun 2007 12:47:51 +0300 Subject: [ofa-general] Supported list of Kernels In-Reply-To: <99863D2ED484D449811D97A4C44C9CBD4239A1@EPEXCH2.qlogic.org> References: <20070619150629.E2CA7E60871@openfabrics.org> <99863D2ED484D449811D97A4C44C9CBD4239A1@EPEXCH2.qlogic.org> Message-ID: <467F8F47.2070109@mellanox.co.il> John Russo wrote: > The list below shows the same kernel for 3 versions of RedHat > - RedHat EL4 up4: 2.6.9-42.ELsmp > - RedHat EL4 up5: 2.6.9-42.ELsmp > - RedHat EL5: 2.6.9-42.ELsmp > > The kernels that exist "out of the box" for each release are > > - RedHat EL4 up4: 2.6.9-42.ELsmp (no change) > - RedHat EL4 up5: 2.6.9-55.ELsmp > - RedHat EL5: 2.6.18-8.ELsmp > > Is 2.6.9-42 really the only kernel supported/tested or is this a > cut-and-paste mistake: > > This is the correct list that OFED 1.2 supports: o Linux Operating Systems: - RedHat EL4 up3: 2.6.9-34.ELsmp - RedHat EL4 up4: 2.6.9-42.ELsmp - RedHat EL4 up5: 2.6.9-55.ELsmp - RedHat EL5: 2.6.18-8.el5 - SLES9 SP3: 2.6.5-7.244-smp - SLES10: 2.6.16.21-0.8-smp - SLES10 SP1: 2.6.16.46-0.12-smp (partially tested) - kernel.org: 2.6.19.x and 2.6.20.x Tziporet From tziporet at dev.mellanox.co.il Mon Jun 25 02:52:20 2007 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 25 Jun 2007 12:52:20 +0300 Subject: [ofa-general] backups In-Reply-To: <795c49870706201044ha36255amebd94c1b673f58f6@mail.gmail.com> References: <795c49870706201044ha36255amebd94c1b673f58f6@mail.gmail.com> Message-ID: <467F9054.5020008@mellanox.co.il> Jeff Becker wrote: > Hi. I've started backing up the git trees and the web content using > rsync.
John Companies gave us a 10G NFS partition for this. I've done > two backups and there's only 800M left. Also, I haven't backed up the > daily builds yet. I was told we could get more space for one dollar > per GB per month. Depending on the budget, we should increase this > backup space. How should we proceed? Thanks. > > -jeff > No need to back up the daily builds. It's only important to back up the sources and the releases. Tziporet From mst at dev.mellanox.co.il Mon Jun 25 06:06:04 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Mon, 25 Jun 2007 16:06:04 +0300 Subject: [ofa-general] [PATCH RFC] sharing userspace IB objects Message-ID: <20070625130604.GH15343@mellanox.co.il> > > Quoting Roland Dreier : > > Subject: Re: Sharing userspace IB objects > > > > > Can you please elaborate a little bit more on what steps are required to > > > achieve this? I have a connection manager running as a separate process from > > > the apps which would be sending/receiving data on QPs. I was hoping to > > > create IB objects via CM and be made sharable to the apps. > > > > You would have to do a lot of hacking of low-level stuff (libibverbs > > and whatever userspace driver libraries you need) to handle passing > > file descriptors through unix domain sockets and figure out a way to > > make the CQ/QP buffers visible in the address space of the process > > that will actually use them. And also handle doorbell pages etc. > > This is related to scalability stuff that Dror presented at Sonoma > http://www.openfabrics.org/archives/spring2007sonoma/Tuesday%20May%201/gdror%20Next%20Generation%20Hardware%20Assists%20And%20Scalability2.pdf > > See especially the shared send queue slide. > > So, since the need seems to be there, I started thinking about how this could be done.
> Basically, we could create shared memory objects (shm_open) and use these > for all hardware-accessible registers, as well as necessary control (head/tail pointers, > spinlocks used for protection, etc). > > If we do this, we can use unix domain sockets for everything, > a client just mmaps the fd that it got. Does this make sense? OK, here's a draft showing what an API for this could look like. Basically the idea is that we'd ask low-level drivers to provide an (optional) API to 1. allocate context and all its objects inside a shared memory object 2. pack and unpack objects from/to unix domain socket messages So to share a QP, the server would A. open shared context, create pd, cq, qp B. listen on unix domain socket C. pack the context, pd, cq, qp D. send them to clients that connect The client would A. create unix domain socket B. connect to server C. get message from server D. unpack context, pd, cq, qp Roland, all, any comments on the API? Next, I'm going to look at adding this support to some low-level drivers.
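Both halves of this plan rest on standard POSIX facilities: a file descriptor can be handed to another process as SCM_RIGHTS ancillary data over a Unix domain socket, and the receiver can mmap() it to see the same memory. A minimal sketch, assuming Linux/glibc; a socketpair() inside one process stands in for the server and client so it runs stand-alone, and an unlinked tmpfile() stands in for shm_open() so it links without -lrt. send_fd(), recv_fd() and share_region_demo() are hypothetical helper names, not part of libibverbs or of the draft API.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define REGION_SIZE 4096

/* Control buffer big enough for one int, aligned for struct cmsghdr. */
union fd_cmsg {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send fd as SCM_RIGHTS ancillary data alongside a 1-byte payload. */
int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctl, 0, sizeof ctl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; the kernel installs it under a new descriptor number. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* "Server" maps a region and passes its fd; "client" maps the received fd
 * and writes to it. Returns 0 if the server's mapping sees the write. */
int share_region_demo(void)
{
    int sv[2], cfd, ok = 0;
    FILE *f = tmpfile();                    /* stand-in for shm_open() */
    char *srv, *cli;

    if (f == NULL)
        return -1;
    if (ftruncate(fileno(f), REGION_SIZE) < 0 ||
        socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        fclose(f);
        return -1;
    }
    srv = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fileno(f), 0);
    if (srv != MAP_FAILED && send_fd(sv[0], fileno(f)) == 0 &&
        (cfd = recv_fd(sv[1])) >= 0) {
        cli = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, cfd, 0);
        if (cli != MAP_FAILED) {
            strcpy(cli, "written by client");
            ok = strcmp(srv, "written by client") == 0;
            munmap(cli, REGION_SIZE);
        }
        close(cfd);
    }
    if (srv != MAP_FAILED)
        munmap(srv, REGION_SIZE);
    close(sv[0]);
    close(sv[1]);
    fclose(f);
    return ok ? 0 : -1;
}
```

In the proposed API, a *_csmg_pack() hook would presumably append descriptors (plus any driver state) to the message in a similar way and the matching *_cmsg_unpack() would pull them out again; that mapping is an assumption, since the draft only gives the signatures.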
--- diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h index acc1b82..b16e186 100644 --- a/include/infiniband/verbs.h +++ b/include/infiniband/verbs.h @@ -38,6 +38,7 @@ #include #include +#include #ifdef __cplusplus # define BEGIN_C_DECLS extern "C" { @@ -601,6 +602,9 @@ struct ibv_device; struct ibv_context; struct ibv_device_ops { + struct ibv_context * (*alloc_shared_context)(struct ibv_device *device, + int cmd_fd, + int shm_fd, off_t offset); struct ibv_context * (*alloc_context)(struct ibv_device *device, int cmd_fd); void (*free_context)(struct ibv_context *context); }; @@ -680,6 +684,26 @@ struct ibv_context_ops { int (*detach_mcast)(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid); void (*async_event)(struct ibv_async_event *event); + + int (*context_csmg_pack)(struct msghdr *, struct cmsghdr **,struct ibv_context *); + int (*pd_csmg_pack)(struct msghdr *, struct cmsghdr **,struct ibv_pd *); + int (*mr_csmg_pack)(struct msghdr *, struct cmsghdr **,struct ibv_mr *); + int (*mw_csmg_pack)(struct msghdr *, struct cmsghdr **,struct ibv_mw *); + int (*srq_csmg_pack)(struct msghdr *, struct cmsghdr **,struct ibv_srq *); + int (*cq_csmg_pack)(struct msghdr *, struct cmsghdr **,struct ibv_cq *); + int (*qp_csmg_pack)(struct msghdr *, struct cmsghdr **,struct ibv_qp *); + int (*comp_channel_csmg_pack)(struct msghdr *, struct cmsghdr **,struct ibv_comp_channel *); + int (*ah_csmg_pack)(struct msghdr *, struct cmsghdr **,struct ibv_ah *); + + struct ibv_context *(*context_cmsg_unpack)(struct ibv_device *, struct msghdr *, struct cmsghdr **); + struct ibv_pd *(*pd_cmsg_unpack)(struct ibv_context *, struct msghdr *, struct cmsghdr **); + struct ibv_mr *(*mr_cmsg_unpack)(struct ibv_pd *, struct msghdr *, struct cmsghdr **); + struct ibv_mw *(*mw_cmsg_unpack)(struct ibv_pd *, struct msghdr *, struct cmsghdr **); + struct ibv_srq *(*srq_cmsg_unpack)(struct ibv_pd *, struct msghdr *, struct cmsghdr **); + struct ibv_comp_channel 
*(*comp_channel_cmsg_unpack)(struct ibv_context *, struct msghdr *, struct cmsghdr **); + struct ibv_cq *(*cq_cmsg_unpack)(struct ibv_context *, void *cq_context, struct ibv_comp_channel *, struct msghdr *, struct cmsghdr **); + struct ibv_qp *(*qp_cmsg_unpack)(struct ibv_pd *pd, struct ibv_qp_init_attr *init_attr, struct msghdr *, struct cmsghdr **); + struct ibv_ah *(*ah_cmsg_unpack)(struct ibv_pd *pd, struct msghdr *, struct cmsghdr **); }; struct ibv_context { @@ -1074,6 +1098,30 @@ int ibv_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid); */ int ibv_fork_init(void); +struct ibv_context *ibv_open_shared_device(struct ibv_device *device, + int fd, off_t offset); +int ibv_cmsg_space(struct ibv_context *); + +int ibv_context_csmg_pack(struct msghdr *, struct cmsghdr **,struct ibv_context *); +int ibv_pd_csmg_pack(struct msghdr *, struct cmsghdr **,struct ibv_pd *); +int ibv_mr_csmg_pack(struct msghdr *, struct cmsghdr **,struct ibv_mr *); +int ibv_mw_csmg_pack(struct msghdr *, struct cmsghdr **,struct ibv_mw *); +int ibv_srq_csmg_pack(struct msghdr *, struct cmsghdr **,struct ibv_srq *); +int ibv_cq_csmg_pack(struct msghdr *, struct cmsghdr **,struct ibv_cq *); +int ibv_qp_csmg_pack(struct msghdr *, struct cmsghdr **,struct ibv_qp *); +int ibv_comp_channel_csmg_pack(struct msghdr *, struct cmsghdr **,struct ibv_comp_channel *); +int ibv_ah_csmg_pack(struct msghdr *, struct cmsghdr **,struct ibv_ah *); + +struct ibv_context *ibv_context_cmsg_unpack(struct ibv_device *, struct msghdr *, struct cmsghdr **); +struct ibv_pd *ibv_pd_cmsg_unpack(struct ibv_context *, struct msghdr *, struct cmsghdr **); +struct ibv_mr *ibv_mr_cmsg_unpack(struct ibv_pd *, struct msghdr *, struct cmsghdr **); +struct ibv_mw *ibv_mw_cmsg_unpack(struct ibv_pd *, struct msghdr *, struct cmsghdr **); +struct ibv_srq *ibv_srq_cmsg_unpack(struct ibv_pd *, struct msghdr *, struct cmsghdr **); +struct ibv_comp_channel *ibv_comp_channel_cmsg_unpack(struct ibv_context *,
struct msghdr *, struct cmsghdr **); +struct ibv_cq *ibv_cq_cmsg_unpack(struct ibv_context *, void *cq_context, struct ibv_comp_channel *, struct msghdr *, struct cmsghdr **); +struct ibv_qp *ibv_qp_cmsg_unpack(struct ibv_pd *pd, struct ibv_qp_init_attr *init_attr, struct msghdr *, struct cmsghdr **); +struct ibv_ah *ibv_ah_cmsg_unpack(struct ibv_pd *pd, struct msghdr *, struct cmsghdr **); + END_C_DECLS # undef __attribute_const -- MST From swise at opengridcomputing.com Mon Jun 25 06:28:44 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 25 Jun 2007 08:28:44 -0500 Subject: [ofa-general] OFED 1.2 and iWarp w/ recent kernels In-Reply-To: <467EDCF1.8000601@ucla.edu> References: <467EDCF1.8000601@ucla.edu> Message-ID: <467FC30C.9060107@opengridcomputing.com> Scott A. Friedman wrote: > Hi > > iWarp does not work for me on recent kernels (FC7 2.6.21). A simple test > of ib_rdma_bw -c fails after some period of time (10s...3m) with the > following error, is this known and is a bugzilla report in order? This > is using all OFED-1.2 and Chelsio fw-1.4. > > time ib_rdma_bw -c 10.10.11.20 > 11359: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 > | duplex=0 | cma=1 | > 11359:pp_client_connect: unexpected CM event 7 > Hey Scott, CM event 7 (RDMA_CM_EVENT_UNREACHABLE) indicates the remote host was unreachable. Can you icmp ping between the two hosts? Do you see anything in the kernel logs on the two systems? Thanks, Steve. From ogerlitz at voltaire.com Mon Jun 25 07:19:34 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 25 Jun 2007 17:19:34 +0300 Subject: [ofa-general] [PATCH RFC] sharing userspace IB objects In-Reply-To: <20070625130604.GH15343@mellanox.co.il> References: <20070625130604.GH15343@mellanox.co.il> Message-ID: <467FCEF6.9090905@voltaire.com> Michael S. Tsirkin wrote: >> So, since the need seems to be there, I started thinking about how this could be done.
>> Basically, we could create shared memory objects (shm_open) and use these >> for all hardware-accessible registers, as well as necessary control (head/tail pointers, >> spinlocks used for protection, etc). >> >> If we do this, we can use unix domain sockets for everything, >> a client just mmaps the fd that it got. Does this make sense? > > OK, here's a draft showing how an API to do this could look like. > > Basically the idea is that we'd ask low-level drivers to provide an > (optional) API to > 1. allocate context and all its objects inside a shared memory object > 2. pack and unpack objects from/to unix domain socket messages > > So to share a QP, the server would > A. open shared context, create pd, cq, qp > B. listen on unix domain socket > C. pack the context, pd, cq, qp > D. send them to clients that connect > The client would > A. create unix domain socket > B. connect to server > C. get message from server > D. unpack context, pd, cq, qp One problem here (which has been annoying me for a long time...) is that typically the active side of a connection is the one that sends the first packet, and hence you must post receives to the QP --before-- accepting the connection request. So, if both sides use a shared context, they would need to implement a synchronization protocol (that is, don't deliver the established event to the active side before the passive side has accepted). And, if the active side does not use a shared context while the passive side does, you need either the shared context to allocate/post receives from shared memory, or to rely on RNR NAKs. What do you think? Also, what was your thinking on registering the QP/CQ memory? Is the plan to implement a verb for registering shared memory, as was done in the VAPI stack, or do you want to register this memory as "just" virtual? Or.
From bs at q-leap.de Mon Jun 25 08:26:41 2007 From: bs at q-leap.de (Bernd Schubert) Date: Mon, 25 Jun 2007 17:26:41 +0200 Subject: [ofa-general] librdmacm_to_2_6_20.patch In-Reply-To: <467C1BAD.8090206@ichips.intel.com> References: <200706221424.43142.bs@q-leap.de> <467C1BAD.8090206@ichips.intel.com> Message-ID: <200706251726.41408.bs@q-leap.de> On Friday 22 June 2007 20:57:49 you wrote: > Bernd Schubert wrote: > >Hi, > > > >there are patches to make rdma of ofed-1.1 compatible with 2.6.20 > >(https://svn.openfabrics.org/svn/openib/gen2/trunk/ofed/patches/user_fixes > >/ librdmacm_to_2_6_20.patch and perftest_to_2_6_20.patch). > > > > > >The entire rdma_set_option() function and its declaration are removed > >by librdmacm_to_2_6_20. So what to do with the call in > >dapl_ib_cm.c:177? > > You can remove the entire section of code in dapl_ib_cm.c that calls > rdma_get_option and rdma_set_option. > dapli_route_resolve() will then just call rdma_connect and use the > default rdma_cm timers which is fine. Thanks a lot, got it to compile that way. Thanks again, Bernd -- Bernd Schubert Q-Leap Networks GmbH From gsadasiv7 at gmail.com Mon Jun 25 09:24:50 2007 From: gsadasiv7 at gmail.com (Ganesh Sadasivan) Date: Mon, 25 Jun 2007 09:24:50 -0700 Subject: [ofa-general] Sharing userspace IB objects In-Reply-To: <467E4CD1.9010503@voltaire.com> References: <532b813a0706221438r1866e93eh26a1b2fc8cd55aea@mail.gmail.com> <532b813a0706221452v3d797a3fye22af5619e162a1f@mail.gmail.com> <467E4CD1.9010503@voltaire.com> Message-ID: <532b813a0706250924o6b5bb086h90258dbdb4674853@mail.gmail.com> On 6/24/07, Or Gerlitz wrote: > > Ganesh Sadasivan wrote: > > I have a connection manager running as a separate > > process from the apps which would be sending/receiving data on QPs. I > > was hoping to create IB objects via CM and be made sharable to the apps.
> > Should this process do all connection management or only listen to new > connection requests and then tell another process to handle it (that is > create CQ/QP, accept the connection etc)? I was thinking of doing the former way. But that requires sharing of CQ/QP etc. So now I am going ahead with a plan where the CM sets up some minimal set of QPs through which the clients can send their connection requests, and the clients themselves handle creation of IB objects like QP, CQ. However, there are other cases where it is beneficial to share the QPs across multiple processes. So creating IB objects in shared memory is useful. Thanks Ganesh Or. > > From rostedt at goodmis.org Mon Jun 25 12:33:14 2007 From: rostedt at goodmis.org (Steven Rostedt) Date: Mon, 25 Jun 2007 15:33:14 -0400 Subject: [ofa-general] [POSSIBLE BUG] use of tasklet_unlock in ipath_no_bufs_available Message-ID: <1182799994.5493.201.camel@localhost.localdomain> As some of you know, lately I've been trying to get rid of tasklets. In doing so, I've come across this usage of tasklet_unlock. The only user of tasklet_unlock in the kernel outside of softirq.c is ipath_no_bufs_available in drivers/infiniband/hw/ipath/ipath_ruc.c Here's the offending code: void ipath_no_bufs_available(struct ipath_qp *qp, struct ipath_ibdev *dev) { unsigned long flags; spin_lock_irqsave(&dev->pending_lock, flags); if (list_empty(&qp->piowait)) list_add_tail(&qp->piowait, &dev->piowait); spin_unlock_irqrestore(&dev->pending_lock, flags); /* * Note that as soon as want_buffer() is called and * possibly before it returns, ipath_ib_piobufavail() * could be called. If we are still in the tasklet function, * tasklet_hi_schedule() will not call us until the next time * tasklet_hi_schedule() is called. * We clear the tasklet flag now since we are committing to return * from the tasklet function.
*/ clear_bit(IPATH_S_BUSY, &qp->s_flags); tasklet_unlock(&qp->s_task); want_buffer(dev->dd); dev->n_piowait++; } As the comment states, it looks like it's trying to prevent a race where the want_buffer call can allow ipath_ib_piobufavail to be called, which would schedule this tasklet again. But since the tasklet is running, it would simply be skipped if it were to schedule on another CPU. And this would mean that the tasklet would need to wait for it to be scheduled again before doing the work. Is my above analysis correct? Now for the BUG. Let's say this situation does happen. Let's look at the code. softirq.c: tasklet_hi_action if (tasklet_trylock(t)) { if (!atomic_read(&t->count)) { if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state)) BUG(); t->func(t->data); tasklet_unlock(t); continue; } tasklet_unlock(t); } The race being prevented is the failure of the tasklet_trylock running on another CPU. The call to tasklet_unlock in ipath_no_bufs_available is letting the other CPU succeed, and the comment suggests that this is OK because this function will be exiting shortly. But what it doesn't take into consideration is the above "tasklet_unlock" called again in tasklet_hi_action. So while the tasklet function is allowed to run on another CPU, we are unlocking the tasklet on this CPU. So now this tasklet function is no longer protected from being reentrant. There is now no guarantee that the tasklet function would only be running on one CPU. What's worse, we also add the chance of hitting the above BUG(). If the tasklet gets scheduled again and takes an interrupt before doing the test_and_clear_bit, and another CPU runs the tasklet and clears TASKLET_STATE_SCHED, then when the first instance comes back from the interrupt it will hit the BUG. So, does all this make sense, or am I full of crap? Still, I think tasklet_unlock and tasklet_trylock should not be exported for anyone else to use besides softirq.c, and perhaps the ipath code needs to find a better way around this.
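Steven's scenario can be replayed deterministically with a toy model of the RUN bit. The sketch below is not kernel code: it assumes a single bool stands in for TASKLET_STATE_RUN, that each "CPU" is one scripted step rather than a real thread, and the names toy_tasklet, toy_trylock() and replay_race() are hypothetical. It counts how many handler instances are inside the function at the end of the sequence.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: 'run' stands in for TASKLET_STATE_RUN and 'active' counts
 * handler instances executing at the same moment. Not kernel code. */
struct toy_tasklet {
    bool run;
    int active;
};

static bool toy_trylock(struct toy_tasklet *t)
{
    if (t->run)
        return false;
    t->run = true;
    return true;
}

static void toy_unlock(struct toy_tasklet *t)
{
    assert(t->run);   /* someone holds the bit, though not necessarily us */
    t->run = false;
}

/* Replays the interleaving from the report, one scripted step per "CPU".
 * Returns how many handler instances are running at the end. */
int replay_race(bool handler_unlocks_early)
{
    struct toy_tasklet t = { false, 0 };

    /* CPU A: tasklet_hi_action() wins the trylock and calls the handler. */
    if (!toy_trylock(&t))
        return -1;
    t.active++;
    if (handler_unlocks_early)      /* tasklet_unlock() inside the handler */
        toy_unlock(&t);

    /* CPU B: the tasklet was rescheduled; it enters only if the bit is
     * clear (otherwise the real softirq code re-queues it for later). */
    if (toy_trylock(&t))
        t.active++;

    /* CPU A: the handler returns and tasklet_hi_action() unlocks again;
     * after an early unlock this clears the bit that B now relies on. */
    t.active--;
    toy_unlock(&t);

    /* CPU C: one more reschedule; trylock succeeds even if B still runs. */
    if (toy_trylock(&t))
        t.active++;

    return t.active;
}
```

Without the early unlock the sequence ends with one instance running (B's attempt fails and C runs only after A is done); with it, B and C are inside the handler at the same time after A has already returned, which is the loss of serialization described above.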
-- Steve From ralph.campbell at qlogic.com Mon Jun 25 13:37:01 2007 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Mon, 25 Jun 2007 13:37:01 -0700 Subject: [ofa-general] [POSSIBLE BUG] use of tasklet_unlock in ipath_no_bufs_available In-Reply-To: <1182799994.5493.201.camel@localhost.localdomain> References: <1182799994.5493.201.camel@localhost.localdomain> Message-ID: <1182803821.18911.237.camel@brick.pathscale.com> This was fixed by a patch that Arthur Jones sent out to general at lists.openfabrics.org Tue Jun 19 16:42:09 PDT 2007 [PATCH 17/28] IB/ipath - wait for PIO available interrupt I imagine that it is working its way into Roland's git tree for Linus. On Mon, 2007-06-25 at 15:33 -0400, Steven Rostedt wrote: > As some of you know, lately I've been trying to get rid of tasklets. In > doing so, I've come across this usage of tasklet_unlock. > > The only user of tasklet_unlock in the kernel outside of softirq.c is > ipath_no_bufs_available in drivers/inifiniband/hw/ipath/ipath_ruc.c > > Here's the offending code: > > void ipath_no_bufs_available(struct ipath_qp *qp, struct ipath_ibdev *dev) > { > unsigned long flags; > > spin_lock_irqsave(&dev->pending_lock, flags); > if (list_empty(&qp->piowait)) > list_add_tail(&qp->piowait, &dev->piowait); > spin_unlock_irqrestore(&dev->pending_lock, flags); > /* > * Note that as soon as want_buffer() is called and > * possibly before it returns, ipath_ib_piobufavail() > * could be called. If we are still in the tasklet function, > * tasklet_hi_schedule() will not call us until the next time > * tasklet_hi_schedule() is called. > * We clear the tasklet flag now since we are committing to return > * from the tasklet function. 
> */ > clear_bit(IPATH_S_BUSY, &qp->s_flags); > tasklet_unlock(&qp->s_task); > want_buffer(dev->dd); > dev->n_piowait++; > } > > > As the comment states, it looks like it's trying to prevent a race where > the want_buffer can allow for ipath_ib_piobufavail be called which would > schedule this tasklet again. But since the tasklet is running, it would > simply be skipped if it were to schedule on another CPU. And this would > mean that the tasklet would need to wait for it to be scheduled again > before doing the work. > > Is my above analysis correct? > > Now for the BUG. > > Lets say this situation does happen. Lets look at the code. > > softirq.c: tasklet_hi_action > > if (tasklet_trylock(t)) { > if (!atomic_read(&t->count)) { > if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state)) > BUG(); > t->func(t->data); > tasklet_unlock(t); > continue; > } > tasklet_unlock(t); > } > > The race being prevented is the failure of the tasklet_trylock running > on another CPU. The call to tasklet_unlock in ipath_no_bufs_available is > letting the other CPU succeed, and the comment suggests that this is OK > because this function will be exiting shortly. But what it doesn't take > into consideration is the above "tasklet_unlock" called again in > tasklet_hi_action. > > So while the tasklet function is allowed to run on another CPU, we are > unlocking the tasklet on this CPU. So now this tasklet function is no > longer protected from being reentrant. There is now no guarantee that > the tasklet function would only be running on one CPU. > > What's worse, we also add the chance of hitting the above BUG(). If the > tasklet gets scheduled again, takes an interrupt before doing the > tast_and_clear, another CPU runs the tasklet and clears the > TASKLET_STATE_SCHED, when the first instance comes back from the > interrupt, it will hit the BUG. > > So, does all this make sense, or am I full of crap. 
Still, I think > tasklet_unlock and tasklet_trylock should not be exported for anyone > else to use besides softirq.c and perhaps the ipath code needs to find a > better way around this. > > -- Steve > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From landman at scalableinformatics.com Mon Jun 25 13:43:13 2007 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 25 Jun 2007 16:43:13 -0400 Subject: bug and patch (was Re: [ofa-general] Supported list of Kernels) In-Reply-To: <467F8F47.2070109@mellanox.co.il> References: <20070619150629.E2CA7E60871@openfabrics.org> <99863D2ED484D449811D97A4C44C9CBD4239A1@EPEXCH2.qlogic.org> <467F8F47.2070109@mellanox.co.il> Message-ID: <468028E1.4070705@scalableinformatics.com> Tziporet Koren wrote: > This is the correct list that OFED 1.2 supports: > > o Linux Operating Systems: [...] > - kernel.org: 2.6.19.x and 2.6.20.x I just tried a build of OFED-1.2 against the kernel.org 2.6.20.14 kernel, and I get this in the log from build.sh ...
make[1]: Entering directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/ipoibtools/iproute2' make -w -C lib make[2]: Entering directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/ipoibtools/iproute2/lib' gcc -D_GNU_SOURCE -O2 -Wstrict-prototypes -Wall -I../include -DRESOLVE_HOSTNAMES -c -o ll_map.o ll_map.c gcc -D_GNU_SOURCE -O2 -Wstrict-prototypes -Wall -I../include -DRESOLVE_HOSTNAMES -c -o libnetlink.o libnetlink.c ar rcs libnetlink.a ll_map.o libnetlink.o gcc -D_GNU_SOURCE -O2 -Wstrict-prototypes -Wall -I../include -DRESOLVE_HOSTNAMES -c -o utils.o utils.c utils.c: In function ‘inet_addr_match’: utils.c:333: warning: initialization discards qualifiers from pointer target type utils.c:334: warning: initialization discards qualifiers from pointer target type utils.c: In function ‘__get_hz’: utils.c:368: error: ‘HZ’ undeclared (first use in this function) utils.c:368: error: (Each undeclared identifier is reported only once utils.c:368: error: for each function it appears in.) make[2]: *** [utils.o] Error 1 make[2]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/ipoibtools/iproute2/lib' make[1]: *** [lib] Error 2 make[1]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/ipoibtools/iproute2' make: *** [ipoibtools] Error 2 error: Bad exit status from /var/tmp/rpm-tmp.30492 (%install) It looks like the HZ macro is undeclared. Specifically it looks like it is wrapped in a nice little ifdef #ifndef _ASMx86_64_PARAM_H #define _ASMx86_64_PARAM_H #ifdef __KERNEL__ # define HZ CONFIG_HZ /* Internal kernel timer frequency */ # define USER_HZ 100 /* .. some user interfaces are in "ticks */ #define CLOCKS_PER_SEC (USER_HZ) /* like times() */ #endif so that user space code doesn't see it. Ugh. The following patch looks like it fixes it: --- utils.c 2007-06-25 16:40:00.000000000 -0400 +++ utils.c.new 2007-06-25 16:39:24.000000000 -0400 @@ -365,7 +365,7 @@ FILE *fp; if (getenv("HZ")) - return atoi(getenv("HZ")) ?
: HZ; + return atoi(getenv("HZ")) ? : sysconf(_SC_CLK_TCK); if (getenv("PROC_NET_PSCHED")) { snprintf(name, sizeof(name)-1, "%s", getenv("PROC_NET_PSCHED")); @@ -385,7 +385,7 @@ } if (hz) return hz; - return HZ; + return sysconf(_SC_CLK_TCK); } int __iproute2_user_hz_internal; > > Tziporet > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From rostedt at goodmis.org Mon Jun 25 13:49:08 2007 From: rostedt at goodmis.org (Steven Rostedt) Date: Mon, 25 Jun 2007 16:49:08 -0400 Subject: [ofa-general] Re: [POSSIBLE BUG] use of tasklet_unlock in ipath_no_bufs_available In-Reply-To: <1182803821.18911.237.camel@brick.pathscale.com> References: <1182799994.5493.201.camel@localhost.localdomain> <1182803821.18911.237.camel@brick.pathscale.com> Message-ID: <1182804548.5493.216.camel@localhost.localdomain> On Mon, 2007-06-25 at 13:37 -0700, Ralph Campbell wrote: > This was fixed by a patch that Arthur Jones sent out to > general at lists.openfabrics.org Great! > > Tue Jun 19 16:42:09 PDT 2007 > [PATCH 17/28] IB/ipath - wait for PIO available interrupt > > I imagine that it is working its way into Roland's git tree > for Linus. * tasklet_hi_schedule() is called. - * We clear the tasklet flag now since we are committing to return - * from the tasklet function. + * We leave the busy flag set so that another post send doesn't + * try to put the same QP on the piowait list again. 
*/ - clear_bit(IPATH_S_BUSY, &qp->s_busy); - tasklet_unlock(&qp->s_task); want_buffer(dev->dd); dev->n_piowait++; This removes the final use of tasklet_unlock. I'll submit a patch to make it no longer a public function, so that no one else thinks they can easily get at the internals of a tasklet. -- Steve From rdreier at cisco.com Mon Jun 25 13:51:50 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 25 Jun 2007 13:51:50 -0700 Subject: [ofa-general] Re: [PATCH RFC] sharing userspace IB objects In-Reply-To: <20070625130604.GH15343@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 25 Jun 2007 16:06:04 +0300") References: <20070625130604.GH15343@mellanox.co.il> Message-ID: Some initial reaction, in no particular order: - Having to allocate everything in memory that the library mmap()s adds a lot of yucky stuff -- basically we need to implement our own allocator for the shared memory offsets. I guess we could wrap this in libibverbs and only implement it once but still we're basically reimplementing malloc(). Is there really a strong use case for making every type of object shareable? Can we handle the SRC stuff without going to this extreme of complexity? - Given that everything shared is in shared memory, it seems we could avoid all the marshalling/unmarshalling stuff, and just have the shared objects have an ID along with an API to look up objects by API. That way we could let applications use more than just unix sockets -- eg pipe() + fork() would work too. > +struct ibv_context *ibv_open_shared_device(struct ibv_device *device, > + int fd, off_t offset); - This seems like too low-level an interface; I don't think there's any way to enforce the fact that fd came from shm_open(), and I don't see the use of offset at all. I think it would be more sensible to extend the normal ibv_open_device() with a pathname, and maybe a flag about whether to create or map an existing shared context, and do all the shm stuff internally.
Then if someone passes a NULL pathname, the context isn't shareable. - R. From rdreier at cisco.com Mon Jun 25 13:54:51 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 25 Jun 2007 13:54:51 -0700 Subject: [ofa-general] [PATCH RFC] sharing userspace IB objects In-Reply-To: <467FCEF6.9090905@voltaire.com> (Or Gerlitz's message of "Mon, 25 Jun 2007 17:19:34 +0300") References: <20070625130604.GH15343@mellanox.co.il> <467FCEF6.9090905@voltaire.com> Message-ID: > So, if both sides use a shared-context, they would need to implement a > synchronization protocol (that is don't deliver established event to > the active before the passive accepted). I'm missing something -- how does the sharing affect the need for synchronization? > Also, what was your thinking on registering the QP/CQ memory? is the > plan to implement a verb for registering shared-memory as was in the > VAPI stack, or you want to register this memory as "just" virtual? Given all this sharing we probably need a way to handle registering shared memory more efficiently. But actually QP/CQ buffers only need to be registered once, since the key that the HCA uses to access the buffer is set at creation time, and the other processes don't need separate keys. - R. From rdreier at cisco.com Mon Jun 25 13:57:29 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 25 Jun 2007 13:57:29 -0700 Subject: [ofa-general] Fwd: [ANNOUNCE] GIT 1.5.2.2 In-Reply-To: <20070625043809.GA29772@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 25 Jun 2007 07:38:09 +0300") References: <20070625043809.GA29772@mellanox.co.il> Message-ID: > I think git-gui updates make it worth while to upgrade. Is someone actually running git-gui over the internet? I don't understand why git-gui on openfabrics.org would matter? - R. 
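Roland's proposal earlier in this thread, extending ibv_open_device() with a pathname and doing the shared-memory setup inside the library (a NULL pathname meaning "not shareable"), could be prototyped roughly as below. This is a sketch under stated assumptions, not libibverbs code: `open_ctx_region()` is a hypothetical helper, and it maps a regular file with open()/ftruncate()/mmap() instead of calling shm_open() so that it needs no extra libraries.

```c
#define _GNU_SOURCE /* MAP_ANONYMOUS on strict compilers */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not libibverbs API): map the memory that would hold
 * shareable context state.
 *   path == NULL : private anonymous mapping, i.e. a non-shareable context.
 *   path != NULL : create/open the file and map it MAP_SHARED, so every
 *                  process that opens the same path sees the same bytes.
 * Assumes all openers agree on `size`.
 */
static void *open_ctx_region(const char *path, size_t size)
{
    void *p;
    int fd;

    if (path == NULL) {
        p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return p == MAP_FAILED ? NULL : p;
    }

    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;

    /* Give the backing file its size; newly added bytes read as zero. */
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return NULL;
    }

    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}
```

With this shape, callers that pass NULL keep today's private behaviour, which is the compatibility property the pathname proposal relies on; Michael's counter-proposal later in the thread (pass an fd, with -1 meaning non-shared) would simply move the open() into the caller.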
From rdreier at cisco.com Mon Jun 25 14:00:30 2007 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 25 Jun 2007 14:00:30 -0700 Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition support In-Reply-To: <20070624055931.GA26752@mellanox.co.il> (Michael S. Tsirkin's message of "Sun, 24 Jun 2007 08:59:32 +0300") References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com> <467996C4.1060201@ichips.intel.com> <20070622052700.GP4857@mellanox.co.il> <20070624055931.GA26752@mellanox.co.il> Message-ID: > Makes sense. If you like, an ioctl can be replaced with a write: > all 4-byte writes currently return -EINVAL. > > This has a small advantage that write gets passed the buffer length > parameter, so it's easier to debug (e.g. strace outputs write buffers). Hmm, I think I still prefer an ioctl to switch modes. It makes for cleaner separation of control and data path. > To make the interface more future-proof, we can > ask all new-ABI users to use pwrite with offset 0, > and validate the offset in kernel. > Is this a good idea? No, I don't like that interface. Especially the converse interface of pread() at offset 0 seems very confusing. - R. From swise at opengridcomputing.com Mon Jun 25 14:15:09 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 25 Jun 2007 16:15:09 -0500 Subject: [ofa-general] development process post ofed-1.2 gold. Message-ID: <4680305D.9030701@opengridcomputing.com> Hey Tziporet, Is there any process for fixing bugs post OFED-1.2 "gold"? If I fix some bugs in the iw_cxgb3 driver, for example, should I post the patches and ask that they be pulled into the ofed_1_2 repository? Or am I on my own? Thanks, Steve. 
From canonrs at ornl.gov Mon Jun 25 14:51:15 2007 From: canonrs at ornl.gov (Canon, Richard Shane) Date: Mon, 25 Jun 2007 17:51:15 -0400 Subject: [ofa-general] low performance with multiple LUNs on a single port with ib_srp Message-ID: <537C6C0940C6C143AA46A88946B8541708BB417C@ORNLEXCHANGE.ornl.gov> Greetings, Hopefully the subject says it all... I've stumbled on a performance issue with the OFED ib_srp driver. Here is the configuration. I am testing with a DDN 9550 and a single host system. The systems are connected by two SDR links. On the host side there is a dual port (DDR) card. On the DDN side, both lines go into a single singlet (even though it is a couplet). The lines go into two distinct cards on the DDN side (if you are familiar with the layout). The testing used OFED 1.2. Now for the tests... If I run a single-stream test, I see good results, with over 700 MB/s. These tests are run using sg_dd with the directio flag. If I run two concurrent streams against two LUNs that are each presented over a single port on the DDN (and therefore accessed by a single port on the host side), the aggregate performance drops to around 120 MB/s (60 MB/s per stream). Just to confirm it isn't a problem on the DDN side, I repeated these tests with the IBGD driver. There I consistently saw about 600-650 MB/s on the port regardless of the number of LUNs I tested with. Any ideas on what the problem is? Also, if this doesn't make sense, let me know and I will try to clarify further. Thanks, --Shane Canon -- R. Shane Canon National Center for Computational Science Oak Ridge National Laboratory canonrs at ornl.gov -------------- next part -------------- An HTML attachment was scrubbed...
URL: From chevchenkovic at gmail.com Mon Jun 25 22:29:07 2007 From: chevchenkovic at gmail.com (Chevchenkovic Chevchenkovic) Date: Mon, 25 Jun 2007 22:29:07 -0700 Subject: [ofa-general] Installation problem with mvapich2 Message-ID: <1c16cdf90706252229p2a6466a1l81d5411821252744@mail.gmail.com> Hi, I am trying to install mvapich2 on my system. So i do the following: 1. untar mvapich2-0.9.8.tar.gz 2.
go to the make.mvapich2.gen2 file and set the prefix to /root/chev/temp/mvapich2-0.9.8/ Then we run: ./make.mvapich2.gen2 I get the following output: ========================================================= Configuring MVAPICH2... Configuring MPICH2 version MVAPICH2-0.9.8 with --prefix=/root/chev/temp/mvapich2-0.9.8/ --enable-g=dbg --with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd --disable-romio --without-mpe sourcing /root/chev/temp/mvapich2-0.9.8/src/pm/mpd/setup_pm checking for gcc... gcc checking for C compiler default output file name... configure: error: C compiler cannot create executables See `config.log' for more details. Configuring MPICH2 version MVAPICH2-0.9.8 with --prefix=/root/chev/chev/mvapich2-0.9.8/ --enable-g=dbg --with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd --disable-romio --without-mpe sourcing /root/chev/temp/mvapich2-0.9.8/src/pm/mpd/setup_pm checking for gcc... gcc checking for C compiler default output file name... configure: error: C compiler cannot create executables See `config.log' for more details. Building MVAPICH2... make: *** No targets specified and no makefile found. Stop. make: *** No targets specified and no makefile found. Stop. MVAPICH2 installation... make: *** No rule to make target `install'. Stop. make: *** No rule to make target `install'. Stop. Congratulations on successfully building MVAPICH2. Please send your feedback to mvapich-discuss at cse.ohio-state.edu. ================================================ What is going wrong? Can someone please help me in this regard? Awaiting some reply, -Chev From mst at dev.mellanox.co.il Tue Jun 26 00:06:41 2007 From: mst at dev.mellanox.co.il (Michael S.
Tsirkin) Date: Tue, 26 Jun 2007 10:06:41 +0300 Subject: [ofa-general] Re: [PATCH RFC] sharing userspace IB objects In-Reply-To: References: <20070625130604.GH15343@mellanox.co.il> Message-ID: <20070626070641.GM15343@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCH RFC] sharing userspace IB objects > > Some initial reaction, in no particular order: > > - Having to allocate everything in memory that the library mmap()s > adds a lot of yucky stuff -- basically we need to implement our own > allocator for the shared memory offsets. Right. > I guess we could wrap this > in libibverbs and only implement it once but still we're basically > reimplementing malloc(). Right. > Is there really a strong use case for making every type of object > shareable? Can we handle the SRC stuff without going to this > extreme of complexity? This is not directly related to SRC: this is an effort to make it possible to share QPs, CQ etc across processes in the same way as they can be currently shared across threads. So assuming that we want multiple processes to post to the same QP, how can we support this? > - Given that everything shared is in shared memory, I think we should try and keep shared memory usage to minimum. For example, in mthca mr object just needs a key: we could keep it in non-shared memory, just pass the key around and save on shared memory usage. > it seems we could > avoid all the marshalling/unmarshalling stuff, and just have the > shared objects have an ID along with an API to look up objects by > API. That way we could let applications use more than just unix > sockets -- eg pipe() + fork() would work too. We need to share file descriptors too. Is there a way to pass these around besides unix domain sockets?
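For reference, this is how a descriptor travels over a Unix-domain socket: the sender attaches it as an SCM_RIGHTS control message, and the kernel installs a new descriptor in the receiving process that refers to the same open file (see unix(7)). Descriptors inherited across fork() are shared without this step, which is why the pipe() + fork() route needs no socket. A minimal sketch; `send_fd()` and `recv_fd()` are illustrative helper names, not part of libibverbs.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h> /* close(), read(), write() for callers */

/* Send one file descriptor over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
    char byte = 0; /* SCM_RIGHTS must ride along with at least one data byte */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { /* control buffer, correctly aligned for struct cmsghdr */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns the new fd in this process, or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The received number generally differs from the sender's, so anything that identifies a shared object by descriptor number would need translating on arrival.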
> > +struct ibv_context *ibv_open_shared_device(struct ibv_device *device, > > + int fd, off_t offset); > > - This seems like too low-level an interface; I don't think there's > any way to enforce the fact that fd came from shm_open(), and I > don't see the use of offset at all. Hmm, I accept offset is not too important. About fd coming from shm_open - we don't care, if the user wants to use a storage-backed file for this, let him. And if you consider that case, maybe people want to use e.g. mkstemp to open these. Even for shm_open, if you want a unique name, you'll have to implement something complicated on top of shm_open. So maybe add just fd to ibv_open_device, and value -1 would mean non-shared? OK? > I think it would be more > sensible to extend the normal ibv_open_device() with a pathname, > and maybe a flag about whether to create or map an existing shared > context, and do all the shm stuff internally. Then if someone > passes a NULL pathname, the context isn't shareable. But are you sure we want to break API for all users just to add a new capability for a minority that wants shared memory support? -- MST From ogerlitz at voltaire.com Tue Jun 26 01:06:49 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 26 Jun 2007 11:06:49 +0300 Subject: [ofa-general] [PATCH RFC] sharing userspace IB objects In-Reply-To: References: <20070625130604.GH15343@mellanox.co.il><467FCEF6.9090905@voltaire.com> Message-ID: <4680C919.7010908@voltaire.com> Roland Dreier wrote: > > > So, if both sides use a shared-context, they would need to implement a > > synchronization protocol (that is don't deliver established event to > > the active before the passive accepted). > > I'm missing something -- how does the sharing affect the need for > synchronization? If it's a non-shared context, the passive side creates the QP, then allocates and posts RX buffers to it before accepting the connection request, so synchronization is achieved by the IB CM.
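In the non-shared case the CM provides that ordering for free; once a QP created in one process is used by another, the processes must enforce it themselves. One way to do that, sketched with plain POSIX primitives: process A blocks on a pipe until process B reports that its receives are posted, and only then accepts. `post_receives()` and `accept_connection()` are hypothetical stand-ins (for ibv_post_recv() and the CM accept call), and a flag in shared memory lets the sketch check the order.

```c
#define _GNU_SOURCE /* MAP_ANONYMOUS on strict compilers */
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Shared flag standing in for "process B has posted its RX buffers". */
static volatile int *rx_posted;

/* Hypothetical stand-in for ibv_post_recv() calls made by process B. */
static void post_receives(void) { *rx_posted = 1; }

/* Hypothetical stand-in for the CM accept in process A; returns 1 if the
 * receives were already posted when the accept happened. */
static int accept_connection(void) { return *rx_posted == 1; }

/* Process A forks process B, waits for B's "ready" byte, then accepts.
 * Returns accept_connection()'s result, or -1 on error. */
static int handoff_then_accept(void)
{
    int ready[2];
    char byte;
    pid_t pid;
    int ok;

    rx_posted = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (rx_posted == MAP_FAILED || pipe(ready) < 0)
        return -1;
    *rx_posted = 0;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {              /* process B: takes the QP, posts RX first */
        close(ready[0]);
        post_receives();
        if (write(ready[1], "r", 1) != 1)
            _exit(1);
        _exit(0);
    }

    close(ready[1]);             /* process A: no accept until B is ready */
    if (read(ready[0], &byte, 1) != 1)
        return -1;
    ok = accept_connection();

    waitpid(pid, NULL, 0);
    close(ready[0]);
    munmap((void *)rx_posted, sizeof(int));
    return ok;
}
```

Any channel the two processes already share would do in place of the pipe; the point is only that A's accept waits on B's signal.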
Now, if you want process A to create a QP and accept the connection, then hand it to process B which will allocate and post RX to this QP, we are out of sync with the active side, unless first process B gets the QP and posts RX, and second, process A does accept on the conn req. > Given all this sharing we probably need a way to handle registering > shared memory more efficiently. But actually QP/CQ buffers only need > to be registered once, since the key that the HCA uses to access the > buffer is set at creation time, and the other processes don't need > separate keys. OK, thanks for clarifying that. Or. From ogerlitz at voltaire.com Tue Jun 26 01:30:18 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 26 Jun 2007 11:30:18 +0300 Subject: [ofa-general] Re: [PATCH RFC] sharing userspace IB objects In-Reply-To: <20070626070641.GM15343@mellanox.co.il> References: <20070625130604.GH15343@mellanox.co.il> <20070626070641.GM15343@mellanox.co.il> Message-ID: <4680CE9A.8040306@voltaire.com> Michael S. Tsirkin wrote: >> Quoting Roland Dreier : >> Subject: Re: [PATCH RFC] sharing userspace IB objects >> Is there really a strong use case for making every type of object >> shareable? Can we handle the SRC stuff without going to this >> extreme of complexity? > This is not directly related to SRC: this is an effort > to make it possible to share QPs, CQ etc across processes > in the same way as they can be currently shared across threads. > So assuming that we want multiple processes to post to > the same QP, how can we support this? Indeed, lets zoom out a little and define the high level scope and design here, such that people can comment. For example the design should treat also sharing/passing the CM (RDMA-CM) ID among processes, and state the limitations, eg on the private data etc. >> - Given that everything shared is in shared memory, > I think we should try and keep shared memory usage to minimum.
> For example, in mthca mr object just needs a key: we could > keep it in non-shared memory, just pass the key around > and save on shared memory usage. what do you refer by "it" here? is it the lkey of the memory used for the QP, or the lkey describing the rx/tx buffers? On the latter case, looking on ib_umem_get, it uses current->mm etc, doesn't this mean that there should be some difference between shared to non shared memory? Or. From glebn at voltaire.com Tue Jun 26 01:34:45 2007 From: glebn at voltaire.com (Gleb Natapov) Date: Tue, 26 Jun 2007 11:34:45 +0300 Subject: [ofa-general] Re: [PATCH RFC] sharing userspace IB objects In-Reply-To: <20070626070641.GM15343@mellanox.co.il> References: <20070625130604.GH15343@mellanox.co.il> <20070626070641.GM15343@mellanox.co.il> Message-ID: <20070626083445.GB1164@minantech.com> On Tue, Jun 26, 2007 at 10:06:41AM +0300, Michael S. Tsirkin wrote: > > Is there really a strong use case for making every type of object > > shareable? Can we handle the SRC stuff without going to this > > extreme of complexity? > > This is not directly related to SRC: this is an effort > to make it possible to share QPs, CQ etc across processes > in the same way as they can be currently shared across threads. > So assuming that we want multiple processes to post to > the same QP, how can we support this? Are you absolutely sure you even want to support this? What is the user case? If multiple processes want to post to the same QP how will you ensure that right process will receive right completion event? Or they will be required to allocate send descriptors from a shared memory too? If you want them to receive from the same QP they better allocate receive descriptors/buffers from shared memory too. -- Gleb. From mst at dev.mellanox.co.il Tue Jun 26 02:31:47 2007 From: mst at dev.mellanox.co.il (Michael S.
Tsirkin) Date: Tue, 26 Jun 2007 12:31:47 +0300 Subject: [ofa-general] Re: Re: [PATCH RFC] sharing userspace IB objects In-Reply-To: <4680CE9A.8040306@voltaire.com> References: <20070625130604.GH15343@mellanox.co.il> <20070626070641.GM15343@mellanox.co.il> <4680CE9A.8040306@voltaire.com> Message-ID: <20070626093147.GN15343@mellanox.co.il> > Quoting Or Gerlitz : > Subject: Re: Re: [PATCH RFC] sharing userspace IB objects > > Michael S. Tsirkin wrote: > >>Quoting Roland Dreier : > >>Subject: Re: [PATCH RFC] sharing userspace IB objects > > >> Is there really a strong use case for making every type of object > >> shareable? Can we handle the SRC stuff without going to this > >> extreme of complexity? > > >This is not directly related to SRC: this is an effort > >to make it possible to share QPs, CQ etc across processes > >in the same way as they can be currently shared across threads. > >So assuming that we want multiple processes to post to > >the same QP, how can we support this? > > Indeed, lets zoom out a little and define the high level scope and > design here, such that people can comment. What I want to do is make it possible to share libibverbs objects between processes, in the same way that it's possible to share them between threads. > For example the design should treat also sharing/passing the CM > (RDMA-CM) ID among processes, and state the limitations, eg on the > private data etc. This would have to be addressed in librdmacm. Let's finish libibverbs first. > >> - Given that everything shared is in shared memory, > > >I think we should try and keep shared memory usage to minimum. > >For example, in mthca mr object just needs a key: we could > >keep it in non-shared memory, just pass the key around > >and save on shared memory usage. > > what do you refer by "it" here? > is it the lkey of the memory used for > the QP, or the lkey describing the rx/tx buffers? Both, there's no real difference. 
> On the latter case, looking on ib_umem_get, it uses current->mm etc, > doesn't this mean that there should be some difference between shared to > non shared memory? This is only used for registering the memory. Assuming it is registered by some process, we can pass the key around between processes. -- MST From pnlai at galactic.com.hk Tue Jun 26 02:48:22 2007 From: pnlai at galactic.com.hk (PN Lai) Date: Tue, 26 Jun 2007 17:48:22 +0800 Subject: [ofa-general] SRP Failover Message-ID: <000301c7b7d7$236b3a70$6a41af50$@com.hk> Hi all, I'm testing the SRP HA functions, but I have some questions. I use 2 IB cables to connect the initiator and 1 IB cable to connect to the storage. I installed OFED-1.2 and enabled "SRP_LOAD=yes" and "SRPHA_ENABLE=yes" in openib.conf. After reboot, it discovers 2 targets /dev/sdbX and /dev/sdcX. However, when I check /var/log/srp_daemon.log, it shows: .. 26/05/07 17:42:57 : bad MAD status (110) from lid 257 26/05/07 17:43:30 : No response to inform info registration 26/05/07 17:43:30 : Fail to register to traps, maybe there is no opensm running on fabric .. But opensm is running on both machines. I don't know whether this is normal, or whether it should only discover a single target. Now, my question is: if I mount /dev/sdbX and write data to it, and then remove 1 of the initiator cables, how will /dev/sdcX replace /dev/sdbX so that I can continue to write the data? Do I need to configure some extra files? Thanks for any reply. PN -------------- next part -------------- An HTML attachment was scrubbed...
Tsirkin) Date: Tue, 26 Jun 2007 12:51:25 +0300 Subject: [ofa-general] Re: Re: [PATCH RFC] sharing userspace IB objects In-Reply-To: <20070626083445.GB1164@minantech.com> References: <20070625130604.GH15343@mellanox.co.il> <20070626070641.GM15343@mellanox.co.il> <20070626083445.GB1164@minantech.com> Message-ID: <20070626095125.GO15343@mellanox.co.il> > Quoting Gleb Natapov : > Subject: Re: Re: [PATCH RFC] sharing userspace IB objects > > On Tue, Jun 26, 2007 at 10:06:41AM +0300, Michael S. Tsirkin wrote: > > > Is there really a strong use case for making every type of object > > > shareable? Can we handle the SRC stuff without going to this > > > extreme of complexity? > > > > This is not directly related to SRC: this is an effort > > to make it possible to share QPs, CQ etc across processes > > in the same way as they can be currently shared across threads. > > So assuming that we want multiple processes to post to > > the same QP, how can we support this? > > Are you absolutely sure you even want to support this? Take a look here :) http://www.quotedb.com/quotes/1007 > What is the user case? Use case? Scalability. Pls go over Dror's presentation given at Sonoma - he calls this SSQ. > If multiple processes want to post to the same QP how will you > ensure that right process will receive right completion event? Same as with threads - memory for CQEs and locks will be allocated in shared memory to make it possible for multiple processes to poll CQ simultaneously, and they get completions in FCFS order. What to do with them is up to the user. > Or they > will be required to allocate send descriptors from a shared memory too? Yes, send descriptors will have to be placed in shared memory. > If you want them to receive from the same QP they better allocate receive > descriptors/buffers from shared memory too. Yes, this will work, too. With RDMA, you can have per-process receive buffers.
The SRC extension presented by Dror at Sonoma will make it possible for SEND operations. I plan to open a separate thread to discuss the SRC API. -- MST From mst at dev.mellanox.co.il Tue Jun 26 03:20:45 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Tue, 26 Jun 2007 13:20:45 +0300 Subject: [ofa-general] [PATCH] management: uint -> unsigned replacement Message-ID: <20070626102045.GS15343@mellanox.co.il> Some management headers use the uint type, which (on my system) is described as "old compatibility name for C type". This type might not be defined e.g. if __STRICT_ANSI__ is set, so it is best to avoid its usage at least in headers. Replace it with unsigned in all headers. Signed-off-by: Michael S. Tsirkin --- Hal can you apply this please? As a separate question: I didn't go over .c files (we don't build them with strict ansi now), but maybe removing uint there is a good idea, too? diff --git a/libibcommon/include/infiniband/common.h b/libibcommon/include/infiniband/common.h index 4c90955..80bfe1b 100644 --- a/libibcommon/include/infiniband/common.h +++ b/libibcommon/include/infiniband/common.h @@ -131,7 +131,7 @@ int sys_read_string(char *dir_name, char *file_name, char *str, int max_len); int sys_read_guid(char *dir_name, char *file_name, uint64_t *net_guid); int sys_read_gid(char *dir_name, char *file_name, uint8_t *gid); int sys_read_uint64(char *dir_name, char *file_name, uint64_t *u); -int sys_read_uint(char *dir_name, char *file_name, uint *u); +int sys_read_uint(char *dir_name, char *file_name, unsigned *u); /* stack.c */ void stack_dump(void); diff --git a/libibmad/include/infiniband/mad.h b/libibmad/include/infiniband/mad.h index a349e0f..ae847c9 100644 --- a/libibmad/include/infiniband/mad.h +++ b/libibmad/include/infiniband/mad.h @@ -166,8 +166,8 @@ typedef struct { } ib_dr_path_t; typedef struct { - uint id; - uint mod; + unsigned id; + unsigned mod; } ib_attr_t; typedef struct { @@ -180,7 +180,7 @@ typedef struct { uint64_t mkey; uint64_t trid; /*
used for out mad if nonzero, return real val */ uint64_t mask; /* for sa mads */ - uint recsz; /* for sa mads (attribute offset) */ + unsigned recsz; /* for sa mads (attribute offset) */ int timeout; uint32_t oui; /* for vendor range 2 mads */ } ib_rpc_t; @@ -193,7 +193,7 @@ typedef struct portid { uint32_t qp; uint32_t qkey; uint8_t sl; - uint pkey_idx; + unsigned pkey_idx; } ib_portid_t; typedef void (ib_mad_dump_fn)(char *buf, int bufsz, void *val, int valsz); @@ -566,23 +566,23 @@ enum SA_SIZES_ENUM { }; typedef struct ib_sa_call { - uint attrid; - uint mod; + unsigned attrid; + unsigned mod; uint64_t mask; - uint method; + unsigned method; uint64_t trid; /* used for out mad if nonzero, return real val */ - uint recsz; /* return field */ + unsigned recsz; /* return field */ ib_rmpp_hdr_t rmpp; } ib_sa_call_t; typedef struct ib_vendor_call { - uint method; - uint mgmt_class; - uint attrid; - uint mod; + unsigned method; + unsigned mgmt_class; + unsigned attrid; + unsigned mod; uint32_t oui; - uint timeout; + unsigned timeout; ib_rmpp_hdr_t rmpp; } ib_vendor_call_t; @@ -740,14 +740,14 @@ void * mad_rpc_rmpp(void *ibmad_port, ib_rpc_t *rpc, ib_portid_t *dport, ib_rmpp_hdr_t *rmpp, void *data); /* smp.c */ -uint8_t * smp_query(void *buf, ib_portid_t *id, uint attrid, uint mod, - uint timeout); -uint8_t * smp_set(void *buf, ib_portid_t *id, uint attrid, uint mod, - uint timeout); +uint8_t * smp_query(void *buf, ib_portid_t *id, unsigned attrid, unsigned mod, + unsigned timeout); +uint8_t * smp_set(void *buf, ib_portid_t *id, unsigned attrid, unsigned mod, + unsigned timeout); inline static uint8_t * -safe_smp_query(void *rcvbuf, ib_portid_t *portid, uint attrid, uint mod, - uint timeout) +safe_smp_query(void *rcvbuf, ib_portid_t *portid, unsigned attrid, unsigned mod, + unsigned timeout) { uint8_t *p; @@ -759,8 +759,8 @@ safe_smp_query(void *rcvbuf, ib_portid_t *portid, uint attrid, uint mod, } inline static uint8_t * -safe_smp_set(void *rcvbuf, ib_portid_t *portid, 
uint attrid, uint mod, - uint timeout) +safe_smp_set(void *rcvbuf, ib_portid_t *portid, unsigned attrid, unsigned mod, + unsigned timeout) { uint8_t *p; @@ -773,15 +773,15 @@ safe_smp_set(void *rcvbuf, ib_portid_t *portid, uint attrid, uint mod, /* sa.c */ uint8_t * sa_call(void *rcvbuf, ib_portid_t *portid, ib_sa_call_t *sa, - uint timeout); + unsigned timeout); uint8_t * sa_rpc_call(void *ibmad_port, void *rcvbuf, ib_portid_t *portid, - ib_sa_call_t *sa, uint timeout); + ib_sa_call_t *sa, unsigned timeout); int ib_path_query(ib_gid_t srcgid, ib_gid_t destgid, ib_portid_t *sm_id, void *buf); /* returns lid */ inline static uint8_t * safe_sa_call(void *rcvbuf, ib_portid_t *portid, ib_sa_call_t *sa, - uint timeout) + unsigned timeout) { uint8_t *p; @@ -802,19 +802,19 @@ int ib_resolve_self(ib_portid_t *portid, int *portnum, ib_gid_t *gid); /* gs.c */ uint8_t *perf_classportinfo_query(void *rcvbuf, ib_portid_t *dest, int port, - uint timeout); + unsigned timeout); uint8_t *port_performance_query(void *rcvbuf, ib_portid_t *dest, int port, - uint timeout); + unsigned timeout); uint8_t *port_performance_reset(void *rcvbuf, ib_portid_t *dest, int port, - uint mask, uint timeout); + unsigned mask, unsigned timeout); uint8_t *port_performance_ext_query(void *rcvbuf, ib_portid_t *dest, int port, - uint timeout); + unsigned timeout); uint8_t *port_performance_ext_reset(void *rcvbuf, ib_portid_t *dest, int port, - uint mask, uint timeout); + unsigned mask, unsigned timeout); uint8_t *port_samples_control_query(void *rcvbuf, ib_portid_t *dest, int port, - uint timeout); + unsigned timeout); uint8_t *port_samples_result_query(void *rcvbuf, ib_portid_t *dest, int port, - uint timeout); + unsigned timeout); /* dump.c */ ib_mad_dump_fn diff --git a/libibumad/include/infiniband/umad.h b/libibumad/include/infiniband/umad.h index 9020649..6149c8c 100644 --- a/libibumad/include/infiniband/umad.h +++ b/libibumad/include/infiniband/umad.h @@ -120,13 +120,13 @@ typedef struct ib_user_mad 
{ typedef struct umad_port { char ca_name[UMAD_CA_NAME_LEN]; int portnum; - uint base_lid; - uint lmc; - uint sm_lid; - uint sm_sl; - uint state; - uint phys_state; - uint rate; + unsigned base_lid; + unsigned lmc; + unsigned sm_lid; + unsigned sm_sl; + unsigned state; + unsigned phys_state; + unsigned rate; uint64_t capmask; uint64_t gid_prefix; uint64_t port_guid; @@ -134,7 +134,7 @@ typedef struct umad_port { typedef struct umad_ca { char ca_name[UMAD_CA_NAME_LEN]; - uint node_type; + unsigned node_type; int numports; char fw_ver[20]; char ca_type[40]; -- MST From tziporet at dev.mellanox.co.il Tue Jun 26 04:00:24 2007 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Tue, 26 Jun 2007 14:00:24 +0300 Subject: [ofa-general] Re: development process post ofed-1.2 gold. In-Reply-To: <4680305D.9030701@opengridcomputing.com> References: <4680305D.9030701@opengridcomputing.com> Message-ID: <4680F1C8.3020207@mellanox.co.il> Steve Wise wrote: > Hey Tziporet, > > Is there any process for fixing bugs post OFED-1.2 "gold"? > > If I fix some bugs in the iw_cxgb3 driver, for example, should I post > the patches and ask that they be pulled into the ofed_1_2 repository? > > Or am I on my own? > > Thanks, > > > Steve. > My suggestion is that we keep the ofed_1_2 branch alive, thus new fixes should be applied to the repository. In this way we will be able to do a stable release when we decide. Another question is regarding the daily build - I don't think we need them any more. We can do a weekly build, or run build in case of need (new patches submitted). What other people think about this? Beside this I will open a support page for OFED 1.2 on the Wiki (as we have for OFED 1.1). In this page we will document known bugs and point to the patches that fix them. People can use the ofed_patch.sh script (part of the docs RPM) to add or remove patches. 
Tziporet From glebn at voltaire.com Tue Jun 26 04:13:42 2007 From: glebn at voltaire.com (Gleb Natapov) Date: Tue, 26 Jun 2007 14:13:42 +0300 Subject: [ofa-general] Re: Re: [PATCH RFC] sharing userspace IB objects In-Reply-To: <20070626095125.GO15343@mellanox.co.il> References: <20070625130604.GH15343@mellanox.co.il> <20070626070641.GM15343@mellanox.co.il> <20070626083445.GB1164@minantech.com> <20070626095125.GO15343@mellanox.co.il> Message-ID: <20070626111342.GC1164@minantech.com> On Tue, Jun 26, 2007 at 12:51:25PM +0300, Michael S. Tsirkin wrote: > > Quoting Gleb Natapov : > > Subject: Re: Re: [PATCH RFC] sharing userspace IB objects > > > > On Tue, Jun 26, 2007 at 10:06:41AM +0300, Michael S. Tsirkin wrote: > > > > Is there really a strong use case for making every type of object > > > > shareable? Can we handle the SRC stuff without going to this > > > > extreme of complexity? > > > > > > This is not directly related to SRC: this is an effort > > > to make it possible to share QPs, CQ etc across processes > > > in the same way as they can be currently shared across threads. > > > So assuming that we want multiple processes to post to > > > the same QP, how can we support this? > > > > Are you absolutely sure you even want to support this? > > Take a look here :) > http://www.quotedb.com/quotes/1007 So there is still a chance you'll reconsider :) > > > What is the user case? > > Use case? Scalability. Pls go over Dror's presentation given at Sonoma - > he calls this SSQ. As far as I can tell he is talking about HW supported solution and not half baked SW one. > > > If multiple processes what to post to the same QP how will you > > ensure that right process will receive right completion event? > > Same as with threads - memory for CQEs and locks will be allocated > in shared memory to make it possible for multiple processes to poll > CQ simultaneously, and they get completions in FCFS order. > What to do with them is up to the user. Are you going to use this API? 
How? There is no point in discussing user API without specifying HOW user will be using it. You have to ask what user want and design your API accordingly and not other way around. So suppose I want to use proposed API to implement super scalable MPI. I setup shared QP/CQ/... and each rank start to post into the QP and receive completion from CQ and suppose rank A picked completion that belongs to rank B so I will need to setup out of band channel to pass this completion from A to B. This is not looks good at all to me. -- Gleb. From mst at dev.mellanox.co.il Tue Jun 26 04:44:02 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Tue, 26 Jun 2007 14:44:02 +0300 Subject: [ofa-general] Re: Re: [PATCH RFC] sharing userspace IB objects In-Reply-To: <20070626111342.GC1164@minantech.com> References: <20070625130604.GH15343@mellanox.co.il> <20070626070641.GM15343@mellanox.co.il> <20070626083445.GB1164@minantech.com> <20070626095125.GO15343@mellanox.co.il> <20070626111342.GC1164@minantech.com> Message-ID: <20070626114402.GT15343@mellanox.co.il> > Quoting Gleb Natapov : > Subject: Re: Re: [PATCH RFC] sharing userspace IB objects > > On Tue, Jun 26, 2007 at 12:51:25PM +0300, Michael S. Tsirkin wrote: > > > Quoting Gleb Natapov : > > > Subject: Re: Re: [PATCH RFC] sharing userspace IB objects > > > > > > On Tue, Jun 26, 2007 at 10:06:41AM +0300, Michael S. Tsirkin wrote: > > > > > Is there really a strong use case for making every type of object > > > > > shareable? Can we handle the SRC stuff without going to this > > > > > extreme of complexity? > > > > > > > > This is not directly related to SRC: this is an effort > > > > to make it possible to share QPs, CQ etc across processes > > > > in the same way as they can be currently shared across threads. > > > > So assuming that we want multiple processes to post to > > > > the same QP, how can we support this? > > > > > > Are you absolutely sure you even want to support this? 
> > > > Take a look here :) > > http://www.quotedb.com/quotes/1007 > So there is still a chance you'll reconsider :) Sure, if someone comes up with a better way to improve scalability for single-threaded applications. > > > > > What is the user case? > > > > Use case? Scalability. Pls go over Dror's presentation given at Sonoma - > > he calls this SSQ. > > As far as I can tell he is talking about HW supported solution and not > half baked SW one. No, sharing a send queue must be done in software. I don't really see the reason for sarcasm: do you see value in sharing resources between multiple threads? Why not multiple processes? Some people just don't want to program in multithreaded environment. > > > > > If multiple processes what to post to the same QP how will you > > > ensure that right process will receive right completion event? > > > > Same as with threads - memory for CQEs and locks will be allocated > > in shared memory to make it possible for multiple processes to poll > > CQ simultaneously, and they get completions in FCFS order. > > What to do with them is up to the user. > > Are you going to use this API? How? There is no point in discussing user > API without specifying HOW user will be using it. You have to ask what > user want and design your API accordingly and not other way around. > So suppose I want to use proposed API to implement super scalable MPI. We'd come up with MPI_Send implementation inside libibverbs:). Think layered - I'd like to make a minimal possible API change to make scalability improvements possible. > I setup shared QP/CQ/... and each rank start to post into the QP and > receive completion from CQ and suppose rank A picked completion that > belongs to rank B so I will need to setup out of band channel to pass > this completion from A to B. This is not looks good at all to me. This is not different from multiple threads sharing a CQ, really - and we do support this already. 
In the part of the message that you have cut out, I showed some use cases that avoid this "side channel" (which could be just shared memory btw). -- MST From jackm at dev.mellanox.co.il Tue Jun 26 04:52:11 2007 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Tue, 26 Jun 2007 14:52:11 +0300 Subject: [ofa-general] Re: [PATCH] libmlx4: fix adjustments for minimum qp capabilities in mlx4_create_qp In-Reply-To: References: <200706191647.41336.jackm@dev.mellanox.co.il> <200706240900.16563.jackm@dev.mellanox.co.il> Message-ID: <200706261452.12193.jackm@dev.mellanox.co.il> On Sunday 24 June 2007 16:43, Roland Dreier wrote: > > But the function hasn't looked like that for a few weeks now, since > commit e7d06519. > Oops, my mistake (I missed that commit when cherry-picking; I'm now using your libmlx4 directly). - Jack From glebn at voltaire.com Tue Jun 26 05:25:39 2007 From: glebn at voltaire.com (Gleb Natapov) Date: Tue, 26 Jun 2007 15:25:39 +0300 Subject: [ofa-general] Re: Re: [PATCH RFC] sharing userspace IB objects In-Reply-To: <20070626114402.GT15343@mellanox.co.il> References: <20070625130604.GH15343@mellanox.co.il> <20070626070641.GM15343@mellanox.co.il> <20070626083445.GB1164@minantech.com> <20070626095125.GO15343@mellanox.co.il> <20070626111342.GC1164@minantech.com> <20070626114402.GT15343@mellanox.co.il> Message-ID: <20070626122539.GF1164@minantech.com> On Tue, Jun 26, 2007 at 02:44:02PM +0300, Michael S. Tsirkin wrote: > > Quoting Gleb Natapov : > > Subject: Re: Re: [PATCH RFC] sharing userspace IB objects > > > > On Tue, Jun 26, 2007 at 12:51:25PM +0300, Michael S. Tsirkin wrote: > > > > Quoting Gleb Natapov : > > > > Subject: Re: Re: [PATCH RFC] sharing userspace IB objects > > > > > > > > On Tue, Jun 26, 2007 at 10:06:41AM +0300, Michael S. Tsirkin wrote: > > > > > > Is there really a strong use case for making every type of object > > > > > > shareable? Can we handle the SRC stuff without going to this > > > > > > extreme of complexity?
> > > > > > > > > > This is not directly related to SRC: this is an effort > > > > > to make it possible to share QPs, CQ etc across processes > > > > > in the same way as they can be currently shared across threads. > > > > > So assuming that we want multiple processes to post to > > > > > the same QP, how can we support this? > > > > > > > > Are you absolutely sure you even want to support this? > > > > > > Take a look here :) > > > http://www.quotedb.com/quotes/1007 > > So there is still a chance you'll reconsider :) > > Sure, if someone comes up with a better way to improve scalability > for single-threaded applications. What good is a solution that no one will use? No solution is better then a bad one because this will motivate people to look for proper solution. > > > > > > > > What is the user case? > > > > > > Use case? Scalability. Pls go over Dror's presentation given at Sonoma - > > > he calls this SSQ. > > > > As far as I can tell he is talking about HW supported solution and not > > half baked SW one. > > No, sharing a send queue must be done in software. I don't really see the reason > for sarcasm: do you see value in sharing resources between multiple threads? > Why not multiple processes? Some people just don't want to program > in multithreaded environment. Yes I see the value in sharing resources between threads and processes if done right. This proposition is far from being right. There is not sarcasm in my sentence either. You can't claim that what you propose is as seamless as it should be. I have no problem with sharing send queue. What I want to be able to do is to attach CQ from each process to a shared QP. When send posted by process A completes the completion is posted into A's CQ. HW should be able to multiplex this IMO. > > > > > > > > If multiple processes what to post to the same QP how will you > > > > ensure that right process will receive right completion event? 
> > > > > > Same as with threads - memory for CQEs and locks will be allocated > > > in shared memory to make it possible for multiple processes to poll > > > CQ simultaneously, and they get completions in FCFS order. > > > What to do with them is up to the user. > > > > Are you going to use this API? How? There is no point in discussing user > > API without specifying HOW user will be using it. You have to ask what > > user want and design your API accordingly and not other way around. > > So suppose I want to use proposed API to implement super scalable MPI. > > We'd come up with MPI_Send implementation inside libibverbs:). Think layered - I'd > like to make a minimal possible API change to make scalability improvements > possible. They are not really possible with proposed API (beyond academic papers that is). You are welcome to implement MPI_Send inside libibverbs. After all this is what Myricom did. > > > I setup shared QP/CQ/... and each rank start to post into the QP and > > receive completion from CQ and suppose rank A picked completion that > > belongs to rank B so I will need to setup out of band channel to pass > > this completion from A to B. This is not looks good at all to me. > > This is not different from multiple threads sharing a CQ, really - and we do This is very different from multiple threads sharing a CQ. In multi threaded scenario I can design my program in a way that each thread will be able to handle completion. We'll have to pass completion between processes in the scenario you propose. > support this already. In the part of the message that you have cut out, I > showed some use cases that avoid this "side channel" What? RDMA? What about a completion of RDMA operation? You'll have to pass it around. I agree that RDMA situation is much better then send/receive one, but there is no RDMAs without send/recv after it. > (which could be just shared memory btw). > And you introduce another scalability problem here. 
On a big SMP node will have to create channel between each pair of processes to pass completions and will have to poll each one of them besides polling CQ. Here goes you latency. And I am not saying this is not possible, I am saying it is so bad that it is not worth doing. -- Gleb. From mst at dev.mellanox.co.il Tue Jun 26 05:58:02 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Tue, 26 Jun 2007 15:58:02 +0300 Subject: [ofa-general] Re: Re: [PATCH RFC] sharing userspace IB objects In-Reply-To: <20070626122539.GF1164@minantech.com> References: <20070625130604.GH15343@mellanox.co.il> <20070626070641.GM15343@mellanox.co.il> <20070626083445.GB1164@minantech.com> <20070626095125.GO15343@mellanox.co.il> <20070626111342.GC1164@minantech.com> <20070626114402.GT15343@mellanox.co.il> <20070626122539.GF1164@minantech.com> Message-ID: <20070626125802.GU15343@mellanox.co.il> > > No, sharing a send queue must be done in software. I don't really see the reason > > for sarcasm: do you see value in sharing resources between multiple threads? > > Why not multiple processes? Some people just don't want to program > > in multithreaded environment. > > Yes I see the value in sharing resources between threads and processes > if done right. This proposition is far from being right. Ahem, *what* are you talking about? Sharing resources between threads was supported in libibverbs 1.0, *right from the start*. This is still the case with 1.1, and this API matches verbs quite closely which means that it can work pretty much on any hardware. You want to propose some enhancements, go ahead (and open a new thread for this). All *I* want to do is support sharing resources in singlethreaded environment. > There is not sarcasm in my sentence either. You can't claim that what you > propose is as seamless as it should be. I think it's as seamless as it *can* be. > I have no problem with sharing send queue. 
What I want to be able to do > is to attach CQ from each process to a shared QP. When send posted by > process A completes the completion is posted into A's CQ. HW should be > able to multiplex this IMO. Well, since there is no hardware that does this, why bother discussing this? > > > > > If multiple processes what to post to the same QP how will you > > > > > ensure that right process will receive right completion event? > > > > > > > > Same as with threads - memory for CQEs and locks will be allocated > > > > in shared memory to make it possible for multiple processes to poll > > > > CQ simultaneously, and they get completions in FCFS order. > > > > What to do with them is up to the user. > > > > > > Are you going to use this API? How? There is no point in discussing user > > > API without specifying HOW user will be using it. You have to ask what > > > user want and design your API accordingly and not other way around. > > > So suppose I want to use proposed API to implement super scalable MPI. > > > > We'd come up with MPI_Send implementation inside libibverbs:). Think layered - I'd > > like to make a minimal possible API change to make scalability improvements > > possible. > > They are not really possible with proposed API (beyond academic papers that is). I'm talking to MPI guys here, too, so I don't think there's real danger that the final API will be useless for them. > You are > welcome to implement MPI_Send inside libibverbs. After all this is what Myricom did. I think keeping a general verbs layer is a better approach for now. > > > > > I setup shared QP/CQ/... and each rank start to post into the QP and > > > receive completion from CQ and suppose rank A picked completion that > > > belongs to rank B so I will need to setup out of band channel to pass > > > this completion from A to B. This is not looks good at all to me. 
> > > > This is not different from multiple threads sharing a CQ, really - and we do > This is very different from multiple threads sharing a CQ. In > multi threaded scenario I can design my program in a way that each > thread will be able to handle completion. We'll have to pass > completion between processes in the scenario you propose. > > > support this already. In the part of the message that you have cut out, I > > showed some use cases that avoid this "side channel" > > What? RDMA? RDMA and SRC. > What about a completion of RDMA operation? You'll have to > pass it around. Since all it does it free up the buffers, it's quite possible that processing of send completions can be done by any process. This really depends on how the application wants to do this: again, you seem to ignore the fact that the issue is the same for multithreaded programs, and they seem to cope fine. > I agree that RDMA situation is much better then > send/receive one, but there is no RDMAs without send/recv after it. Not really - polling on data has been used in MPI for ages now. With SRC you can have separate completions on the receive side. > > (which could be just shared memory btw). > > And you introduce another scalability problem here. On a big SMP node > will have to create channel between each pair of processes to pass > completions and will have to poll each one of them besides polling CQ. > Here goes you latency. And I am not saying this is not possible, I am > saying it is so bad that it is not worth doing. No, you got that wrong: there need not be any real "channels" with shared memory: just a single data structure shared by all processes woul do. But again, you are getting into MPI design, which is the wrong layer to discuss here. 
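A minimal sketch of the first-come-first-served model described above, assuming only a plain ring of completion records placed in MAP_SHARED memory and guarded by a C11 spinlock. It does not use libibverbs or a real hardware CQ; `struct shared_cq`, `shared_cq_push`, `shared_cq_poll` and `drain_with_processes` are hypothetical names chosen for illustration. It shows the property both sides of the thread are arguing about: which process receives a given wr_id depends only on which one takes the lock first.

```c
/* Hypothetical illustration, not libibverbs: a completion ring that several
 * processes poll through MAP_SHARED memory, in first-come-first-served order. */
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc/musl */
#include <stdatomic.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RING_SIZE 64               /* power of two, so counter wraparound is safe */

struct shared_cq {
	atomic_flag lock;              /* spinlock shared by all processes */
	unsigned head, tail;           /* consumer and producer counters */
	uint64_t wr_id[RING_SIZE];     /* stand-in for real CQEs */
	int taken_by[RING_SIZE];       /* demo only: poller index that consumed wr_id n */
};

static void cq_lock(struct shared_cq *cq)
{
	while (atomic_flag_test_and_set_explicit(&cq->lock, memory_order_acquire))
		;                          /* spin until the holder releases it */
}

static void cq_unlock(struct shared_cq *cq)
{
	atomic_flag_clear_explicit(&cq->lock, memory_order_release);
}

static struct shared_cq *shared_cq_create(void)
{
	struct shared_cq *cq = mmap(NULL, sizeof(*cq), PROT_READ | PROT_WRITE,
				    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (cq == MAP_FAILED)
		return NULL;
	atomic_flag_clear(&cq->lock);
	cq->head = cq->tail = 0;
	for (int i = 0; i < RING_SIZE; i++)
		cq->taken_by[i] = -1;
	return cq;
}

/* Append one completion; returns -1 if the ring is full. */
static int shared_cq_push(struct shared_cq *cq, uint64_t wr_id)
{
	int ret = -1;

	cq_lock(cq);
	if (cq->tail - cq->head < RING_SIZE) {
		cq->wr_id[cq->tail % RING_SIZE] = wr_id;
		cq->tail++;
		ret = 0;
	}
	cq_unlock(cq);
	return ret;
}

/* Take the oldest completion, whoever posted it; returns 1 if one was found. */
static int shared_cq_poll(struct shared_cq *cq, uint64_t *wr_id)
{
	int found = 0;

	cq_lock(cq);
	if (cq->head != cq->tail) {
		*wr_id = cq->wr_id[cq->head % RING_SIZE];
		cq->head++;
		found = 1;
	}
	cq_unlock(cq);
	return found;
}

/* Fork nprocs pollers that drain the ring; returns 0 if all exit cleanly.
 * The demo assumes wr_id values are below RING_SIZE. */
static int drain_with_processes(struct shared_cq *cq, int nprocs)
{
	int started = 0, failed = 0, status;

	for (int i = 0; i < nprocs; i++) {
		pid_t pid = fork();
		if (pid < 0) {
			failed = 1;
			break;
		}
		if (pid == 0) {
			uint64_t id;
			while (shared_cq_poll(cq, &id))
				cq->taken_by[id % RING_SIZE] = i;
			_exit(0);
		}
		started++;
	}
	while (started-- > 0)
		if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
			failed = 1;
	return failed ? -1 : 0;
}
```

Nothing in `shared_cq_poll` looks at who posted an entry: the process that wins the lock keeps the completion, so delivering it to the original poster would need an extra mechanism layered on top.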
-- MST From halr at voltaire.com Tue Jun 26 06:04:10 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Jun 2007 09:04:10 -0400 Subject: [ofa-general] Re: [PATCH] management: uint -> unsigned replacement In-Reply-To: <20070626102045.GS15343@mellanox.co.il> References: <20070626102045.GS15343@mellanox.co.il> Message-ID: <1182862966.10379.425353.camel@hal.voltaire.com> On Tue, 2007-06-26 at 06:20, Michael S. Tsirkin wrote: > Some management headers use uint type which (on my system) What's your system ? > is described as "old > compatibility name for C type". This type might not defined e.g. if > __STRICT_ANSI__ is set, Is strict ANSI a requirement ? > so it is best to avoid its usage at least in headers. > Replace by unsigned in all headers. > > Signed-off-by: Michael S. Tsirkin > > --- > > Hal can you apply this please? As a separate question: > I didn't go over .c files (we don't build them with strict ansi now), > but maybe removing uint there is a good idea, too? Yes but it will take more than this to make them strict ANSI. Is this as an OFED 1.2 follow on or just for master ? 
-- Hal > diff --git a/libibcommon/include/infiniband/common.h b/libibcommon/include/infiniband/common.h > index 4c90955..80bfe1b 100644 > --- a/libibcommon/include/infiniband/common.h > +++ b/libibcommon/include/infiniband/common.h > @@ -131,7 +131,7 @@ int sys_read_string(char *dir_name, char *file_name, char *str, int max_len); > int sys_read_guid(char *dir_name, char *file_name, uint64_t *net_guid); > int sys_read_gid(char *dir_name, char *file_name, uint8_t *gid); > int sys_read_uint64(char *dir_name, char *file_name, uint64_t *u); > -int sys_read_uint(char *dir_name, char *file_name, uint *u); > +int sys_read_uint(char *dir_name, char *file_name, unsigned *u); > > /* stack.c */ > void stack_dump(void); > diff --git a/libibmad/include/infiniband/mad.h b/libibmad/include/infiniband/mad.h > index a349e0f..ae847c9 100644 > --- a/libibmad/include/infiniband/mad.h > +++ b/libibmad/include/infiniband/mad.h > @@ -166,8 +166,8 @@ typedef struct { > } ib_dr_path_t; > > typedef struct { > - uint id; > - uint mod; > + unsigned id; > + unsigned mod; > } ib_attr_t; > > typedef struct { > @@ -180,7 +180,7 @@ typedef struct { > uint64_t mkey; > uint64_t trid; /* used for out mad if nonzero, return real val */ > uint64_t mask; /* for sa mads */ > - uint recsz; /* for sa mads (attribute offset) */ > + unsigned recsz; /* for sa mads (attribute offset) */ > int timeout; > uint32_t oui; /* for vendor range 2 mads */ > } ib_rpc_t; > @@ -193,7 +193,7 @@ typedef struct portid { > uint32_t qp; > uint32_t qkey; > uint8_t sl; > - uint pkey_idx; > + unsigned pkey_idx; > } ib_portid_t; > > typedef void (ib_mad_dump_fn)(char *buf, int bufsz, void *val, int valsz); > @@ -566,23 +566,23 @@ enum SA_SIZES_ENUM { > }; > > typedef struct ib_sa_call { > - uint attrid; > - uint mod; > + unsigned attrid; > + unsigned mod; > uint64_t mask; > - uint method; > + unsigned method; > > uint64_t trid; /* used for out mad if nonzero, return real val */ > - uint recsz; /* return field */ > + unsigned recsz; 
/* return field */ > ib_rmpp_hdr_t rmpp; > } ib_sa_call_t; > > typedef struct ib_vendor_call { > - uint method; > - uint mgmt_class; > - uint attrid; > - uint mod; > + unsigned method; > + unsigned mgmt_class; > + unsigned attrid; > + unsigned mod; > uint32_t oui; > - uint timeout; > + unsigned timeout; > ib_rmpp_hdr_t rmpp; > } ib_vendor_call_t; > > @@ -740,14 +740,14 @@ void * mad_rpc_rmpp(void *ibmad_port, ib_rpc_t *rpc, ib_portid_t *dport, > ib_rmpp_hdr_t *rmpp, void *data); > > /* smp.c */ > -uint8_t * smp_query(void *buf, ib_portid_t *id, uint attrid, uint mod, > - uint timeout); > -uint8_t * smp_set(void *buf, ib_portid_t *id, uint attrid, uint mod, > - uint timeout); > +uint8_t * smp_query(void *buf, ib_portid_t *id, unsigned attrid, unsigned mod, > + unsigned timeout); > +uint8_t * smp_set(void *buf, ib_portid_t *id, unsigned attrid, unsigned mod, > + unsigned timeout); > > inline static uint8_t * > -safe_smp_query(void *rcvbuf, ib_portid_t *portid, uint attrid, uint mod, > - uint timeout) > +safe_smp_query(void *rcvbuf, ib_portid_t *portid, unsigned attrid, unsigned mod, > + unsigned timeout) > { > uint8_t *p; > > @@ -759,8 +759,8 @@ safe_smp_query(void *rcvbuf, ib_portid_t *portid, uint attrid, uint mod, > } > > inline static uint8_t * > -safe_smp_set(void *rcvbuf, ib_portid_t *portid, uint attrid, uint mod, > - uint timeout) > +safe_smp_set(void *rcvbuf, ib_portid_t *portid, unsigned attrid, unsigned mod, > + unsigned timeout) > { > uint8_t *p; > > @@ -773,15 +773,15 @@ safe_smp_set(void *rcvbuf, ib_portid_t *portid, uint attrid, uint mod, > > /* sa.c */ > uint8_t * sa_call(void *rcvbuf, ib_portid_t *portid, ib_sa_call_t *sa, > - uint timeout); > + unsigned timeout); > uint8_t * sa_rpc_call(void *ibmad_port, void *rcvbuf, ib_portid_t *portid, > - ib_sa_call_t *sa, uint timeout); > + ib_sa_call_t *sa, unsigned timeout); > int ib_path_query(ib_gid_t srcgid, ib_gid_t destgid, ib_portid_t *sm_id, > void *buf); /* returns lid */ > > inline static uint8_t * > 
safe_sa_call(void *rcvbuf, ib_portid_t *portid, ib_sa_call_t *sa, > - uint timeout) > + unsigned timeout) > { > uint8_t *p; > > @@ -802,19 +802,19 @@ int ib_resolve_self(ib_portid_t *portid, int *portnum, ib_gid_t *gid); > > /* gs.c */ > uint8_t *perf_classportinfo_query(void *rcvbuf, ib_portid_t *dest, int port, > - uint timeout); > + unsigned timeout); > uint8_t *port_performance_query(void *rcvbuf, ib_portid_t *dest, int port, > - uint timeout); > + unsigned timeout); > uint8_t *port_performance_reset(void *rcvbuf, ib_portid_t *dest, int port, > - uint mask, uint timeout); > + unsigned mask, unsigned timeout); > uint8_t *port_performance_ext_query(void *rcvbuf, ib_portid_t *dest, int port, > - uint timeout); > + unsigned timeout); > uint8_t *port_performance_ext_reset(void *rcvbuf, ib_portid_t *dest, int port, > - uint mask, uint timeout); > + unsigned mask, unsigned timeout); > uint8_t *port_samples_control_query(void *rcvbuf, ib_portid_t *dest, int port, > - uint timeout); > + unsigned timeout); > uint8_t *port_samples_result_query(void *rcvbuf, ib_portid_t *dest, int port, > - uint timeout); > + unsigned timeout); > > /* dump.c */ > ib_mad_dump_fn > diff --git a/libibumad/include/infiniband/umad.h b/libibumad/include/infiniband/umad.h > index 9020649..6149c8c 100644 > --- a/libibumad/include/infiniband/umad.h > +++ b/libibumad/include/infiniband/umad.h > @@ -120,13 +120,13 @@ typedef struct ib_user_mad { > typedef struct umad_port { > char ca_name[UMAD_CA_NAME_LEN]; > int portnum; > - uint base_lid; > - uint lmc; > - uint sm_lid; > - uint sm_sl; > - uint state; > - uint phys_state; > - uint rate; > + unsigned base_lid; > + unsigned lmc; > + unsigned sm_lid; > + unsigned sm_sl; > + unsigned state; > + unsigned phys_state; > + unsigned rate; > uint64_t capmask; > uint64_t gid_prefix; > uint64_t port_guid; > @@ -134,7 +134,7 @@ typedef struct umad_port { > > typedef struct umad_ca { > char ca_name[UMAD_CA_NAME_LEN]; > - uint node_type; > + unsigned node_type; > 
int numports; > char fw_ver[20]; > char ca_type[40]; > From mst at dev.mellanox.co.il Tue Jun 26 06:24:57 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Tue, 26 Jun 2007 16:24:57 +0300 Subject: [ofa-general] Re: [PATCH] management: uint -> unsigned replacement In-Reply-To: <1182862966.10379.425353.camel@hal.voltaire.com> References: <20070626102045.GS15343@mellanox.co.il> <1182862966.10379.425353.camel@hal.voltaire.com> Message-ID: <20070626132457.GA29602@mellanox.co.il> > Quoting Hal Rosenstock : > Subject: Re: [PATCH] management: uint -> unsigned replacement > > On Tue, 2007-06-26 at 06:20, Michael S. Tsirkin wrote: > > Some management headers use uint type which (on my system) > > What's your system ? SLES10. > > is described as "old > > compatibility name for C type". This type might not defined e.g. if > > __STRICT_ANSI__ is set, > > Is strict ANSI a requirement ? Not sure. The app in question does #define _XOPEN_SOURCE 600 > > so it is best to avoid its usage at least in headers. > > Replace by unsigned in all headers. > > > > Signed-off-by: Michael S. Tsirkin > > > > --- > > > > Hal can you apply this please? As a separate question: > > I didn't go over .c files (we don't build them with strict ansi now), > > but maybe removing uint there is a good idea, too? > > Yes but it will take more than this to make them strict ANSI. > > Is this as an OFED 1.2 follow on or just for master ? You decide. 
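A minimal sketch of the failure mode behind this patch, assuming glibc: with `_XOPEN_SOURCE 600` defined before any include (and no `_DEFAULT_SOURCE` or `_GNU_SOURCE`), `<sys/types.h>` typically does not declare the legacy `uint` typedef, so a header using it stops compiling, while the same struct spelled with `unsigned` builds unchanged. The struct mirrors the patched `ib_attr_t`; `parse_sysfs_uint` is a hypothetical helper, not part of libibcommon, standing in for the parsing step behind a function like `sys_read_uint()`.

```c
/* Feature-test macro as used by the application mentioned above; it must
 * precede every system header. On glibc this leaves __USE_MISC unset, so
 * <sys/types.h> does not provide the legacy "uint" typedef. */
#define _XOPEN_SOURCE 600
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/types.h>

/* Same members as the patched ib_attr_t in mad.h. "unsigned" is a standard
 * C type, so this compiles whether or not "uint" is declared; writing the
 * members as "uint id;" would typically fail under the macro above. */
typedef struct {
	unsigned id;
	unsigned mod;
} ib_attr_t;

/* Hypothetical helper for illustration: the string-parsing step a function
 * like sys_read_uint() needs after reading a sysfs value (file access
 * omitted). Accepts decimal or 0x-prefixed input; returns 0 on success. */
static int parse_sysfs_uint(const char *str, unsigned *u)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(str, &end, 0);
	if (errno != 0 || end == str || val > UINT_MAX)
		return -1;
	*u = (unsigned)val;
	return 0;
}
```

Only the spelling of the type changes; `unsigned` and `unsigned int` are the same type, so the struct layout and the ABI of the prototypes in the patch are unaffected.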
-- MST From glebn at voltaire.com Tue Jun 26 06:33:17 2007 From: glebn at voltaire.com (Gleb Natapov) Date: Tue, 26 Jun 2007 16:33:17 +0300 Subject: [ofa-general] Re: Re: [PATCH RFC] sharing userspace IB objects In-Reply-To: <20070626125802.GU15343@mellanox.co.il> References: <20070625130604.GH15343@mellanox.co.il> <20070626070641.GM15343@mellanox.co.il> <20070626083445.GB1164@minantech.com> <20070626095125.GO15343@mellanox.co.il> <20070626111342.GC1164@minantech.com> <20070626114402.GT15343@mellanox.co.il> <20070626122539.GF1164@minantech.com> <20070626125802.GU15343@mellanox.co.il> Message-ID: <20070626133317.GH1164@minantech.com> On Tue, Jun 26, 2007 at 03:58:02PM +0300, Michael S. Tsirkin wrote: > > > No, sharing a send queue must be done in software. I don't really see the reason > > > for sarcasm: do you see value in sharing resources between multiple threads? > > > Why not multiple processes? Some people just don't want to program > > > in multithreaded environment. > > > > Yes I see the value in sharing resources between threads and processes > > if done right. This proposition is far from being right. > > Ahem, *what* are you talking about? Sharing resources between threads was supported in > libibverbs 1.0, *right from the start*. This is still the case with 1.1, and this API > matches verbs quite closely which means that it can work pretty much on any > hardware. Why do you think that I have a problem with multithreaded application is beyond my understanding. I have a problem with you thinking that peaking a completion by random process in FCFS order is a good idea. It has limited use for specially designed application. MPI is not one of them. > > You want to propose some enhancements, go ahead (and open a new thread for this). > All *I* want to do is support sharing resources in singlethreaded environment. > You asked for RFC? Don't do it next time if you don't want to hear any. > > There is not sarcasm in my sentence either. 
You can't claim that what you > > propose is as seamless as it should be. > > I think it's as seamless as it *can* be. If it can't be better it is not worth to be implemented. This my opinion. I can stop you from doing it :) > > > I have no problem with sharing send queue. What I want to be able to do > > is to attach CQ from each process to a shared QP. When send posted by > > process A completes the completion is posted into A's CQ. HW should be > > able to multiplex this IMO. > > Well, since there is no hardware that does this, why bother discussing this? Because Mellanox is a hardware company, so do improvements in the right place and don't add craft to library just to claim that you are super scalable. If it can't be implemented in HW then can you explain why please? > > > > > > > If multiple processes what to post to the same QP how will you > > > > > > ensure that right process will receive right completion event? > > > > > > > > > > Same as with threads - memory for CQEs and locks will be allocated > > > > > in shared memory to make it possible for multiple processes to poll > > > > > CQ simultaneously, and they get completions in FCFS order. > > > > > What to do with them is up to the user. > > > > > > > > Are you going to use this API? How? There is no point in discussing user > > > > API without specifying HOW user will be using it. You have to ask what > > > > user want and design your API accordingly and not other way around. > > > > So suppose I want to use proposed API to implement super scalable MPI. > > > > > > We'd come up with MPI_Send implementation inside libibverbs:). Think layered - I'd > > > like to make a minimal possible API change to make scalability improvements > > > possible. > > > > They are not really possible with proposed API (beyond academic papers that is). > > I'm talking to MPI guys here, too, so I don't think there's real danger > that the final API will be useless for them. 
So let them talk and specify here how they are gonna use it and we will have good use case for your design. > > > You are > > welcome to implement MPI_Send inside libibverbs. After all this is what Myricom did. > > I think keeping a general verbs layer is a better approach for now. Then don't propose something you are not going to implement. > > > > > > > > I setup shared QP/CQ/... and each rank start to post into the QP and > > > > receive completion from CQ and suppose rank A picked completion that > > > > belongs to rank B so I will need to setup out of band channel to pass > > > > this completion from A to B. This is not looks good at all to me. > > > > > > This is not different from multiple threads sharing a CQ, really - and we do > > This is very different from multiple threads sharing a CQ. In > > multi threaded scenario I can design my program in a way that each > > thread will be able to handle completion. We'll have to pass > > completion between processes in the scenario you propose. > > > > > support this already. In the part of the message that you have cut out, I > > > showed some use cases that avoid this "side channel" > > > > What? RDMA? > > RDMA and SRC. > > > What about a completion of RDMA operation? You'll have to > > pass it around. > > Since all it does it free up the buffers, it's quite possible > that processing of send completions can be done by any process. No it can't in case of MPI. MPI also progress user request on the event. Yes, you can design program where it will be possible, but not MPI. > This really depends on how the application wants to do this: > again, you seem to ignore the fact that the issue is the same for > multithreaded programs, and they seem to cope fine. No you sees to ignore the fact that multithreaded program is something _completely_ different. In multithreaded program _all_ state is shared between processes. In multiprocess scenario only a state you place into shared memory is shared. 
This difference is very important. > > > I agree that RDMA situation is much better then > > send/receive one, but there is no RDMAs without send/recv after it. > > Not really - polling on data has been used in MPI for ages now. You are greatly misinformed. Polling on data is used only for a limited number of peers, for sending small messages; it works only on Mellanox HCAs on _some_ archs, and it is greatly non-scalable in memory consumption and polling time. Go ask your MPI team. > With SRC you can have separate completions on the receive side. > > > > (which could be just shared memory btw). > > > > And you introduce another scalability problem here. On a big SMP node > > will have to create channel between each pair of processes to pass > > completions and will have to poll each one of them besides polling CQ. > > Here goes you latency. And I am not saying this is not possible, I am > > saying it is so bad that it is not worth doing. > > No, you got that wrong: there need not be any real "channels" with shared > memory: just a single data structure shared by all processes woul do. > But again, you are getting into MPI design, which is the wrong layer to discuss here. > I am talking about the only application this is meant to be used by (in the short term, anyway). So if the design is bad for MPI, it is bad. As for "channels", you either create one between each pair of ranks or you use locking. Both solutions kill latency. -- Gleb. From mst at dev.mellanox.co.il Tue Jun 26 07:02:39 2007 From: mst at dev.mellanox.co.il (Michael S.
Tsirkin) Date: Tue, 26 Jun 2007 17:02:39 +0300 Subject: [ofa-general] Re: Re: [PATCH RFC] sharing userspace IB objects In-Reply-To: <20070626133317.GH1164@minantech.com> References: <20070625130604.GH15343@mellanox.co.il> <20070626070641.GM15343@mellanox.co.il> <20070626083445.GB1164@minantech.com> <20070626095125.GO15343@mellanox.co.il> <20070626111342.GC1164@minantech.com> <20070626114402.GT15343@mellanox.co.il> <20070626122539.GF1164@minantech.com> <20070626125802.GU15343@mellanox.co.il> <20070626133317.GH1164@minantech.com> Message-ID: <20070626140239.GB29602@mellanox.co.il> > Quoting Gleb Natapov : > Subject: Re: Re: [PATCH RFC] sharing userspace IB objects > > On Tue, Jun 26, 2007 at 03:58:02PM +0300, Michael S. Tsirkin wrote: > > > > No, sharing a send queue must be done in software. I don't really see the reason > > > > for sarcasm: do you see value in sharing resources between multiple threads? > > > > Why not multiple processes? Some people just don't want to program > > > > in multithreaded environment. > > > > > > Yes I see the value in sharing resources between threads and processes > > > if done right. This proposition is far from being right. > > > > Ahem, *what* are you talking about? Sharing resources between threads was supported in > > libibverbs 1.0, *right from the start*. This is still the case with 1.1, and this API > > matches verbs quite closely which means that it can work pretty much on any > > hardware. > > Why do you think that I have a problem with multithreaded application is > beyond my understanding. I have a problem with you thinking that peaking a > completion by random process in FCFS order is a good idea. Should that have been "picking"? I keep telling you. With multithreaded applications *that's what currently happens*. If multiple threads poll a CQ, which one gets which completion is currently unspecified. Are you worried about this? If not, why are you worried when multiple processes do this? 
Look here, hardware features do *not* just materialize when you build an API for them. What good would a pretty API that no hardware supports be? It's the other way around: I'm trying to extend our API to improve scalability with existing hardware. -- MST
From mhanafi at csc.com Tue Jun 26 07:12:27 2007 From: mhanafi at csc.com (Mahmoud Hanafi) Date: Tue, 26 Jun 2007 10:12:27 -0400 Subject: [ofa-general] low performance with multiple LUNs on a single port with ib_srp In-Reply-To: <537C6C0940C6C143AA46A88946B8541708BB417C@ORNLEXCHANGE.ornl.gov> Message-ID:
Here are some performance results that I was able to achieve running across several LUNs.
Config setting: 1 host port to 1 DDN port.
OFED1.2rc6
[root at io1 IB]# cat /etc/modprobe.conf
alias scsi_hostadapter qla2xxx
alias scsi_hostadapter1 megaraid_sas
alias scsi_hostadapter2 qla2400
alias usb-controller ehci-hcd
alias usb-controller1 uhci-hcd
alias ib0 ib_ipoib
alias ib1 ib_ipoib
alias net-pf-27 ib_sdp
alias lustre llite
options lnet networks=o2ib
alias eth1 bnx2
alias eth0 bnx2
options ib_srp srp_sg_tablesize=256
[root at io1 IB]# cat /etc/srp_daemon.conf
a max_sect=8192,max_cmd_per_lun=3

Write (MB/sec)
                    Number of LUNs
Rec Length (KB)      1     2     3     4     5     6     7
16                  23    37    41    47    51    55    57
32                  44    72    82    94   102   109   114
64                  79   136   163   187   201   215   226
128                131   247   310   352   380   405   426
256                194   363   477   549   616   673   698
512                299   558   553   670   717   725   727
1,024              434   591   718   725   725   725   726
2,048              465   608   687   723   725   726   726
4,096              523   695   722   726   727   727   726
8,192              537   702   726   727   728   726   727

Read (MB/sec)
                    Number of LUNs
Rec Length (KB)      1     2     3     4     5     6     7
16                  26    41    45    56    60    63    62
32                  48    78    97   107   117   122   124
64                  81   140   172   196   215   227   237
128                126   207   269   314   347   373   391
256                174   271   389   482   500   537   546
512                255   375   418   478   528   556   562
1,024              330   430   505   554   564   564   564
2,048              326   445   527   553   561   563   564
4,096              357   513   556   562   564   564   565
8,192              360   520   558   564   565   565   565
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- This is a PRIVATE message. If you are not the intended recipient, please delete without copying and kindly advise us by e-mail of the mistake in delivery. NOTE: Regardless of content, this e-mail shall not operate to bind CSC to any order or other contract unless pursuant to explicit written agreement or government initiative expressly permitting the use of e-mail for such purpose. -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- "Canon, Richard Shane" Sent by: general-bounces at lists.openfabrics.org 06/25/2007 05:51 PM To Vu Pham cc general at lists.openfabrics.org Subject [ofa-general] low performance with multiple LUNs on a single port with ib_srp Greetings, Hopefully the subject says it all… I’ve stumbled on a performance issue with the OFED ib_srp driver. Here is the configuration. I am testing with a DDN 9550 and a single host system. The systems are connected by two SDR links. On the host side there is a dual port (DDR) card. On the DDN side, both lines go into a single singlet (even though it is a couplet). The lines go into two distinct cards on the DDN side (if you are familiar with the layout). The testing used OFED 1.2. Now for the tests… If I run a single stream test I’m seeing good result with over 700 MB/s. These tests are run using sg_dd with the directio flag. If I run two concurrent streams against two LUNs that are each presented over a single port on the DDN (and therefore accessed by a single port on the host side), the aggregate performance drop to around 120 MB/s (60 MB/s per stream). Just to confirm it isn’t a problem on the DDN side, I repeated these tests with the IBGD driver. 
There I consistently saw about 600-650 MB/s on the port regardless of the number of LUNs I tested with. Any ideas on what the problem is? Also, if this doesn’t make sense, let me know and I will try to clarify further. Thanks, --Shane Canon -- R. Shane Canon National Center for Computational Science Oak Ridge National Laboratory canonrs at ornl.gov _______________________________________________ general mailing list general at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -------------- next part -------------- An HTML attachment was scrubbed... URL: From glebn at voltaire.com Tue Jun 26 07:13:49 2007 From: glebn at voltaire.com (Gleb Natapov) Date: Tue, 26 Jun 2007 17:13:49 +0300 Subject: [ofa-general] Re: Re: [PATCH RFC] sharing userspace IB objects In-Reply-To: <20070626140239.GB29602@mellanox.co.il> References: <20070626070641.GM15343@mellanox.co.il> <20070626083445.GB1164@minantech.com> <20070626095125.GO15343@mellanox.co.il> <20070626111342.GC1164@minantech.com> <20070626114402.GT15343@mellanox.co.il> <20070626122539.GF1164@minantech.com> <20070626125802.GU15343@mellanox.co.il> <20070626133317.GH1164@minantech.com> <20070626140239.GB29602@mellanox.co.il> Message-ID: <20070626141349.GJ1164@minantech.com> On Tue, Jun 26, 2007 at 05:02:39PM +0300, Michael S. Tsirkin wrote: > > Quoting Gleb Natapov : > > Subject: Re: Re: [PATCH RFC] sharing userspace IB objects > > > > On Tue, Jun 26, 2007 at 03:58:02PM +0300, Michael S. Tsirkin wrote: > > > > > No, sharing a send queue must be done in software. I don't really see the reason > > > > > for sarcasm: do you see value in sharing resources between multiple threads? > > > > > Why not multiple processes? Some people just don't want to program > > > > > in multithreaded environment. > > > > > > > > Yes I see the value in sharing resources between threads and processes > > > > if done right. 
This proposition is far from being right. > > > > > > Ahem, *what* are you talking about? Sharing resources between threads was supported in > > > libibverbs 1.0, *right from the start*. This is still the case with 1.1, and this API > > > matches verbs quite closely which means that it can work pretty much on any > > > hardware. > > > > Why do you think that I have a problem with multithreaded application is > > beyond my understanding. I have a problem with you thinking that peaking a > > completion by random process in FCFS order is a good idea. > > Should that have been "picking"? I keep telling you. With multithreaded Yes "picking". Sorry :) > applications *that's what currently happens*. If multiple threads poll a CQ, > which one gets which completion is currently unspecified. Are you > worried about this? If not, why are you worried when multiple > processes do this? You've missed my sentence about the difference between a multithreaded application and what you propose. The difference is HUGE (I can't write bigger letters, sorry about that). I can design a multithreaded MPI so that each thread is capable of progressing MPI send/recv requests (and then I don't care which thread gets which completion). I can't do that in the multiprocess scenario. > > Look here, hardware features do *not* just materialize when you build an API for > them. What good would a pretty API that no hardware supports be? It's the > other way around: I'm trying to extend our API to improve scalability with > existing hardware. > Then this API will stick forever. And the HW implementation will have a different API anyway. And that is what I am trying to point out. I don't think Mellanox implemented the SRQ API before it was available in HW. If Mellanox thinks this is such a great idea (and it is), why not put the implementation where it belongs (in HW, that is)? -- Gleb.
From tziporet at mellanox.co.il Tue Jun 26 07:27:21 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 26 Jun 2007 17:27:21 +0300 Subject: [ofa-general] Toward next OFED release (1.3) Message-ID: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com>
Hi All,
Next Monday we will have the first meeting to finalize the OFED 1.3 features and schedule. In preparation, here is the list of features we already reviewed in Sonoma, plus other features I see in progress in the general list discussions. I know this is a long mail :-( but I ask each of the maintainers/customers to review this list and send comments and other requests. For some ULPs I placed a "?"; their owners should review them and reply with their plans.
Thanks,
Tziporet

Main New Features
==============
Base kernel: 2.6.23 (we will start with 2.6.22 but will move to 2.6.23)
Install:
* Minimize integration effort into OS distribution
* Break the packages RPMs (work with Novell and Redhat)
Package:
* Sources arrangement for the end user (for the labs)
* Reduce compilation warnings
QoS:
* OSM
* CM & CMA
* ULPs: SDP, SRP, IPoIB, RDS?
Core:
* Updated SA cache
* User space events registration
* Preparations for IB routers
libibverbs:
* New verbs:
  * Scalable Reliable Connected Transport (with Mellanox ConnectX)
  * Shared Send Queue
  * Reliable Multicast ?
Management:
* Multiple partitions
* OpenSM
* More routing performance improvements
* Even more speedups
* Better packaging/installation "Native" daemon mode
* Performance management
* Quality of Service manager: Based on IBTA annex
* More diagnostics - Hal please update
ULPs:
* IPoIB: NAPI; CM in GA; Bonding in GA
* NFS over RDMA integration
* RDS: RDMA API (using FMRs); GA quality with Oracle 11
* SDP: Keepalive; Asynch IO (Zero Copy)
* SRP: HA in GA
* VNIC: ? Qlogic - please update
* iSER: ? Voltaire - please update
* uDAPL - ? Arlin please update
iWARP: (Steve please update if needed)
* iwarp-specific verbs
* iwarp-specific async events
* API for MPA options (CRC/Markers)
* API for streaming mode IO (needed for compliant iSER)
* Possibly other ULPs (RDS, SDP, iSER)
MPIs: Integrate the new MPI releases that are on time for OFED 1.3
* Jeff - please update about Open MPI
* DK: Please update regarding MVAPICH and MVAPICH2
OFED 1.3 System Matrix
* CPU Arch: X86, x86_64, PPC64, ia64
* kernel.org: kernel 2.6.23
* Novell: SLES 10; SLES 10 SP1
* Redhat: RHEL 4 (up4 and up5); RHEL 5 (can we drop RHEL4up4, since up6 will probably be out by the time this release is out?)
* Free distros (Fedora, SuSE Pro, Ubuntu) - basic testing only
Tziporet Koren
Software Director
Mellanox Technologies
mailto: tziporet at mellanox.co.il
Tel +972-4-9097200, ext 380
From mst at dev.mellanox.co.il Tue Jun 26 07:26:43 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Tue, 26 Jun 2007 17:26:43 +0300 Subject: [ofa-general] bug 667 Message-ID: <20070626142643.GC29602@mellanox.co.il>
Sean, could you look at bug 667 please? rping seems to be crashing after connect error.
-- MST
From mst at dev.mellanox.co.il Tue Jun 26 07:37:36 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Tue, 26 Jun 2007 17:37:36 +0300 Subject: [ofa-general] [Bug 662] In-Reply-To: <20070626142643.GC29602@mellanox.co.il> References: <20070626142643.GC29602@mellanox.co.il> Message-ID: <20070626143735.GD29602@mellanox.co.il>
> Quoting Michael S. Tsirkin :
> Subject: bug 667
>
> Sean, could you look at bug 667 please?
> rping seems to be crashing after connect error.
Here's a backtrace from the core dump.
# rping -c -d -a 11.4.3.174
ipaddr (11.4.3.174)
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
This will severely limit memory registrations.
created cm_id 0x505f10
cma_event type 1 cma_id 0x505f10 (parent)
cma event 1, error -110
waiting for addr/route resolution state 1
Segmentation fault (core dumped)
# gdb `which rping`
GNU gdb 6.4
Copyright 2005 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-suse-linux"...Using host libthread_db library "/lib64/libthread_db.so.1".
(gdb) core core.29968
Core was generated by `rping -c -d -a 11.4.3.174'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/local/ofed/lib64/librdmacm.so.1...done.
Loaded symbols for /usr/local/ofed/lib64/librdmacm.so.1
Reading symbols from /usr/local/ofed/lib64/libibverbs.so.1...done.
Loaded symbols for /usr/local/ofed/lib64/libibverbs.so.1
Reading symbols from /lib64/libpthread.so.0...done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/local/ofed/lib64/libcxgb3-rdmav2.so...done.
Loaded symbols for /usr/local/ofed/lib64/libcxgb3-rdmav2.so
Reading symbols from /usr/local/ofed/lib64/libmthca-rdmav2.so...done.
Loaded symbols for /usr/local/ofed/lib64/libmthca-rdmav2.so
#0 __ibv_alloc_pd (context=0x0) at src/verbs.c:143
143 pd = context->ops.alloc_pd(context);
(gdb) where
#0 __ibv_alloc_pd (context=0x0) at src/verbs.c:143
#1 0x00000000004015e6 in rping_setup_qp (cb=0x505010, cm_id=0x505f10)
 at examples/rping.c:514
#2 0x000000000040270b in main (argc=5, argv=0x7fffe0117238) at examples/rping.c:936
(gdb) frame 1
#1 0x00000000004015e6 in rping_setup_qp (cb=0x505010, cm_id=0x505f10)
 at examples/rping.c:514
514 cb->pd = ibv_alloc_pd(cm_id->verbs);
(gdb) p cm_id->verbs
$1 = (struct ibv_context *) 0x0
(gdb) p (struct cma_id_private *)cm_id
$2 = (struct cma_id_private *) 0x505f10
(gdb) p *$2
$3 = {id = {verbs = 0x0, channel = 0x505ef0, context = 0x505010, qp = 0x0, route = {
 addr = {src_addr = {sa_family = 0, sa_data = '\0' },
 src_pad = '\0' , dst_addr = {sa_family = 2,
 sa_data = "\000\000\v\004\003�\000\000\000\000\000\000\000"},
 dst_pad = '\0' , addr = {ibaddr = {sgid = {
 raw = '\0' , global = {subnet_prefix = 0,
 interface_id = 0}}, dgid = {raw = '\0' , global = {
 subnet_prefix = 0, interface_id = 0}}, pkey = 0}}}, path_rec = 0x0,
 num_paths = 0}, ps = RDMA_PS_TCP, port_num = 0 '\0'}, cma_dev = 0x0,
 events_completed = 0, connect_error = 0, cond = {__data = {__lock = 0, __futex = 0,
 __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0,
 __broadcast_seq = 0}, __size = '\0' , __align = 0}, mut = {
 __data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0,
 __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
 __size = '\0' , __align = 0}, handle = 0, mc_list = 0x0}
(gdb) where
#0 __ibv_alloc_pd (context=0x0) at src/verbs.c:143
#1 0x00000000004015e6 in rping_setup_qp (cb=0x505010, cm_id=0x505f10)
 at examples/rping.c:514
#2 0x000000000040270b in main (argc=5, argv=0x7fffe0117238) at examples/rping.c:936
(gdb)
-- MST
From canonrs at ornl.gov Tue Jun 26 07:55:29 2007 From: canonrs at ornl.gov (Canon, Richard Shane) Date: Tue, 26 Jun 2007
10:55:29 -0400 Subject: [ofa-general] low performance with multiple LUNs on a single portwith ib_srp In-Reply-To: References: <537C6C0940C6C143AA46A88946B8541708BB417C@ORNLEXCHANGE.ornl.gov> Message-ID: <537C6C0940C6C143AA46A88946B8541708BB4452@ORNLEXCHANGE.ornl.gov> Mahmoud, Thanks for the hint. I tried that out and it definitely helped. The key parameter is the max_cmd_per_lun. I think at 16 (which is what I was using) it was overflowing something in the stack. I tried both 3 and 5. With 5 I was able to get over 700 MB/s for one up to four LUNs on a single port. I was able to get 750 MB/s when using over two LUNs. So that looks much better. Thanks, --Shane ________________________________ From: Mahmoud Hanafi [mailto:mhanafi at csc.com] Sent: Tuesday, June 26, 2007 10:12 AM To: Canon, Richard Shane Cc: general at lists.openfabrics.org; Vu Pham Subject: Re: [ofa-general] low performance with multiple LUNs on a single portwith ib_srp Here are some performance results that I was able to achieve running across several Luns. Config setting 1 host port to 1 ddn port. 
OFED1.2rc6
[root at io1 IB]# cat /etc/modprobe.conf
alias scsi_hostadapter qla2xxx
alias scsi_hostadapter1 megaraid_sas
alias scsi_hostadapter2 qla2400
alias usb-controller ehci-hcd
alias usb-controller1 uhci-hcd
alias ib0 ib_ipoib
alias ib1 ib_ipoib
alias net-pf-27 ib_sdp
alias lustre llite
options lnet networks=o2ib
alias eth1 bnx2
alias eth0 bnx2
options ib_srp srp_sg_tablesize=256
[root at io1 IB]# cat /etc/srp_daemon.conf
a max_sect=8192,max_cmd_per_lun=3

Write (MB/sec)
                    Number of LUNs
Rec Length (KB)      1     2     3     4     5     6     7
16                  23    37    41    47    51    55    57
32                  44    72    82    94   102   109   114
64                  79   136   163   187   201   215   226
128                131   247   310   352   380   405   426
256                194   363   477   549   616   673   698
512                299   558   553   670   717   725   727
1,024              434   591   718   725   725   725   726
2,048              465   608   687   723   725   726   726
4,096              523   695   722   726   727   727   726
8,192              537   702   726   727   728   726   727

Read (MB/sec)
                    Number of LUNs
Rec Length (KB)      1     2     3     4     5     6     7
16                  26    41    45    56    60    63    62
32                  48    78    97   107   117   122   124
64                  81   140   172   196   215   227   237
128                126   207   269   314   347   373   391
256                174   271   389   482   500   537   546
512                255   375   418   478   528   556   562
1,024              330   430   505   554   564   564   564
2,048              326   445   527   553   561   563   564
4,096              357   513   556   562   564   564   565
8,192              360   520   558   564   565   565   565
------------------------------------------------------------------------ ------------------------------------------------------------------------ -------------------------------- "Canon, Richard Shane" Sent by: general-bounces at lists.openfabrics.org 06/25/2007 05:51 PM To Vu Pham cc general at lists.openfabrics.org Subject [ofa-general] low performance with multiple LUNs on a single port with ib_srp Greetings, Hopefully the subject says it all... I've stumbled on a performance issue with the OFED ib_srp driver. Here is the configuration. I am testing with a DDN 9550 and a single host system. The systems are connected by two SDR links. On the host side there is a dual port (DDR) card. On the DDN side, both lines go into a single singlet (even though it is a couplet). The lines go into two distinct cards on the DDN side (if you are familiar with the layout). The testing used OFED 1.2. Now for the tests... If I run a single stream test I'm seeing good result with over 700 MB/s. These tests are run using sg_dd with the directio flag. If I run two concurrent streams against two LUNs that are each presented over a single port on the DDN (and therefore accessed by a single port on the host side), the aggregate performance drop to around 120 MB/s (60 MB/s per stream). Just to confirm it isn't a problem on the DDN side, I repeated these tests with the IBGD driver. There I consistently saw about 600-650 MB/s on the port regardless of the number of LUNs I tested with. Any ideas on what the problem is? Also, if this doesn't make sense, let me know and I will try to clarify further. Thanks, --Shane Canon -- R. 
Shane Canon National Center for Computational Science Oak Ridge National Laboratory canonrs at ornl.gov
From swise at opengridcomputing.com Tue Jun 26 08:02:30 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 26 Jun 2007 10:02:30 -0500 Subject: [ofa-general] [Bug 662] In-Reply-To: <20070626143735.GD29602@mellanox.co.il> References: <20070626142643.GC29602@mellanox.co.il> <20070626143735.GD29602@mellanox.co.il> Message-ID: <46812A86.9000505@opengridcomputing.com> I think the bug is in rping_bind_client(). If address resolution fails with an ADDR_ERROR event, rping_bind_client() wakes up and mistakenly returns the variable 'ret', which is zero. It should return non-zero in this case. Steve. Michael S. Tsirkin wrote: >> Quoting Michael S. Tsirkin : >> Subject: bug 667 >> >> Sean, could you look at bug 667 please? >> rping seems to be crashing after connect error. > > Here's a backtrace from the core dump. > > # rping -c -d -a 11.4.3.174 > ipaddr (11.4.3.174) > libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes. > This will severely limit memory registrations. > created cm_id 0x505f10 > cma_event type 1 cma_id 0x505f10 (parent) > cma event 1, error -110 > waiting for addr/route resolution state 1 > Segmentation fault (core dumped) > # gdb `which rping` > GNU gdb 6.4 > Copyright 2005 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and you are > welcome to change it and/or distribute copies of it under certain conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "x86_64-suse-linux"...Using host libthread_db library "/lib64/libthread_db.so.1". > > (gdb) core core.29968 > Core was generated by `rping -c -d -a 11.4.3.174'. > Program terminated with signal 11, Segmentation fault. > Reading symbols from /usr/local/ofed/lib64/librdmacm.so.1...done. > Loaded symbols for /usr/local/ofed/lib64/librdmacm.so.1 > Reading symbols from /usr/local/ofed/lib64/libibverbs.so.1...done. > Loaded symbols for /usr/local/ofed/lib64/libibverbs.so.1 > Reading symbols from /lib64/libpthread.so.0...done. > Loaded symbols for /lib64/libpthread.so.0 > Reading symbols from /lib64/libdl.so.2...done. > Loaded symbols for /lib64/libdl.so.2 > Reading symbols from /lib64/libc.so.6...done. > Loaded symbols for /lib64/libc.so.6 > Reading symbols from /lib64/ld-linux-x86-64.so.2...done. > Loaded symbols for /lib64/ld-linux-x86-64.so.2 > Reading symbols from /usr/local/ofed/lib64/libcxgb3-rdmav2.so...done. > Loaded symbols for /usr/local/ofed/lib64/libcxgb3-rdmav2.so > Reading symbols from /usr/local/ofed/lib64/libmthca-rdmav2.so...done. 
> Loaded symbols for /usr/local/ofed/lib64/libmthca-rdmav2.so > #0 __ibv_alloc_pd (context=0x0) at src/verbs.c:143 > 143 pd = context->ops.alloc_pd(context); > (gdb) where > #0 __ibv_alloc_pd (context=0x0) at src/verbs.c:143 > #1 0x00000000004015e6 in rping_setup_qp (cb=0x505010, cm_id=0x505f10) > at examples/rping.c:514 > #2 0x000000000040270b in main (argc=5, argv=0x7fffe0117238) at examples/rping.c:936 > (gdb) frame 1 > #1 0x00000000004015e6 in rping_setup_qp (cb=0x505010, cm_id=0x505f10) > at examples/rping.c:514 > 514 cb->pd = ibv_alloc_pd(cm_id->verbs); > (gdb) p cm_id->verbs > $1 = (struct ibv_context *) 0x0 > (gdb) p (struct cma_id_private *)cm_id > $2 = (struct cma_id_private *) 0x505f10 > (gdb) p *$2 > $3 = {id = {verbs = 0x0, channel = 0x505ef0, context = 0x505010, qp = 0x0, route = { > addr = {src_addr = {sa_family = 0, sa_data = '\0' }, > src_pad = '\0' , dst_addr = {sa_family = 2, > sa_data = "\000\000\v\004\003�\000\000\000\000\000\000\000"}, > dst_pad = '\0' , addr = {ibaddr = {sgid = { > raw = '\0' , global = {subnet_prefix = 0, > interface_id = 0}}, dgid = {raw = '\0' , global = { > subnet_prefix = 0, interface_id = 0}}, pkey = 0}}}, path_rec = 0x0, > num_paths = 0}, ps = RDMA_PS_TCP, port_num = 0 '\0'}, cma_dev = 0x0, > events_completed = 0, connect_error = 0, cond = {__data = {__lock = 0, __futex = 0, > __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, > __broadcast_seq = 0}, __size = '\0' , __align = 0}, mut = { > __data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, > __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, > __size = '\0' , __align = 0}, handle = 0, mc_list = 0x0} > (gdb) where > #0 __ibv_alloc_pd (context=0x0) at src/verbs.c:143 > #1 0x00000000004015e6 in rping_setup_qp (cb=0x505010, cm_id=0x505f10) > at examples/rping.c:514 > #2 0x000000000040270b in main (argc=5, argv=0x7fffe0117238) at examples/rping.c:936 > (gdb) > > From sweitzen at cisco.com Tue Jun 26 08:53:31 2007 
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Tue, 26 Jun 2007 08:53:31 -0700 Subject: [ofa-general] Re: development process post ofed-1.2 gold. In-Reply-To: <4680F1C8.3020207@mellanox.co.il> References: <4680305D.9030701@opengridcomputing.com> <4680F1C8.3020207@mellanox.co.il> Message-ID: > My suggestion is that we keep the ofed_1_2 branch alive, thus > new fixes > should be applied to the repository. > In this way we will be able to do a stable release when we decide. > Another question is regarding the daily build - I don't think we need > them any more. We can do a weekly build, or run build in case of need > (new patches submitted). What other people think about this? Weekly and on-demand builds sound OK to me. Scott From swise at opengridcomputing.com Tue Jun 26 08:53:43 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 26 Jun 2007 10:53:43 -0500 Subject: [ofa-general] Re: librdmacm code confusion wrt iWarp In-Reply-To: <000101c740b7$4abbc140$ff0da8c0@amr.corp.intel.com> References: <000101c740b7$4abbc140$ff0da8c0@amr.corp.intel.com> Message-ID: <46813687.2060801@opengridcomputing.com> Sean Hefty wrote: > Steve, > > I'm looking at rdma_create_qp() in librdmacm. There's a section of code in > there: > > if (id->ps == RDMA_PS_UDP) > ret = ucma_init_ud_qp(id_priv, qp); > else > ret = ucma_init_ib_qp(id_priv, qp); > > Both of these calls transition the QP to INIT, so that the user can post > receives before trying to establish a connection. iWarp is handled the same as > IB, which confuses me, since it is treated differently in the kernel. I'm > assuming that the librdmacm works for you over iWarp, but I'd like to understand > this better. > The actual work for setting init-state qp attributes and moving the qp to INIT state is done in the kernel CMA modules. Thus librdmacm doesn't need to do anything specific for iwarp in user mode. 
It calls into the kernel, and the ucma module ends up calling the kernel rdma_init_qp_attr(), which does the switch on the transport type. The design goal when Tom added iWARP support to librdmacm was minimal impact on the existing code, so there is very little code in librdmacm that switches on the transport type... Steve.
From swise at opengridcomputing.com Tue Jun 26 08:55:54 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 26 Jun 2007 10:55:54 -0500 Subject: [ofa-general] Re: development process post ofed-1.2 gold. In-Reply-To: References: <4680305D.9030701@opengridcomputing.com> <4680F1C8.3020207@mellanox.co.il> Message-ID: <4681370A.5050306@opengridcomputing.com> Scott Weitzenkamp (sweitzen) wrote: >> My suggestion is that we keep the ofed_1_2 branch alive, thus >> new fixes >> should be applied to the repository. >> In this way we will be able to do a stable release when we decide. >> Another question is regarding the daily build - I don't think we need >> them any more. We can do a weekly build, or run build in case of need >> (new patches submitted). What other people think about this? > > Weekly and on-demand builds sound OK to me. > > Scott
ditto
From sweitzen at cisco.com Tue Jun 26 08:58:16 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Tue, 26 Jun 2007 08:58:16 -0700 Subject: [ofa-general] SRP Failover In-Reply-To: <000301c7b7d7$236b3a70$6a41af50$@com.hk> References: <000301c7b7d7$236b3a70$6a41af50$@com.hk> Message-ID:
You need to configure Device Mapper Multipath or some other multipathing software to get HA. What OS are you running?
Steps for RHEL are:
1) Edit /etc/multipath.conf and comment out the devnode_blacklist (RHEL4) or blacklist (RHEL5) entry.
2) Run "chkconfig multipathd on".
3) Reboot.
4) After reboot, /dev/mapper should be populated with multipath block device entries.
5) You can run "multipath -l" to view the multipath status.
Steps for SLES10 are similar:
1) Run "chkconfig boot.multipath on".
2) Run "chkconfig multipathd on".
3) Reboot.
4) After reboot, /dev/mapper should be populated with multipath block device entries.
5) You can run "multipath -l" to view the multipath status.
You use the /dev/mapper block devices, not /dev/sd* block devices.
Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
________________________________
From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of PN Lai Sent: Tuesday, June 26, 2007 2:48 AM To: general at lists.openfabrics.org Subject: [ofa-general] SRP Failover
Hi all, I'm testing the SRP HA functions, but I have some questions. I use 2 IB cables to connect the initiator and 1 IB cable to connect to the storage. I installed OFED-1.2 and enabled "SRP_LOAD=yes" and "SRPHA_ENABLE=yes" in openib.conf. After reboot, it discovers 2 targets, /dev/sdbX and /dev/sdcX. However, when I check /var/log/srp_daemon.log, it shows:
....
26/05/07 17:42:57 : bad MAD status (110) from lid 257
26/05/07 17:43:30 : No response to inform info registration
26/05/07 17:43:30 : Fail to register to traps, maybe there is no opensm running on fabric
....
But opensm is running on both machines. I don't know whether this is normal; should it only discover a single target? Now, my question is: if I mount /dev/sdbX and write data to it, and then remove one of the initiator cables, how will /dev/sdcX replace /dev/sdbX so that I can continue to write the data? Do I need to configure some extra files? Thanks for any reply. PN -------------- next part -------------- An HTML attachment was scrubbed...
URL: From rowland at cse.ohio-state.edu Tue Jun 26 10:06:12 2007 From: rowland at cse.ohio-state.edu (Shaun Rowland) Date: Tue, 26 Jun 2007 13:06:12 -0400 Subject: [ofa-general] Installation problem with mvapich2 In-Reply-To: <1c16cdf90706252229p2a6466a1l81d5411821252744@mail.gmail.com> References: <1c16cdf90706252229p2a6466a1l81d5411821252744@mail.gmail.com> Message-ID: <46814784.1040809@cse.ohio-state.edu> Chevchenkovic Chevchenkovic wrote: > Hi, > I am trying to install mvapich2 on my system. So i do the following: > 1. untar mvapich2-0.9.8.tar.gz > 2. go to make.mvapich2.gen2 file and set the prefix as > /root/chev/temp/mvapich2-0.9.8/ > > Then we execute the instruction as : > ./make.mvapich2.gen2 > > I get the following as output: > ========================================================= > Configuring MVAPICH2... > Configuring MPICH2 version MVAPICH2-0.9.8 with > --prefix=/root/chev/temp/mvapich2-0.9.8/ --enable-g=dbg > --with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd > --disable-romio --without-mpe > sourcing /root/chev/temp/mvapich2-0.9.8/src/pm/mpd/setup_pm > checking for gcc... gcc > checking for C compiler default output file name... configure: error: > C compiler cannot create executables > See `config.log' for more details. > Configuring MPICH2 version MVAPICH2-0.9.8 with > --prefix=/root/chev/chev/mvapich2-0.9.8/ --enable-g=dbg > --with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd > --disable-romio --without-mpe > sourcing /root/chev/temp/mvapich2-0.9.8/src/pm/mpd/setup_pm > checking for gcc... gcc > checking for C compiler default output file name... configure: error: > C compiler cannot create executables > See `config.log' for more details. > Building MVAPICH2... > make: *** No targets specified and no makefile found. Stop. > make: *** No targets specified and no makefile found. Stop. > MVAPICH2 installation... > make: *** No rule to make target `install'. Stop. > make: *** No rule to make target `install'. Stop. 
> Congratulations on successfully building MVAPICH2. Please send your > feedback to mvapich-discuss at cse.ohio-state.edu. > ================================================ > > > What is going wrong? > Can someone please help me in this regards? > Awaiting some reply, Can you look in the config.log file that is generated? It should tell you why the C compiler cannot create executables when run by configure. It is impossible to tell from the above output alone. -- Shaun Rowland rowland at cse.ohio-state.edu http://www.cse.ohio-state.edu/~rowland/ From robert.j.woodruff at intel.com Tue Jun 26 10:12:03 2007 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Tue, 26 Jun 2007 10:12:03 -0700 Subject: [ofa-general] bug 667 In-Reply-To: <20070626142643.GC29602@mellanox.co.il> Message-ID: FYI - Sean is out on vacation, he will be back Thursday. -----Original Message----- From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Michael S. Tsirkin Sent: Tuesday, June 26, 2007 7:27 AM To: Hefty, Sean; general at lists.openfabrics.org Subject: [ofa-general] bug 667 Sean, could you look at bug 667 please? rping seems to be crashing after connect error. 
-- MST _______________________________________________ general mailing list general at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From glebn at voltaire.com Tue Jun 26 10:21:30 2007 From: glebn at voltaire.com (Gleb Natapov) Date: Tue, 26 Jun 2007 20:21:30 +0300 Subject: [ofa-general] Toward next OFED release (1.3) In-Reply-To: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com> Message-ID: <20070626172130.GB26637@minantech.com> On Tue, Jun 26, 2007 at 05:27:21PM +0300, Tziporet Koren wrote: > libibverbs: > * New verbs: > * Scalable Reliable Connected Transport (with Mellanox ConnectX) > * Shared Send Queue > * Reliable Multicast ? > What about allowing to allocate coherent memory for CQ inside the kernel to fix issue with Altix machines? -- Gleb. From ardavis at ichips.intel.com Tue Jun 26 10:36:38 2007 From: ardavis at ichips.intel.com (Arlin Davis) Date: Tue, 26 Jun 2007 10:36:38 -0700 Subject: [ofa-general] Toward next OFED release (1.3) In-Reply-To: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com> Message-ID: <46814EA6.1010300@ichips.intel.com> Tziporet Koren wrote: > ULPs: > > * IPoIB: NAPI; CM in GA; Bonding in GA > * NFS over RDMA integration > * RDS: RDMA API (using FMRs); GA quality with Oracle 11 > * SDP: Keepalive; Asynch IO (Zero Copy) > * SRP: HA in GA > * VNIC: ? Qlogic - please update > * iSER: ? 
Voltaire - please update > uDAPL - DAT 2.0 support with IB extensions for immediate data, atomics; Add extensions for new verbs (SRCT,SSQ,RM) > From rdreier at cisco.com Tue Jun 26 10:47:16 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 26 Jun 2007 10:47:16 -0700 Subject: [ofa-general] Toward next OFED release (1.3) In-Reply-To: <20070626172130.GB26637@minantech.com> (Gleb Natapov's message of "Tue, 26 Jun 2007 20:21:30 +0300") References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com> <20070626172130.GB26637@minantech.com> Message-ID: > What about allowing to allocate coherent memory for CQ inside the kernel > to fix issue with Altix machines? Sorry... I've been remiss in posting about this. I would actually prefer to see an extension to the dma_map_sg() interface (a new flag perhaps?) that would set the right magic bit in the DMA address on altix. The refactoring of ib_umem_get() to be called by low-level drivers makes this a fairly clean approach, and it avoids the problems with using dma_alloc_coherent() to allocate userspace buffers (for example, dma_alloc_coherent() uses up kernel virtual addresses, which may be scarce on 32 bit architectures). - R. From sashak at voltaire.com Tue Jun 26 10:55:54 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 26 Jun 2007 20:55:54 +0300 Subject: [ofa-general] Fwd: [ANNOUNCE] GIT 1.5.2.2 In-Reply-To: <20070625043809.GA29772@mellanox.co.il> References: <20070625043809.GA29772@mellanox.co.il> Message-ID: <20070626175554.GL25653@sashak.voltaire.com> On 07:38 Mon 25 Jun , Michael S. Tsirkin wrote: > FYI > I think git-gui updates make it worth while to upgrade. > Sasha? I guess nobody uses git-gui on server side. Isn't it? 
Sasha From akepner at sgi.com Tue Jun 26 10:53:43 2007 From: akepner at sgi.com (akepner at sgi.com) Date: Tue, 26 Jun 2007 10:53:43 -0700 Subject: [ofa-general] Toward next OFED release (1.3) In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com> <20070626172130.GB26637@minantech.com> Message-ID: <20070626175343.GB5951@sgi.com> On Tue, Jun 26, 2007 at 10:47:16AM -0700, Roland Dreier wrote: > > What about allowing to allocate coherent memory for CQ inside the kernel > > to fix issue with Altix machines? > > Sorry... I've been remiss in posting about this. I would actually > prefer to see an extension to the dma_map_sg() interface (a new flag > perhaps?) that would set the right magic bit in the DMA address on > altix. The refactoring of ib_umem_get() to be called by low-level > drivers makes this a fairly clean approach, and it avoids the problems > with using dma_alloc_coherent() to allocate userspace buffers (for > example, dma_alloc_coherent() uses up kernel virtual addresses, which > may be scarce on 32 bit architectures). > Check. Generating a patch for OFED 1.3 is on my to do list. -- Arthur From sashak at voltaire.com Tue Jun 26 11:01:57 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 26 Jun 2007 21:01:57 +0300 Subject: [ofa-general] Re: [PATCH] management: uint -> unsigned replacement In-Reply-To: <20070626132457.GA29602@mellanox.co.il> References: <20070626102045.GS15343@mellanox.co.il> <1182862966.10379.425353.camel@hal.voltaire.com> <20070626132457.GA29602@mellanox.co.il> Message-ID: <20070626180157.GM25653@sashak.voltaire.com> On 16:24 Tue 26 Jun , Michael S. Tsirkin wrote: > > Quoting Hal Rosenstock : > > Subject: Re: [PATCH] management: uint -> unsigned replacement > > > > On Tue, 2007-06-26 at 06:20, Michael S. Tsirkin wrote: > > > Some management headers use uint type which (on my system) > > > > What's your system ? > > SLES10. > > > > is described as "old > > > compatibility name for C type". 
This type might not be defined e.g. if > > > __STRICT_ANSI__ is set, > > > > Is strict ANSI a requirement ? Even if not, what is the reason to use uint there instead of just unsigned? I don't know. I like this patch. Sasha > > Not sure. The app in question does > #define _XOPEN_SOURCE 600 > > > > so it is best to avoid its usage at least in headers. > > > Replace by unsigned in all headers. > > > > > > Signed-off-by: Michael S. Tsirkin > > > > > > --- > > > > > > Hal can you apply this please? As a separate question: > > > I didn't go over .c files (we don't build them with strict ansi now), > > > but maybe removing uint there is a good idea, too? > > > > Yes but it will take more than this to make them strict ANSI. > > > > Is this as an OFED 1.2 follow on or just for master ? > > You decide. > > -- > MST From rdreier at cisco.com Tue Jun 26 11:01:47 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 26 Jun 2007 11:01:47 -0700 Subject: [ofa-general] Toward next OFED release (1.3) In-Reply-To: <20070626175343.GB5951@sgi.com> (akepner@sgi.com's message of "Tue, 26 Jun 2007 10:53:43 -0700") References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com> <20070626172130.GB26637@minantech.com> <20070626175343.GB5951@sgi.com> Message-ID: > > Sorry... I've been remiss in posting about this. I would actually > > prefer to see an extension to the dma_map_sg() interface (a new flag > > perhaps?) that would set the right magic bit in the DMA address on > > altix.
The refactoring of ib_umem_get() to be called by low-level > > drivers makes this a fairly clean approach, and it avoids the problems > > with using dma_alloc_coherent() to allocate userspace buffers (for > > example, dma_alloc_coherent() uses up kernel virtual addresses, which > > may be scarce on 32 bit architectures). > > > > Check. > > Generating a patch for OFED 1.3 is on my to do list. That's great, but please let's not think about it as a patch "for OFED 1.3." I think this sort of change to the user/kernel interface really needs to go upstream before it goes into OFED, so just work on getting the changes into the kernel and libmthca, and then we can worry about getting them into an OFED release. - R. From glebn at voltaire.com Tue Jun 26 11:05:51 2007 From: glebn at voltaire.com (Gleb Natapov) Date: Tue, 26 Jun 2007 21:05:51 +0300 Subject: [ofa-general] Toward next OFED release (1.3) In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com> <20070626172130.GB26637@minantech.com> Message-ID: <20070626180551.GG26637@minantech.com> On Tue, Jun 26, 2007 at 10:47:16AM -0700, Roland Dreier wrote: > > > What about allowing to allocate coherent memory for CQ inside the kernel > > to fix issue with Altix machines? > > Sorry... I've been remiss in posting about this. I would actually > prefer to see an extension to the dma_map_sg() interface (a new flag > perhaps?) that would set the right magic bit in the DMA address on > altix. The refactoring of ib_umem_get() to be called by low-level > drivers makes this a fairly clean approach, and it avoids the problems > with using dma_alloc_coherent() to allocate userspace buffers (for > example, dma_alloc_coherent() uses up kernel virtual addresses, which > may be scarce on 32 bit architectures). > While this makes sense it would be hard to push into the kernel proper. Or no? Are you going to do that? -- Gleb. 
From rdreier at cisco.com Tue Jun 26 11:33:04 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 26 Jun 2007 11:33:04 -0700 Subject: [ofa-general] Toward next OFED release (1.3) In-Reply-To: <20070626180551.GG26637@minantech.com> (Gleb Natapov's message of "Tue, 26 Jun 2007 21:05:51 +0300") References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com> <20070626172130.GB26637@minantech.com> <20070626180551.GG26637@minantech.com> Message-ID: > > Sorry... I've been remiss in posting about this. I would actually > > prefer to see an extension to the dma_map_sg() interface (a new flag > > perhaps?) that would set the right magic bit in the DMA address on > > altix. The refactoring of ib_umem_get() to be called by low-level > > drivers makes this a fairly clean approach, and it avoids the problems > > with using dma_alloc_coherent() to allocate userspace buffers (for > > example, dma_alloc_coherent() uses up kernel virtual addresses, which > > may be scarce on 32 bit architectures). > While this make sense it would be hard to push into the kernel proper. > Or no? Are you going to do that? I don't think I'm willing to merge a fix that uses dma_alloc_coherent() inside the kernel so this alternate fix is probably easier to merge. Yes, it does mean an extension to the DMA mapping API but I think getting that right will be useful in terms of making sure what we're doing really makes sense. - R. 
From madhu.lakshmanan at qlogic.com Tue Jun 26 12:15:54 2007 From: madhu.lakshmanan at qlogic.com (Lakshmanan, Madhu) Date: Tue, 26 Jun 2007 14:15:54 -0500 Subject: [ofa-general] Toward next OFED release (1.3) In-Reply-To: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com> Message-ID: <4FB1BCCAE6CAED44A1DC005B1DE06119291460@EPEXCH2.qlogic.org> > From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On > Behalf Of Tziporet Koren > Subject: [ofa-general] Toward next OFED release (1.3) > > Hi All, > > On next Monday we will have the first meeting to close OFED 1.3 features and schedule. > As a preparation I send here the list of features we already reviewed in Sonoma, and other > features I see in progress on the general list discussions. > > I know this is a long mail :-( but I ask each of the maintainers/customers to review this list and > send comments and other requests. > > There are some ULPs that I placed "?" and the owner should review and reply with the plans. > > Thanks, > Tziporet > > > Main New Features > ============== > Base kernel: 2.6.23 (we will start with 2.6.22 but will move to 2.6.23) > Install: > > * Minimize integration effort into OS distribution > * Break the packages RPMs (work with Novell and Redhat) > > > Package: > > * Sources arrangement for the end user (for the labs) > * Reduce compilation warnings > > > QoS: > > * OSM > * CM & CMA > * ULPs: SDP, SRP, IPoIB, RDS? > > > Core: > > * Updated SA cache > * User space events registration > * Preparations for IB routers > > > libibverbs: > > * New verbs: > > * Scalable Reliable Connected Transport (with Mellanox ConnectX) > * Shared Send Queue > * Reliable Multicast ? 
> > > Management: > > * Multiple partitions > * OpenSM > > * More routing performance improvements > * Even more speedups > * Better packaging/installation > * "Native" daemon mode > * Performance management > * Quality of Service manager: Based on IBTA annex > > * More diagnostics - Hal please update > > > ULPs: > > * IPoIB: NAPI; CM in GA; Bonding in GA > * NFS over RDMA integration > * RDS: RDMA API (using FMRs); GA quality with Oracle 11 > * SDP: Keepalive; Asynch IO (Zero Copy) > * SRP: HA in GA > * VNIC: ? Qlogic - please update
VNIC:
- GA quality. Not a technology preview version anymore.
- Added support for QLogic EVIC (10 Gbps InfiniBand-to-Ethernet gateway) - in GA
- mlx4 and ipath support - in GA
Thanks, Madhu Lakshmanan QLogic Corporation From tziporet at dev.mellanox.co.il Tue Jun 26 12:19:48 2007 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Tue, 26 Jun 2007 22:19:48 +0300 Subject: [ewg] Re: [ofa-general] Toward next OFED release (1.3) In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com> <20070626172130.GB26637@minantech.com> <20070626175343.GB5951@sgi.com> Message-ID: <468166D4.20204@mellanox.co.il> Roland Dreier wrote: > That's great, but please let's not think about it as a patch "for OFED > 1.3." I think this sort of change to the user/kernel interface really > needs to go upstream before it goes into OFED, so just work on getting > the changes into the kernel and libmthca, and then we can worry about > getting them into an OFED release. > This comment is aligned with OFED development methodology. Regarding all kernel modules that are part of Linux: we first push the change to the kernel and base OFED on this code. We take kernel patches for bug fixes and portions that are targeted for the kernel inclusion. OFED does not come to be a bypass for the Linux kernel development process.
Regarding user space libraries - OFED is based on the sources from git of each package and any change should be coordinated with the library owner:
libibverbs - Roland
libumad - Hal
librdmacm & libcm - Sean
uDAPL - Arlin
Tziporet From halr at voltaire.com Tue Jun 26 12:18:55 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Jun 2007 15:18:55 -0400 Subject: [ofa-general] Re: [PATCH] management: uint -> unsigned replacement In-Reply-To: <20070626180157.GM25653@sashak.voltaire.com> References: <20070626102045.GS15343@mellanox.co.il> <1182862966.10379.425353.camel@hal.voltaire.com> <20070626132457.GA29602@mellanox.co.il> <20070626180157.GM25653@sashak.voltaire.com> Message-ID: <1182885534.28870.527.camel@hal.voltaire.com> On Tue, 2007-06-26 at 14:01, Sasha Khapyorsky wrote: > On 16:24 Tue 26 Jun , Michael S. Tsirkin wrote: > > > Quoting Hal Rosenstock : > > > Subject: Re: [PATCH] management: uint -> unsigned replacement > > > > > > On Tue, 2007-06-26 at 06:20, Michael S. Tsirkin wrote: > > > > Some management headers use uint type which (on my system) > > > > > > What's your system ? > > > > SLES10. > > > > > > is described as "old > > > > compatibility name for C type". This type might not be defined e.g. if > > > > __STRICT_ANSI__ is set, > > > > > > Is strict ANSI a requirement ? > > Even if not, I was just trying to determine how much further we needed to go down this path. -- Hal > what is the reason to use uint there instead of just unsigned? > I don't know. I like this patch. > > Sasha > > > > > Not sure. The app in question does > > #define _XOPEN_SOURCE 600 > > > > > > so it is best to avoid its usage at least in headers. > > > > Replace by unsigned in all headers. > > > > > > > > Signed-off-by: Michael S. Tsirkin > > > > > > > > --- > > > > > > > > Hal can you apply this please? As a separate question: > > > > I didn't go over .c files (we don't build them with strict ansi now), > > > > but maybe removing uint there is a good idea, too?
> > > > > > Yes but it will take more than this to make them strict ANSI. > > > > > > Is this as an OFED 1.2 follow on or just for master ? > > > > You decide. > > > > -- > > MST From rdreier at cisco.com Tue Jun 26 12:34:10 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 26 Jun 2007 12:34:10 -0700 Subject: [ewg] Re: [ofa-general] Toward next OFED release (1.3) In-Reply-To: <468166D4.20204@mellanox.co.il> (Tziporet Koren's message of "Tue, 26 Jun 2007 22:19:48 +0300") References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com> <20070626172130.GB26637@minantech.com> <20070626175343.GB5951@sgi.com> <468166D4.20204@mellanox.co.il> Message-ID: > This comment is aligned with OFED development methodology. > Regarding all kernel modules that are part of Linux: we first push the > change to the kernel and base OFED on this code. > We take kernel patches for bug fixes and portions that are targeted > for the kernel inclusion. > OFED does not come to be a bypass for the Linux kernel development process. Right, I think we agree on things here. I just want to emphasize that the best and easiest way to get things into OFED is to get them into upstream sources. And I hope OFED maintainers will start to push back on patch submissions to OFED that have not at least been submitted for upstream inclusion. - R. 
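Returning to the uint -> unsigned header patch discussed above: a minimal sketch, assuming glibc, of why plain `unsigned` is the safer choice in public headers. In glibc's <sys/types.h>, the old `uint` compatibility typedef is only provided when the default/BSD feature set (`__USE_MISC`) is enabled; an application that defines `_XOPEN_SOURCE 600` first (as in the app Michael mentions) or compiles with strict ANSI does not get it, so a header written with `uint` stops compiling. The struct and function names below are hypothetical and are not taken from the management headers.

```c
/* Mirror the application in the thread: request XSI/POSIX features
 * explicitly, which turns off glibc's default feature set. */
#define _XOPEN_SOURCE 600
#include <sys/types.h>

/* Hypothetical header fragment. Writing "uint bitoffs;" here may fail
 * to compile, because glibc only typedefs "uint" when __USE_MISC is
 * defined. "unsigned" is a standard C type and always works. */
struct example_field {
    unsigned bitoffs;   /* was: uint bitoffs; */
    unsigned bitlen;    /* was: uint bitlen;  */
};

/* Return the bit position just past the end of the field. */
static unsigned example_field_end(const struct example_field *f)
{
    return f->bitoffs + f->bitlen;
}
```

Because `unsigned` and `unsigned int` are the same type, the replacement does not change the layout or ABI of such structures.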
From rdreier at cisco.com Tue Jun 26 12:34:46 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 26 Jun 2007 12:34:46 -0700 Subject: [ewg] RE: [ofa-general] Toward next OFED release (1.3) In-Reply-To: <4FB1BCCAE6CAED44A1DC005B1DE06119291460@EPEXCH2.qlogic.org> (Madhu Lakshmanan's message of "Tue, 26 Jun 2007 14:15:54 -0500") References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com> <4FB1BCCAE6CAED44A1DC005B1DE06119291460@EPEXCH2.qlogic.org> Message-ID: > VNIC: > - GA quality. Not a technology preview version anymore. > - Added support for QLogic EVIC (10 Gbps Infiniband-to-Ethernet > gateway) - in GA I hope there will be some attempt to get these drivers merged upstream too. - R. From mst at dev.mellanox.co.il Tue Jun 26 12:35:12 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Tue, 26 Jun 2007 22:35:12 +0300 Subject: [ofa-general] Re: Re: [PATCH RFC] sharing userspace IB objects In-Reply-To: <20070626141349.GJ1164@minantech.com> References: <20070626070641.GM15343@mellanox.co.il> <20070626083445.GB1164@minantech.com> <20070626095125.GO15343@mellanox.co.il> <20070626111342.GC1164@minantech.com> <20070626114402.GT15343@mellanox.co.il> <20070626122539.GF1164@minantech.com> <20070626125802.GU15343@mellanox.co.il> <20070626133317.GH1164@minantech.com> <20070626140239.GB29602@mellanox.co.il> <20070626141349.GJ1164@minantech.com> Message-ID: <20070626193512.GC6426@mellanox.co.il> > Quoting Gleb Natapov : > Subject: Re: Re: [PATCH RFC] sharing userspace IB objects > > On Tue, Jun 26, 2007 at 05:02:39PM +0300, Michael S. Tsirkin wrote: > > > Quoting Gleb Natapov : > > > Subject: Re: Re: [PATCH RFC] sharing userspace IB objects > > > > > > On Tue, Jun 26, 2007 at 03:58:02PM +0300, Michael S. Tsirkin wrote: > > > > > > No, sharing a send queue must be done in software. I don't really see the reason > > > > > > for sarcasm: do you see value in sharing resources between multiple threads? > > > > > > Why not multiple processes? 
Some people just don't want to program > > > > > > in multithreaded environment. > > > > > > > > > > Yes I see the value in sharing resources between threads and processes > > > > > if done right. This proposition is far from being right. > > > > > > > > Ahem, *what* are you talking about? Sharing resources between threads was supported in > > > > libibverbs 1.0, *right from the start*. This is still the case with 1.1, and this API > > > > matches verbs quite closely which means that it can work pretty much on any > > > > hardware. > > > > > > Why do you think that I have a problem with multithreaded application is > > > beyond my understanding. I have a problem with you thinking that peaking a > > > completion by random process in FCFS order is a good idea. > > > > Should that have been "picking"? I keep telling you. With multithreaded > Yes "picking". Sorry :) > > > applications *that's what currently happens*. If multiple threads poll a CQ, > > which one gets which completion is currently unspecified. Are you > > worried about this? If not, why are you worried when multiple > > processes do this? > You've missed my sentence about difference between multithreaded > application and what you propose. The difference is HUGE (I can't write > bigger letters sorry about that). I can design a multithreaded MPI so > that each thread will be capable to progress MPI send/recv request (and then > I don't care what thread gets which completion. I can't do it with multiprocess > scenario. Well, with shared memory, the difference between thread and process is not that huge. And with the proposed API, you will be able to do just that. 
-- MST From madhu.lakshmanan at qlogic.com Tue Jun 26 12:46:51 2007 From: madhu.lakshmanan at qlogic.com (Lakshmanan, Madhu) Date: Tue, 26 Jun 2007 14:46:51 -0500 Subject: [ewg] RE: [ofa-general] Toward next OFED release (1.3) In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com><4FB1BCCAE6CAED44A1DC005B1DE06119291460@EPEXCH2.qlogic.org> Message-ID: <4FB1BCCAE6CAED44A1DC005B1DE0611929146A@EPEXCH2.qlogic.org> > From: Roland Dreier [mailto:rdreier at cisco.com] > Subject: Re: [ewg] RE: [ofa-general] Toward next OFED release (1.3) > > > VNIC: > > - GA quality. Not a technology preview version anymore. > > - Added support for QLogic EVIC (10 Gbps Infiniband-to-Ethernet > > gateway) - in GA > > I hope there will be some attempt to get these drivers merged upstream too. > > - R. Agreed in principle. We hope to address that issue soon. Madhu From sweitzen at cisco.com Tue Jun 26 12:49:16 2007 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Tue, 26 Jun 2007 12:49:16 -0700 Subject: [ewg] RE: [ofa-general] Toward next OFED release (1.3) In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com><4FB1BCCAE6CAED44A1DC005B1DE06119291460@EPEXCH2.qlogic.org> Message-ID: > I hope there will be some attempt to get these drivers merged > upstream too. How about SDP, are we ready to try to merge it upstream? 
Scott From glebn at voltaire.com Tue Jun 26 12:54:01 2007 From: glebn at voltaire.com (Gleb Natapov) Date: Tue, 26 Jun 2007 22:54:01 +0300 Subject: [ofa-general] Re: Re: [PATCH RFC] sharing userspace IB objects In-Reply-To: <20070626193512.GC6426@mellanox.co.il> References: <20070626083445.GB1164@minantech.com> <20070626095125.GO15343@mellanox.co.il> <20070626111342.GC1164@minantech.com> <20070626114402.GT15343@mellanox.co.il> <20070626122539.GF1164@minantech.com> <20070626125802.GU15343@mellanox.co.il> <20070626133317.GH1164@minantech.com> <20070626140239.GB29602@mellanox.co.il> <20070626141349.GJ1164@minantech.com> <20070626193512.GC6426@mellanox.co.il> Message-ID: <20070626195401.GH26637@minantech.com> On Tue, Jun 26, 2007 at 10:35:12PM +0300, Michael S. Tsirkin wrote: > > > applications *that's what currently happens*. If multiple threads poll a CQ, > > > which one gets which completion is currently unspecified. Are you > > > worried about this? If not, why are you worried when multiple > > > processes do this? > > You've missed my sentence about difference between multithreaded > > application and what you propose. The difference is HUGE (I can't write > > bigger letters sorry about that). I can design a multithreaded MPI so > > that each thread will be capable to progress MPI send/recv request (and then > > I don't care what thread gets which completion. I can't do it with multiprocess > > scenario. > > Well, with shared memory, the difference between thread and process is not that huge. > And with the proposed API, you will be able to do just that. > By your logic, the kernel could send a signal to any process no matter which process actually caused it. After all, this is what it does with threads. You are thinking about a synthetic benchmark that just sends random data to a peer and frees it on completion. The real program has much more state associated with each operation and the corresponding completion. 
And received data has to be actually processed by the process it was sent to, and not just by any process. Unless you stop repeating your mantra that threads are just like processes with a shared memory segment, we will not be able to address the shortcomings of your proposal. -- Gleb. From halr at voltaire.com Tue Jun 26 13:21:53 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Jun 2007 16:21:53 -0400 Subject: [ofa-general] Re: [PATCH] management: uint -> unsigned replacement In-Reply-To: <20070626102045.GS15343@mellanox.co.il> References: <20070626102045.GS15343@mellanox.co.il> Message-ID: <1182889307.28870.4809.camel@hal.voltaire.com> On Tue, 2007-06-26 at 06:20, Michael S. Tsirkin wrote: > Some management headers use uint type which (on my system) is described as "old > compatibility name for C type". This type might not be defined e.g. if > __STRICT_ANSI__ is set, so it is best to avoid its usage at least in headers. > Replace by unsigned in all headers. > > Signed-off-by: Michael S. Tsirkin Thanks. Applied (to master only so far, but since a goal of OFED 1.2 is to support SLES 10, it does seem that it should be provided there as well. That will be forthcoming.) Also, I am working on updating the management library sources similarly although I don't see an imperative to move those changes to OFED 1.2. -- Hal From rdreier at cisco.com Tue Jun 26 14:15:33 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 26 Jun 2007 14:15:33 -0700 Subject: [ofa-general] Re: [PATCH RFC] sharing userspace IB objects In-Reply-To: <20070626070641.GM15343@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 26 Jun 2007 10:06:41 +0300") References: <20070625130604.GH15343@mellanox.co.il> <20070626070641.GM15343@mellanox.co.il> Message-ID: > This is not directly related to SRC: this is an effort > to make it possible to share QPs, CQ etc across processes > in the same way as they can be currently shared across threads.
> So assuming that we want multiple processes to post to > the same QP, how can we support this? This looks like a lot of work for an unknown gain. Who is going to really use this? i.e. is it worth the trouble? > > - Given that everything shared is in shared memory, > > I think we should try and keep shared memory usage to a minimum. > For example, in mthca mr object just needs a key: we could > keep it in non-shared memory, just pass the key around > and save on shared memory usage. This comment made me realize there are a few more problems here. What happens if I do ibv_reg_mr() in one process, pass the MR to another process, and then do ibv_dereg_mr() in the second process? What about if someone registers a region in shared memory -- are there any fork/copy-on-write issues with that? I think there are probably bugs in the locked_vm accounting in the kernel right now -- it doesn't take into account the possibility of passing context fds from one process to another. In general what do you think the rules for destroying objects should be? What if process A creates a QP, passes it to process B, and then process A dies? Should the QP still be usable? Should process B be able to destroy it? What if process A is still alive -- should process B be able to destroy the QP? > We need to share file descriptors too. Is there a way to pass these > around besides unix domain sockets? I guess we need this to be able to re-mmap doorbell pages etc, right? I wonder if there's a better way around that... maybe extending the kernel interface so that unrelated processes can share a context, e.g. by putting contexts in a filesystem or something like that. > But are you sure we want to break API for all users just to add > a new capability for a minority that wants shared memory support? Yes, you're right... better to be backward compatible and have a new API for shared stuff. - R. 
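On the question above of passing file descriptors between processes: the standard Unix-domain-socket mechanism is an SCM_RIGHTS control message, sent with sendmsg() and received with recvmsg(). A minimal sketch follows, assuming Linux/POSIX sockets; send_fd() and recv_fd() are hypothetical helper names, not part of libibverbs, and the sketch only covers the descriptor transfer itself, not how a verbs context would be reattached (e.g. doorbell pages re-mmapped) in the receiving process.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a connected AF_UNIX socket.
 * Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char byte = 0;  /* one byte of ordinary data accompanies the fd */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {         /* union gives the control buffer cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;   /* payload is an array of descriptors */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new descriptor or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg != NULL && cmsg->cmsg_level == SOL_SOCKET &&
        cmsg->cmsg_type == SCM_RIGHTS &&
        cmsg->cmsg_len == CMSG_LEN(sizeof(int)))
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor number that refers to the same open file, much as dup() would, which is also why it raises the object-lifetime questions discussed above: either process can later close or use it independently.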
From rdreier at cisco.com Tue Jun 26 15:11:28 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 26 Jun 2007 15:11:28 -0700 Subject: [ofa-general] Re: [PATCH 01/28] IB/ipath: include to fix ppc64 build In-Reply-To: <20070619234035.3794.7544.stgit@bauxite.internal.keyresearch.com> (Arthur Jones's message of "Tue, 19 Jun 2007 16:40:35 -0700") References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> <20070619234035.3794.7544.stgit@bauxite.internal.keyresearch.com> Message-ID: Thanks, I applied all of these patches except 15/28 (waiting for a revised version with comments for the barriers) and {26,27}/28 (see separate replies). Also it would be great to get a MAINTAINERS update soon... - R. From rdreier at cisco.com Tue Jun 26 15:13:11 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 26 Jun 2007 15:13:11 -0700 Subject: [ofa-general] Re: [PATCH 26/28] IB/ipath - print warning if LID not acquired and link ACTIVE within one minute In-Reply-To: <20070619234303.3794.75856.stgit@bauxite.internal.keyresearch.com> (Arthur Jones's message of "Tue, 19 Jun 2007 16:43:04 -0700") References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> <20070619234303.3794.75856.stgit@bauxite.internal.keyresearch.com> Message-ID: This has come up before -- the feeling was that this checking shouldn't be in a low-level driver. Either warning for no LID makes sense for any IB device and therefore should be in the IB midlayer, or it doesn't make sense and ipath shouldn't do it. - R. From rdreier at cisco.com Tue Jun 26 15:13:44 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 26 Jun 2007 15:13:44 -0700 Subject: [ofa-general] Re: [PATCH 27/28] IB/ipath - when we check for LID availability, check for lack of interrupts too. 
In-Reply-To: <20070619234309.3794.784.stgit@bauxite.internal.keyresearch.com> (Arthur Jones's message of "Tue, 19 Jun 2007 16:43:10 -0700") References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> <20070619234309.3794.784.stgit@bauxite.internal.keyresearch.com> Message-ID: I didn't apply this either because it depends on 26/28 and I held off on that one. I think checking for interrupts in a low-level driver *is* sane though... From arthur.jones at qlogic.com Tue Jun 26 15:16:57 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 26 Jun 2007 15:16:57 -0700 Subject: [ofa-general] Re: [PATCH 01/28] IB/ipath: include to fix ppc64 build In-Reply-To: References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> <20070619234035.3794.7544.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070626221657.GO29798@bauxite.pathscale.com> hi roland, ... On Tue, Jun 26, 2007 at 03:11:28PM -0700, Roland Dreier wrote: > Thanks, I applied all of these patches except 15/28 (waiting for a > revised version with comments for the barriers) and {26,27}/28 (see > separate replies). thanks... > Also it would be great to get a MAINTAINERS update soon... ok, i have the patch in my tree (along w/ a couple others), i was holding onto them until i got a chance to test them. shall i send off the MAINTAINERS patch separately? i expect to be able to get to testing by the end of this week... arthur From arthur.jones at qlogic.com Tue Jun 26 15:25:56 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 26 Jun 2007 15:25:56 -0700 Subject: [ofa-general] Re: [PATCH 26/28] IB/ipath - print warning if LID not acquired and link ACTIVE within one minute In-Reply-To: References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> <20070619234303.3794.75856.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070626222556.GP29798@bauxite.pathscale.com> sorry to have missed the fun. 
does this mean that there's a patch pending to remove the gazillion link down messages in drivers/net? anyway, do we want it in the IB midlayer? i'd definitely like it somewhere, user space is a bit cumbersome for such a simple check... arthur On Tue, Jun 26, 2007 at 03:13:11PM -0700, Roland Dreier wrote: > This has come up before -- the feeling was that this checking > shouldn't be in a low-level driver. Either warning for no LID makes > sense for any IB device and therefore should be in the IB midlayer, or > it doesn't make sense and ipath shouldn't do it. > > - R. From rdreier at cisco.com Tue Jun 26 15:26:04 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 26 Jun 2007 15:26:04 -0700 Subject: [ofa-general] [PATCH/RFC] IB/mthca: Remove MSI support Message-ID: Is there any point in having MSI support in mthca, given that the hardware also does MSI-X, which is much more useful? Is anyone using MSI instead of MSI-X, and if so why? What do people think about applying this for 2.6.23? 
diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h index 9bae3cc..1002482 100644 --- a/drivers/infiniband/hw/mthca/mthca_dev.h +++ b/drivers/infiniband/hw/mthca/mthca_dev.h @@ -60,7 +60,6 @@ enum { MTHCA_FLAG_DDR_HIDDEN = 1 << 1, MTHCA_FLAG_SRQ = 1 << 2, - MTHCA_FLAG_MSI = 1 << 3, MTHCA_FLAG_MSI_X = 1 << 4, MTHCA_FLAG_NO_LAM = 1 << 5, MTHCA_FLAG_FMR = 1 << 6, diff --git a/drivers/infiniband/hw/mthca/mthca_eq.c b/drivers/infiniband/hw/mthca/mthca_eq.c index 8ec9fa1..a6ae4d9 100644 --- a/drivers/infiniband/hw/mthca/mthca_eq.c +++ b/drivers/infiniband/hw/mthca/mthca_eq.c @@ -842,8 +842,7 @@ int mthca_init_eq_table(struct mthca_dev *dev) if (err) goto err_out_free; - if (dev->mthca_flags & MTHCA_FLAG_MSI || - dev->mthca_flags & MTHCA_FLAG_MSI_X) { + if (dev->mthca_flags & MTHCA_FLAG_MSI_X) { dev->eq_table.clr_mask = 0; } else { dev->eq_table.clr_mask = @@ -854,8 +853,7 @@ int mthca_init_eq_table(struct mthca_dev *dev) dev->eq_table.arm_mask = 0; - intr = (dev->mthca_flags & MTHCA_FLAG_MSI) ? - 128 : dev->eq_table.inta_pin; + intr = dev->eq_table.inta_pin; err = mthca_create_eq(dev, dev->limits.num_cqs + MTHCA_NUM_SPARE_EQE, (dev->mthca_flags & MTHCA_FLAG_MSI_X) ? 128 : intr, diff --git a/drivers/infiniband/hw/mthca/mthca_main.c b/drivers/infiniband/hw/mthca/mthca_main.c index aa563e6..f5abdbf 100644 --- a/drivers/infiniband/hw/mthca/mthca_main.c +++ b/drivers/infiniband/hw/mthca/mthca_main.c @@ -67,7 +67,7 @@ MODULE_PARM_DESC(msi_x, "attempt to use MSI-X if nonzero"); static int msi = 0; module_param(msi, int, 0444); -MODULE_PARM_DESC(msi, "attempt to use MSI if nonzero"); +MODULE_PARM_DESC(msi, "(MSI support has been removed; ignored)"); #else /* CONFIG_PCI_MSI */ @@ -837,7 +837,7 @@ static int mthca_setup_hca(struct mthca_dev *dev) dev->mthca_flags & MTHCA_FLAG_MSI_X ? 
dev->eq_table.eq[MTHCA_EQ_CMD].msi_x_vector : dev->pdev->irq); - if (dev->mthca_flags & (MTHCA_FLAG_MSI | MTHCA_FLAG_MSI_X)) + if (dev->mthca_flags & MTHCA_FLAG_MSI_X) mthca_err(dev, "Try again with MSI/MSI-X disabled.\n"); else mthca_err(dev, "BIOS or ACPI interrupt routing problem?\n"); @@ -1117,9 +1117,8 @@ static int __mthca_init_one(struct pci_dev *pdev, int hca_type) if (msi_x && !mthca_enable_msi_x(mdev)) mdev->mthca_flags |= MTHCA_FLAG_MSI_X; - if (msi && !(mdev->mthca_flags & MTHCA_FLAG_MSI_X) && - !pci_enable_msi(pdev)) - mdev->mthca_flags |= MTHCA_FLAG_MSI; + if (msi) + mthca_warn(mdev, "MSI support has been removed; msi flag is ignored.\n"); if (mthca_cmd_init(mdev)) { mthca_err(mdev, "Failed to init command interface, aborting.\n"); @@ -1188,8 +1187,6 @@ err_cmd: err_free_dev: if (mdev->mthca_flags & MTHCA_FLAG_MSI_X) pci_disable_msix(pdev); - if (mdev->mthca_flags & MTHCA_FLAG_MSI) - pci_disable_msi(pdev); ib_dealloc_device(&mdev->ib_dev); @@ -1236,8 +1233,6 @@ static void __mthca_remove_one(struct pci_dev *pdev) if (mdev->mthca_flags & MTHCA_FLAG_MSI_X) pci_disable_msix(pdev); - if (mdev->mthca_flags & MTHCA_FLAG_MSI) - pci_disable_msi(pdev); ib_dealloc_device(&mdev->ib_dev); mthca_release_regions(pdev, mdev->mthca_flags & From rdreier at cisco.com Tue Jun 26 15:27:56 2007 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 26 Jun 2007 15:27:56 -0700 Subject: [ofa-general] Re: [PATCH 01/28] IB/ipath: include to fix ppc64 build In-Reply-To: <20070626221657.GO29798@bauxite.pathscale.com> (Arthur Jones's message of "Tue, 26 Jun 2007 15:16:57 -0700") References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> <20070619234035.3794.7544.stgit@bauxite.internal.keyresearch.com> <20070626221657.GO29798@bauxite.pathscale.com> Message-ID: > ok, i have the patch in my tree (along w/ a couple > others), i was holding onto them until i got a chance > to test them. shall i send off the MAINTAINERS patch > separately? 
i expect to be able to get to testing by > the end of this week... whatever is fine, as long as the MAINTAINERS update is in the pipeline it's not particularly urgent. From arthur.jones at qlogic.com Tue Jun 26 15:29:21 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Tue, 26 Jun 2007 15:29:21 -0700 Subject: [ofa-general] Re: [PATCH 27/28] IB/ipath - when we check for LID availability, check for lack of interrupts too. In-Reply-To: References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> <20070619234309.3794.784.stgit@bauxite.internal.keyresearch.com> Message-ID: <20070626222921.GQ29798@bauxite.pathscale.com> hi roland, ... On Tue, Jun 26, 2007 at 03:13:44PM -0700, Roland Dreier wrote: > I didn't apply this either because it depends on 26/28 and I held off > on that one. I think checking for interrupts in a low-level driver > *is* sane though... yeah, me too, if you _really_ don't want the ipath LID check, i can respin this one to combine them and leave out the LID check. but then the LID check is only gonna be a few lines, it'll seem even sillier to leave it out... arthur From elsen_david at hotmail.com Tue Jun 26 17:04:11 2007 From: elsen_david at hotmail.com (david elsen) Date: Tue, 26 Jun 2007 17:04:11 -0700 Subject: [ofa-general] Open Fabrics iWARP Driver for Chesio T3 card Message-ID: Can someone please let me know: 1. What is the latest Open Fabrics Driver for the Chelsio T3 cards? 2. Is there any documentation on the Open Fabrics website for installing the iWARP driver for the T3 card? 3. Is there any documentation describing how to set up the iWARP and network interfaces for the T3 cards? David 
From ogerlitz at voltaire.com Tue Jun 26 22:01:33 2007 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 27 Jun 2007 08:01:33 +0300 Subject: [ewg] Re: [ofa-general] Toward next OFED release (1.3) In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com> <20070626172130.GB26637@minantech.com> <20070626175343.GB5951@sgi.com> <468166D4.20204@mellanox.co.il> Message-ID: <4681EF2D.3010002@voltaire.com> Roland Dreier wrote: > > This comment is aligned with OFED development methodology. > > Regarding all kernel modules that are part of Linux: we first push the > > change to the kernel and base OFED on this code. > > We take kernel patches for bug fixes and portions that are targeted > > for the kernel inclusion. > > OFED does not come to be a bypass for the Linux kernel development process. > > Right, I think we agree on things here. I just want to emphasize that > the best and easiest way to get things into OFED is to get them into > upstream sources. And I hope OFED maintainers will start to push back > on patch submissions to OFED that have not at least been submitted for > upstream inclusion. Note that OFED 1.1 and 1.2 not only include kernel drivers which are not upstream; some of them (eg SDP, RDS) never passed any --review-- cycle at the relevant mailing lists (openib, netdev, lkml). Now, for OFED 1.3 there's a suggestion to add rNFS, which was also never reviewed. So "we agree on things here", but it does not happen; do people have suggestions on how to move forward? Or. 
From jgunthorpe at obsidianresearch.com Tue Jun 26 22:33:27 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Tue, 26 Jun 2007 23:33:27 -0600 Subject: [ofa-general] Re: [PATCH] management: uint -> unsigned replacement In-Reply-To: <1182885534.28870.527.camel@hal.voltaire.com> References: <20070626102045.GS15343@mellanox.co.il> <1182862966.10379.425353.camel@hal.voltaire.com> <20070626132457.GA29602@mellanox.co.il> <20070626180157.GM25653@sashak.voltaire.com> <1182885534.28870.527.camel@hal.voltaire.com> Message-ID: <20070627053327.GH10225@obsidianresearch.com> On Tue, Jun 26, 2007 at 03:18:55PM -0400, Hal Rosenstock wrote: > > > > > compatibility name for C type". This type might not be defined e.g. if > > > > > __STRICT_ANSI__ is set, > > > > > > > > Is strict ANSI a requirement ? > > > > Even if not, > > I was just trying to determine how much further we needed to go down > this path. As a general rule if you can compile each of your public header files with: echo '#include "foo.h"' > t.c gcc -Wall -ansi t.c You are doing OK. What is in your private .c files isn't that important (and I'd advocate using -std=gnu99, but I never compile with VC++ :P). 'gcc -ansi -D_POSIX_SOURCE_' as a minimum is also pretty good. 
Jason From jgunthorpe at obsidianresearch.com Tue Jun 26 22:40:06 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Tue, 26 Jun 2007 23:40:06 -0600 Subject: [ofa-general] Re: [PATCH 26/28] IB/ipath - print warning if LID not acquired and link ACTIVE within one minute In-Reply-To: <20070626222556.GP29798@bauxite.pathscale.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> <20070619234303.3794.75856.stgit@bauxite.internal.keyresearch.com> <20070626222556.GP29798@bauxite.pathscale.com> Message-ID: <20070627054006.GI10225@obsidianresearch.com> On Tue, Jun 26, 2007 at 03:25:56PM -0700, Arthur Jones wrote: > does this mean that there's a patch pending > to remove the gazillion link down messages > in drivers/net? These days a lot of the ethernet drivers use the generic mii/phy library code that causes those messages to be printed. The ethernet drivers are a bit of a bad example because there are a lot of variations of the code to monitor the phy state machines, so for consistency with the general mii stuff they have to print the message on their own. 
:| Jason From tziporet at mellanox.co.il Tue Jun 26 23:25:00 2007 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Wed, 27 Jun 2007 09:25:00 +0300 Subject: [ewg] RE: [ofa-general] Toward next OFED release (1.3) References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com><4FB1BCCAE6CAED44A1DC005B1DE06119291460@EPEXCH2.qlogic.org> Message-ID: <6C2C79E72C305246B504CBA17B5500C9015637A1@mtlexch01.mtl.com> I think we should try Tziporet -----Original Message----- From: ewg-bounces at lists.openfabrics.org [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of Scott Weitzenkamp (sweitzen) Sent: Tuesday, June 26, 2007 10:49 PM To: Roland Dreier (rdreier); Lakshmanan, Madhu Cc: ewg at lists.openfabrics.org; general at lists.openfabrics.org Subject: RE: [ewg] RE: [ofa-general] Toward next OFED release (1.3) > I hope there will be some attempt to get these drivers merged > upstream too. How about SDP, are we ready to try to merge it upstream? Scott _______________________________________________ ewg mailing list ewg at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg From dotanb at dev.mellanox.co.il Wed Jun 27 01:29:14 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Wed, 27 Jun 2007 11:29:14 +0300 Subject: [ofa-general] The low level driver of mlx4 kmalloc 0 bytes in QP creation Message-ID: <46821FDA.5030900@dev.mellanox.co.il> Hi Roland. If one creates a QP with 0 WRs in the RQ at the kernel level, the mlx4 low-level driver will kmalloc 0 bytes (for the WR IDs of the RQ). (for example, the IPoIB CM creates such a QP) Is this an error? 
thanks Dotan From vlad at lists.openfabrics.org Wed Jun 27 02:42:14 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Wed, 27 Jun 2007 02:42:14 -0700 (PDT) Subject: [ofa-general] ofa_1_2_c_kernel 20070627-0200 daily build status Message-ID: <20070627094215.51612E608C0@openfabrics.org> This email was generated automatically, please do not reply Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-mlx4-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.15 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.20 Passed on powerpc with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on powerpc with linux-2.6.17 Passed on ia64 with linux-2.6.13 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.13 Passed on ia64 with linux-2.6.12 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.15 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.16 Passed on powerpc with linux-2.6.13 Passed on x86_64 with linux-2.6.17 Passed on ia64 with linux-2.6.18 Passed on powerpc with linux-2.6.12 Passed on ia64 with linux-2.6.19 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on x86_64 with linux-2.6.14 Passed on ppc64 with linux-2.6.14 Passed on ia64 with linux-2.6.14 Passed on ppc64 with linux-2.6.19 Passed on powerpc with linux-2.6.16 Passed on ppc64 with linux-2.6.12 Passed on ppc64 with linux-2.6.18 Passed on powerpc with linux-2.6.15 Passed on powerpc with linux-2.6.14 Passed on ppc64 with 
linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on ppc64 with linux-2.6.13 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ia64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on ppc64 with linux-2.6.18-8.el5 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.9-34.ELsmp Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Failed: From halr at voltaire.com Wed Jun 27 04:01:09 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Jun 2007 07:01:09 -0400 Subject: [ofa-general] Re: [PATCH] management: uint -> unsigned replacement In-Reply-To: <20070627053327.GH10225@obsidianresearch.com> References: <20070626102045.GS15343@mellanox.co.il> <1182862966.10379.425353.camel@hal.voltaire.com> <20070626132457.GA29602@mellanox.co.il> <20070626180157.GM25653@sashak.voltaire.com> <1182885534.28870.527.camel@hal.voltaire.com> <20070627053327.GH10225@obsidianresearch.com> Message-ID: <1182942065.28870.65696.camel@hal.voltaire.com> On Wed, 2007-06-27 at 01:33, Jason Gunthorpe wrote: > On Tue, Jun 26, 2007 at 03:18:55PM -0400, Hal Rosenstock wrote: > > > > > > > compatibility name for C type". This type might not defined e.g. if > > > > > > __STRICT_ANSI__ is set, > > > > > > > > > > Is strict ANSI a requirement ? > > > > > > Even if not, > > > > I was just trying to determine how much further we needed to go down > > this path. > > As a general rule if you can compile each of your public headers files > with: > > echo '#include "foo.h"' > t.c > gcc -Wall -ansi t.c > > You are doing OK. What is in your private .c files isn't that > important That's what I wasn't sure about. Thanks. > (and I'd advocate using -std=gnu99, but I never compile with > VC++ :P). 
> > 'gcc -ansi -D_POSIX_SOURCE_' as a minimum is also pretty good. How about: gcc -Wall -D_XOPEN_SOURCE=600 -- Hal > Jason From Mark.Seger at hp.com Wed Jun 27 06:17:36 2007 From: Mark.Seger at hp.com (Mark Seger) Date: Wed, 27 Jun 2007 09:17:36 -0400 Subject: [ofa-general] IB performance stats (revisited) Message-ID: <46826370.4090602@hp.com> I had posted something about this some time last year but now actually have some data to present. My problem statement with IB is that there is no efficient way to get time-oriented performance numbers for all types of IB traffic. As far as I know, nothing is available for all types of traffic, such as MPI. This is further complicated because IB counters do not wrap; as a result, when the counters are only 32 bits wide, they saturate (latch at their maximum value) in under 30 seconds when under load. The only way I am aware of to do what I want to do is by running perfquery AND then clearing the counters after each request, which by definition prevents anyone else from accessing the counters, including multiple instances of my program. To give people a better idea of what I'm talking about, below is an extract from a utility I've written called 'collectl' which has been in use on HP systems for about 4 years and which we've now Open Sourced at http://sourceforge.net/projects/collectl [shameless plug]. In the following sample I've requested cpu, network and IB stats (there are actually a whole lot of other things you can examine and you can learn more at http://collectl.sourceforge.net/index.html). Anyhow, what you're seeing below is a sample taken every second. At first there is no IB traffic. Then I start a 'netperf' and you can see the IB stats jump. A few seconds later I do a 'ping -f -s50000' to the ib interface and you can now see an increase in the network traffic. 
# <--------CPU--------><-----------Network----------><----------InfiniBand---------->
#Time    cpu sys inter ctxsw netKBi pkt-in netKBo pkt-out  KBin pktIn  KBOut pktOut Errs
08:48:19   0   0  1046   137      0      4      0       2     0     0      0      0    0
08:48:20   2   2 18659   170      0     10      0       5   925 10767  80478  41636    0
08:48:21  14  14 92368  1882      0      9      1      10  3403 39599 463892 235588    0
08:48:22  14  14 92167  2243      0      8      0       4  3186 37081 471246 238743    0
08:48:23  12  12 92131  2382      0      3      0       2  4456 37323 470766 238488    0
08:48:24  13  13 91708  2691      7    106     12     104  7300 38542 466580 236450    0
08:48:25  14  14 91675  2763     11    175     20     175  7434 38417 463952 235146    0
08:48:26  13  13 91712  2716     11    174     20     175  7486 38464 465195 235767    0
08:48:27  14  14 91755  2742     11    171     19     171  7502 38656 465079 235720    0
08:48:28  13  13 90131  2126     12    178     20     179  8257 44080 424930 217067    0
08:48:29  13  13 89974  2389     13    191     22     191  7801 37094 457082 231523    0

here's another display option where you can see just the ipoib traffic along with other network stats

# NETWORK STATISTICS (/sec)
#        Num Name  InPck InErr OutPck OutErr Mult ICmp OCmp   IKB   OKB
09:04:51   0 lo:       0     0      0      0    0    0    0     0     0
09:04:51   1 eth0:    23     0      4      0    0    0    0     1     0
09:04:51   2 eth1:     0     0      0      0    0    0    0     0     0
09:04:51   3 ib0:    900     0    900      0    0    0    0  1775  1779
09:04:51   4 sit0:     0     0      0      0    0    0    0     0     0
09:04:52   0 lo:       0     0      0      0    0    0    0     0     0
09:04:52   1 eth0:   127     0    126      0    0    0    0     8    15
09:04:52   2 eth1:     0     0      0      0    0    0    0     0     0
09:04:52   3 ib0:   2275     0   2275      0    0    0    0  4488  4497
09:04:52   4 sit0:     0     0      0      0    0    0    0     0     0

While this is a relatively light-weight operation (collectl uses <0.1% of the cpu), I still do have to call perfquery every second and that does generate a little overhead. Furthermore, since I'm continuously resetting the counters multiple instances of my tool or any other tool that relies on these counters won't work correctly! One solution that had been implemented in the Voltaire stack worked quite well and that was a loadable module that read/cleared the HCA counters, but exported them as wrapping counters in /proc.
That way utilities could access the counters in /proc without stepping on each other's toes. While still not the best solution, as long as the counters don't wrap in the HCA, read/clear is the only way to do what it is I'm trying to do, unless of course someone has a better solution. I also realize with 64 bit counters this becomes a non-issue but I'm trying to solve the more general case. comments? flames? 8-) -mark From halr at voltaire.com Wed Jun 27 06:32:51 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Jun 2007 09:32:51 -0400 Subject: [ofa-general] IB performance stats (revisited) In-Reply-To: <46826370.4090602@hp.com> References: <46826370.4090602@hp.com> Message-ID: <1182951169.28870.75880.camel@hal.voltaire.com> On Wed, 2007-06-27 at 09:17, Mark Seger wrote: > I had posted something about this some time last year but now actually > have some data to present. > My problem statement with IB is there is no efficient way to get > time-oriented performance numbers for all types of IB traffic. As far > as I know nothing is available for all types of traffic, such as MPI. Not sure what you mean here. Are you looking for MPI counters ? > This is further complicated because IB counters do not wrap and as a > result when the counters are integers, they end up latching in <30 > seconds when under load. This is mostly a problem for the data counters. This is what the extended counters are for. > The only way I am aware to do what I want to > do is by running perfquery AND then clearing the counters after each > request which by definition prevents anyone else from accessing the > counters including multiple instances of my program. Yes, it is _bad_ if there are essentially multiple performance managers resetting the counters. There's now an experimental performance manager which has been discussed on the list. The performance data collected can be accessed. 
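The read/clear-and-export scheme discussed here can be sketched as below. In that scheme, one owner samples and resets the 32-bit PMA counters, and every other tool reads monotonic totals, as the Voltaire /proc module did. This is not the Voltaire module's code. How a sample is obtained (perfquery, libibmad, or a kernel module) is left out, and the struct and function names are hypothetical. The sketch assumes the IBA PortCounters layout, in which PortXmitData and PortRcvData count octets divided by 4.

```c
#include <stdbool.h>
#include <stdint.h>

/* One read of the 32-bit PortCounters fields, taken just before the owner
 * clears them, so each sample is the traffic since the previous clear.
 * PortXmitData/PortRcvData count octets divided by 4. */
struct pma_sample {
    uint32_t xmit_data;  /* 4-octet units */
    uint32_t rcv_data;   /* 4-octet units */
    uint32_t xmit_pkts;
    uint32_t rcv_pkts;
};

/* Monotonic 64-bit totals that other tools read (e.g. from /proc)
 * instead of touching the hardware counters themselves. */
struct port_totals {
    uint64_t xmit_bytes;
    uint64_t rcv_bytes;
    uint64_t xmit_pkts;
    uint64_t rcv_pkts;
    bool saturated;      /* some sample hit all-ones: traffic was lost */
};

/* Fold one read-and-clear sample into the running totals. */
static void accumulate(struct port_totals *t, const struct pma_sample *s)
{
    /* The counters stop at their maximum instead of wrapping, so a value
     * of all ones means the sample interval was too long. */
    if (s->xmit_data == UINT32_MAX || s->rcv_data == UINT32_MAX ||
        s->xmit_pkts == UINT32_MAX || s->rcv_pkts == UINT32_MAX)
        t->saturated = true;

    t->xmit_bytes += (uint64_t)s->xmit_data * 4;
    t->rcv_bytes  += (uint64_t)s->rcv_data * 4;
    t->xmit_pkts  += s->xmit_pkts;
    t->rcv_pkts   += s->rcv_pkts;
}
```

Only the owner ever resets the hardware counters; everyone else reads the 64-bit totals, so multiple tools no longer step on each other. The saturated flag records that at least one interval was too long for the 32-bit counters to stay below their maximum.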
> To give people a better idea of what I'm talking about, below is an > extract from a utility I've written called 'collectl' which has been in > use on HP systems for about 4 years and which we've now Open Sourced at > http://sourceforge.net/projects/collectl [shameless plug]. In the > following sample I've requested cpu, network and IB stats (there are > actually a whole lot of other things you can examine and you can learn > more at http://collectl.sourceforge.net/index.html). So you are looking for packets/bytes in/out only. > Anyhow, what > you're seeing below is a sample taken every second. At first there is > no IB traffic. Then I start a 'netperf' and you can see the IB stats > jump. A few seconds later I do a 'ping -f -s50000' to the ib interface > and you can now see an increase in the network traffic.
>
> # <--------CPU--------><-----------Network----------><----------InfiniBand---------->
> #Time    cpu sys inter ctxsw netKBi pkt-in netKBo pkt-out  KBin pktIn  KBOut pktOut Errs
> 08:48:19   0   0  1046   137      0      4      0       2     0     0      0      0    0
> 08:48:20   2   2 18659   170      0     10      0       5   925 10767  80478  41636    0
> 08:48:21  14  14 92368  1882      0      9      1      10  3403 39599 463892 235588    0
> 08:48:22  14  14 92167  2243      0      8      0       4  3186 37081 471246 238743    0
> 08:48:23  12  12 92131  2382      0      3      0       2  4456 37323 470766 238488    0
> 08:48:24  13  13 91708  2691      7    106     12     104  7300 38542 466580 236450    0
> 08:48:25  14  14 91675  2763     11    175     20     175  7434 38417 463952 235146    0
> 08:48:26  13  13 91712  2716     11    174     20     175  7486 38464 465195 235767    0
> 08:48:27  14  14 91755  2742     11    171     19     171  7502 38656 465079 235720    0
> 08:48:28  13  13 90131  2126     12    178     20     179  8257 44080 424930 217067    0
> 08:48:29  13  13 89974  2389     13    191     22     191  7801 37094 457082 231523    0
>
> here's another display option where you can see just the ipoib traffic
> along with other network stats
>
> # NETWORK STATISTICS (/sec)
> #        Num Name  InPck InErr OutPck OutErr Mult ICmp OCmp   IKB   OKB
> 09:04:51   0 lo:       0     0      0      0    0    0    0     0     0
> 09:04:51   1 eth0:    23     0      4      0    0    0    0     1     0
> 09:04:51   2 eth1:     0     0      0      0    0    0    0     0     0
> 09:04:51   3 ib0:    900     0    900      0    0    0    0  1775  1779
> 09:04:51   4 sit0:     0     0      0      0    0    0    0     0     0
> 09:04:52   0 lo:       0     0      0      0    0    0    0     0     0
> 09:04:52   1 eth0:   127     0    126      0    0    0    0     8    15
> 09:04:52   2 eth1:     0     0      0      0    0    0    0     0     0
> 09:04:52   3 ib0:   2275     0   2275      0    0    0    0  4488  4497
> 09:04:52   4 sit0:     0     0      0      0    0    0    0     0     0
>
> While this is a relatively light-weight operation (collectl uses <0.1% > of the cpu), I still do have to call perfquery every second and that > does generate a little overhead. Furthermore, since I'm continuously > resetting the counters multiple instances of my tool or any other tool > that relies on these counters won't work correctly! > > One solution that had been implemented in the Voltaire stack worked > quite well and that was a loadable module that read/cleared the HCA > counters, but exported them as wrapping counters in /proc. That way > utilities could access the counters in /proc without stepping on each > other's toes. Once in /proc, how are they all collected up ? Via IPoIB or out of band ethernet ? > While still not the best solution, as long as the counters > don't wrap in the HCA, read/clear is the only way to do what it is I'm > trying to do, unless of course someone has a better solution. Doesn't this have the same problem as doing it the PMA way ? Doesn't this impact other performance managers ? > I also > realize with 64 bit counters this becomes a non-issue but I'm trying to > solve the more general case. More devices are supporting these and it should be easier to do so with IBA 1.2.1 -- Hal > comments? flames? 
8-) > > -mark > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From Mark.Seger at hp.com Wed Jun 27 07:10:00 2007 From: Mark.Seger at hp.com (Mark Seger) Date: Wed, 27 Jun 2007 10:10:00 -0400 Subject: [ofa-general] IB performance stats (revisited) In-Reply-To: <1182951169.28870.75880.camel@hal.voltaire.com> References: <46826370.4090602@hp.com> <1182951169.28870.75880.camel@hal.voltaire.com> Message-ID: <46826FB8.10904@hp.com> btw - I've cc'd Ed on this so be sure to include him in your replies. Hal Rosenstock wrote: > On Wed, 2007-06-27 at 09:17, Mark Seger wrote: > >> I had posted something about this some time last year but now actually >> have some data to present. >> My problem statement with IB is there is no efficient way to get >> time-oriented performance numbers for all types of IB traffic. As far >> as I know nothing is available for all types of traffic, such as MPI. >> > > Not sure what you mean here. Are you looking for MPI counters ? > sorry for not being clearer. I'm looking for total aggregate I/O. >> This is further complicated because IB counters do not wrap and as a >> result when the counters are integers, they end up latching in <30 >> seconds when under load. >> > > This is mostly a problem for the data counters. This is what the > extended counters are for > but it's the data counters I'm interested in. >> The only way I am aware to do what I want to >> do is by running perfquery AND then clearing the counters after each >> request which by definition prevents anyone else from accessing the >> counters including multiple instances of my program. >> > > Yes, it is _bad_ if there are essentially multiple performance managers > resetting the counters. > I realize it's bad but since the counters don't wrap I have no alternative. 
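Why non-wrapping counters force a reset can be shown in a few lines. With a counter that wraps modulo 2^32, each reader keeps its own previous sample, and unsigned subtraction gives the correct delta across a wrap, so any number of tools can sample without resetting anything. With a saturating counter the same arithmetic fails once the counter pins at its maximum, and a reset by another tool makes the value go backwards. A minimal sketch, assuming at most one wrap between samples; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Delta between two samples of a free-running 32-bit counter that wraps
 * modulo 2^32. Unsigned subtraction is well defined modulo 2^32, so one
 * wrap between samples is handled and no reset is ever needed. */
static uint32_t wrap_delta(uint32_t prev, uint32_t cur)
{
    return cur - prev;
}

/* Same computation for a saturating counter. Returns false when the delta
 * cannot be trusted. */
static bool saturating_delta(uint32_t prev, uint32_t cur, uint32_t *delta)
{
    if (cur == UINT32_MAX)
        return false;  /* latched at all ones: later traffic was not counted */
    if (cur < prev)
        return false;  /* went backwards: another tool reset the counter */
    *delta = cur - prev;
    return true;
}
```

The wrap case is what lets several independent readers coexist; the saturating case only stays correct if a single owner resets the counter well before it pins.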
> > There's now an experimental performance manager which has been discussed > on the list. The performance data collected can be accessed. > alas, since I use this tool on commercial systems, I can't run it against experimental code. perhaps when the experimental becomes real I can. I'll try to find the notes in the archives. >> To give people a better idea of what I'm talking about, below is an >> extract from a utility I've written called 'collectl' which has been in >> use on HP systems for about 4 years and which we've now Open Sourced at >> http://sourceforge.net/projects/collectl [shameless plug]. In the >> following sample I've requested cpu, network and IB stats (there are >> actually a whole lot of other things you can examine and you can learn >> more at http://collectl.sourceforge.net/index.html). >> > > So you are looking for packets/bytes in/out only. > That's a good start. Since I'm using perfquery, I'm also reporting aggregate error counts, as you can see in my program output below. The theory is that these should rarely be set, and if they are, their total should be sufficient to highlight a problem without taking up a lot of screen real estate. >> Anyhow, what >> you're seeing below is a sample taken every second. At first there is >> no IB traffic. Then I start a 'netperf' and you can see the IB stats >> jump. A few seconds later I do a 'ping -f -s50000' to the ib interface >> and you can now see an increase in the network traffic. 
>>
>> # <--------CPU--------><-----------Network----------><----------InfiniBand---------->
>> #Time    cpu sys inter ctxsw netKBi pkt-in netKBo pkt-out  KBin pktIn  KBOut pktOut Errs
>> 08:48:19   0   0  1046   137      0      4      0       2     0     0      0      0    0
>> 08:48:20   2   2 18659   170      0     10      0       5   925 10767  80478  41636    0
>> 08:48:21  14  14 92368  1882      0      9      1      10  3403 39599 463892 235588    0
>> 08:48:22  14  14 92167  2243      0      8      0       4  3186 37081 471246 238743    0
>> 08:48:23  12  12 92131  2382      0      3      0       2  4456 37323 470766 238488    0
>> 08:48:24  13  13 91708  2691      7    106     12     104  7300 38542 466580 236450    0
>> 08:48:25  14  14 91675  2763     11    175     20     175  7434 38417 463952 235146    0
>> 08:48:26  13  13 91712  2716     11    174     20     175  7486 38464 465195 235767    0
>> 08:48:27  14  14 91755  2742     11    171     19     171  7502 38656 465079 235720    0
>> 08:48:28  13  13 90131  2126     12    178     20     179  8257 44080 424930 217067    0
>> 08:48:29  13  13 89974  2389     13    191     22     191  7801 37094 457082 231523    0
>>
>> here's another display option where you can see just the ipoib traffic
>> along with other network stats
>>
>> # NETWORK STATISTICS (/sec)
>> #        Num Name  InPck InErr OutPck OutErr Mult ICmp OCmp   IKB   OKB
>> 09:04:51   0 lo:       0     0      0      0    0    0    0     0     0
>> 09:04:51   1 eth0:    23     0      4      0    0    0    0     1     0
>> 09:04:51   2 eth1:     0     0      0      0    0    0    0     0     0
>> 09:04:51   3 ib0:    900     0    900      0    0    0    0  1775  1779
>> 09:04:51   4 sit0:     0     0      0      0    0    0    0     0     0
>> 09:04:52   0 lo:       0     0      0      0    0    0    0     0     0
>> 09:04:52   1 eth0:   127     0    126      0    0    0    0     8    15
>> 09:04:52   2 eth1:     0     0      0      0    0    0    0     0     0
>> 09:04:52   3 ib0:   2275     0   2275      0    0    0    0  4488  4497
>> 09:04:52   4 sit0:     0     0      0      0    0    0    0     0     0
>>
>> While this is a relatively light-weight operation (collectl uses <0.1% >> of the cpu), I still do have to call perfquery every second and that >> does generate a little overhead. Furthermore, since I'm continuously >> resetting the counters multiple instances of my tool or any other tool >> that relies on these counters won't work correctly! 
>> >> One solution that had been implemented in the Voltaire stack worked >> quite well and that was a loadable module that read/cleared the HCA >> counters, but exported them as wrapping counters in /proc. That way >> utilities could access the counters in /proc without stepping on each >> other's toes. >> > > Once in /proc, how are they all collected up ? Via IPoIB or out of band > ethernet ? > Not sure I understand the question. They're written to /proc via a module. They're collected up via my tool simply reading them back and parsing the return string which looks like ib0-1: 1 0 1 0x0000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 This is essentially the same data reported by get_pcounter reformatted to a single line for easier/faster parsing by collectl. >> While still not the best solution, as long as the counters >> don't wrap in the HCA, read/clear is the only way to do what it is I'm >> trying to do, unless of course someone has a better solution. >> > > Doesn't have the same problem as doing it the PMA way ? Doesn't this > impact other performance managers ? > Good point, but I guess I'm between a rock and a hard place. imho: as long as the counters don't wrap this problem will never be solved. I'm trying to address a specific monitoring scenario, one which collects data locally for analysis after a system problem occurs. I discovered long ago that central management solutions may work fine when trying to assess the health of many systems, but when something goes wrong with the network the only data that can tell you what's going wrong can't get back to the management station over the now broken network. My philosophy is if you want to continuously collect reliable performance metrics you need to use minimal system resources to do so and that means no network communications. I guess that means people need to decide: if they want to use collectl to gather local IB stats, they have to forego doing it globally. So what is the chance of ever seeing wrapping IB counters?
Probably none, right? 8-( >> I also >> realize with 64 bit counters this becomes a non-issue but I'm trying to >> solve the more general case. >> > > More devices are supporting these and it should be easier to do so with > IBA 1.2.1 > Is there an easy way to tell how wide the counters are via software? Do any utilities currently report this? > -- Hal > > >> comments? flames? 8-) >> >> -mark >> >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general >> From Mark.Seger at hp.com Wed Jun 27 08:00:48 2007 From: Mark.Seger at hp.com (Mark Seger) Date: Wed, 27 Jun 2007 11:00:48 -0400 Subject: [ofa-general] IB performance stats (revisited) In-Reply-To: <46826FB8.10904@hp.com> References: <46826370.4090602@hp.com> <1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com> Message-ID: <46827BA0.6070008@hp.com> >> >> Doesn't have the same problem as doing it the PMA way ? Doesn't this >> impact other performance managers ? > It just occurred to me, how can other performance managers report aggregate throughput if the counters don't wrap? They'll have exactly the same problem as me unless they're getting the counters elsewhere. I do recall some switch vendors recommending I ask the switch for the counters which they maintain locally but I find that too expensive AND I don't want to have to rely on the network as I'd mentioned in my previous reply.
-mark From halr at voltaire.com Wed Jun 27 08:21:44 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Jun 2007 11:21:44 -0400 Subject: [ofa-general] IB performance stats (revisited) In-Reply-To: <46827BA0.6070008@hp.com> References: <46826370.4090602@hp.com> <1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com> <46827BA0.6070008@hp.com> Message-ID: <1182957688.28870.83013.camel@hal.voltaire.com> On Wed, 2007-06-27 at 11:00, Mark Seger wrote: > >> > >> Doesn't have the same problem as doing it the PMA way ? Doesn't this > >> impact other performance managers ? > > > It just occurred to me, how can other performance managers report > aggregate throughput if the counters don't wrap? They'll have exactly > the same problem as me unless they're getting the counters elsewhere. I > do recall some switch vendors recommending I ask the switch for the > counters which they maintain locally but I find that too expensive AND I > don't want to have to rely on the network as I'd mentioned in my > previous reply. The performance managers deal with the counter stickiness (by resetting them when they think they need to). They typically export their data although this is not specified by IBA so it is in a vendor proprietary manner. -- Hal > -mark > >
From halr at voltaire.com Wed Jun 27 08:30:00 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Jun 2007 11:30:00 -0400 Subject: [ofa-general] IB performance stats (revisited) In-Reply-To: <46826FB8.10904@hp.com> References: <46826370.4090602@hp.com> <1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com> Message-ID: <1182958191.28870.83536.camel@hal.voltaire.com> On Wed, 2007-06-27 at 10:10, Mark Seger wrote: > btw - I've cc'd Ed on this so be sure to include him in your replies. > > Hal Rosenstock wrote: > > On Wed, 2007-06-27 at 09:17, Mark Seger wrote: > > > >> I had posted something about this some time last year but now actually > >> have some data to present. > >> My problem statement with IB is there is no efficient way to get > >> time-oriented performance numbers for all types of IB traffic. As far > >> as I know nothing is available for all types of traffic, such as MPI. > >> > > > > Not sure what you mean here. Are you looking for MPI counters ? > > > sorry for not being clearer. I'm looking for total aggregate I/O.
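For the aggregate I/O figure Mark is after, one detail of the IBA PortCounters attribute matters: PortXmitData and PortRcvData are 32-bit counters of data octets divided by 4. The sketch below uses hypothetical helper names (ib_data_bytes, seconds_to_saturate), not functions from perfquery or collectl, to convert a data-counter delta to bytes and to estimate how long such a counter lasts before it latches. Assuming the 1 GB/s per-direction data rate of a 4x SDR link, the estimate is about 17 seconds, consistent with the "<30 seconds" latching figure quoted in this thread; a faster link latches sooner.

```c
#include <stdint.h>

/* PortXmitData/PortRcvData count data octets divided by 4. */
static uint64_t ib_data_bytes(uint64_t data_counter_delta)
{
    return data_counter_delta * 4u;
}

/*
 * Seconds until a 32-bit data counter reaches its maximum (and, being
 * sticky, stops counting) when the port moves bytes_per_sec bytes per
 * second in one direction.
 */
static double seconds_to_saturate(double bytes_per_sec)
{
    return (double)UINT32_MAX * 4.0 / bytes_per_sec;
}
```

The same arithmetic explains why the 64-bit PortCountersExtended data counters, where a device supports them, make the latching problem a non-issue in practice.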
> >> This is further complicated because IB counters do not wrap and as a > >> result when the counters are integers, they end up latching in <30 > >> seconds when under load. > >> > > > > This is mostly a problem for the data counters. This is what the > > extended counters are for > > > but it's the data counters I'm interested in. Yes, there are data counters in both PortCounters and PortCountersExtended. The latter is an optional attribute. > >> The only way I am aware to do what I want to > >> do is by running perfquery AND then clearing the counters after each > >> request which by definition prevents anyone else from accessing the > >> counters including multiple instances of my program. > >> > > > > Yes, it is _bad_ if there are essentially multiple performance managers > > resetting the counters. > > > I realize it's bad but since the counters don't wrap I have no alternative. > > There's now an experimental performance manager which has been discussed > > on the list. The performance data collected can be accessed. > > > alas, since I use this tool on commercial systems, I can't run it > against experimental code. perhaps when the experimental becomes real I > can. It should be in the OFED 1.3 timeframe. Also, there are vendor Performance Managers too. > I'll try to find the notes in the archives. I can send you this if you can't find it. > >> To give people a better idea of what I'm talking about, below is an > >> extract from a utility I've written called 'collectl' which has been in > >> use on HP systems for about 4 years and which we've now Open Sourced at > >> http://sourceforge.net/projects/collectl [shameless plug]. In the > >> following sample I've requested cpu, network and IB stats (there are > >> actually a whole lot of other things you can examine and you can learn > >> more at http://collectl.sourceforge.net/index.html). > >> > > > > So you are looking for packets/bytes in/out only. > > > That's a good start. 
Since I'm using perfquery I'm also reporting > aggregate error counts as well, as you can see in my program output > below. The theory is these should rarely be set and if they are, their > total should be sufficient to highlight a problem without taking up a lot > of screen real estate. > >> Anyhow, what > >> you're seeing below is a sample taken every second. At first there is > >> no IB traffic. Then I start a 'netperf' and you can see the IB stats > >> jump. A few seconds later I do a 'ping -f -s50000' to the ib interface > >> and you can now see an increase in the network traffic. > >> > >> # > >> <--------CPU--------><-----------Network----------><----------InfiniBand----------> > >> #Time cpu sys inter ctxsw netKBi pkt-in netKBo pkt-out KBin > >> pktIn KBOut pktOut Errs > >> 08:48:19 0 0 1046 137 0 4 0 2 0 > >> 0 0 0 0 > >> 08:48:20 2 2 18659 170 0 10 0 5 925 > >> 10767 80478 41636 0 > >> 08:48:21 14 14 92368 1882 0 9 1 10 3403 > >> 39599 463892 235588 0 > >> 08:48:22 14 14 92167 2243 0 8 0 4 3186 > >> 37081 471246 238743 0 > >> 08:48:23 12 12 92131 2382 0 3 0 2 4456 > >> 37323 470766 238488 0 > >> 08:48:24 13 13 91708 2691 7 106 12 104 7300 > >> 38542 466580 236450 0 > >> 08:48:25 14 14 91675 2763 11 175 20 175 7434 > >> 38417 463952 235146 0 > >> 08:48:26 13 13 91712 2716 11 174 20 175 7486 > >> 38464 465195 235767 0 > >> 08:48:27 14 14 91755 2742 11 171 19 171 7502 > >> 38656 465079 235720 0 > >> 08:48:28 13 13 90131 2126 12 178 20 179 8257 > >> 44080 424930 217067 0 > >> 08:48:29 13 13 89974 2389 13 191 22 191 7801 > >> 37094 457082 231523 0 > >> > >> here's another display option where you can see just the ipoib traffic > >> along with other network stats > >> > >> # NETWORK STATISTICS (/sec) > >> # Num Name InPck InErr OutPck OutErr Mult ICmp > >> OCmp IKB OKB > >> 09:04:51 0 lo: 0 0 0 0 0 0 > >> 0 0 0 > >> 09:04:51 1 eth0: 23 0 4 0 0 0 > >> 0 1 0 > >> 09:04:51 2 eth1: 0 0 0 0 0 0 > >> 0 0 0 > >> 09:04:51 3 ib0: 900 0 900 0 0 0 0 > >> 1775 1779 > >>
09:04:51 4 sit0: 0 0 0 0 0 0 > >> 0 0 0 > >> 09:04:52 0 lo: 0 0 0 0 0 0 > >> 0 0 0 > >> 09:04:52 1 eth0: 127 0 126 0 0 0 > >> 0 8 15 > >> 09:04:52 2 eth1: 0 0 0 0 0 0 > >> 0 0 0 > >> 09:04:52 3 ib0: 2275 0 2275 0 0 0 0 > >> 4488 4497 > >> 09:04:52 4 sit0: 0 0 0 0 0 0 > >> 0 0 0 > >> > >> While this is a relatively light-weight operation (collectl uses <0.1% > >> of the cpu), I still do have to call perfquery every second and that > >> does generate a little overhead. Furthermore, since I'm continuously > >> resetting the counters multiple instances of my tool or any other tool > >> that relies on these counters won't work correctly! > >> > >> One solution that had been implemented in the Voltaire stack worked > >> quite well and that was a loadable module that read/cleared the HCA > >> counters, but exported them as wrapping counters in /proc. That way > >> utilities could access the counters in /proc without stepping on each > >> other's toes. > >> > > > > Once in /proc, how are they all collected up ? Via IPoIB or out of band > > ethernet ? > > > Not sure I understand the question. They're written to /proc via a > module. They're collected up via my tool simply reading them back and > parsing the return string which looks like > > ib0-1: 1 0 1 0x0000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > This is essentially the same data reported by get_pcounter reformatted > to a single line for easier/faster parsing by collectl. I was thinking your tool collects this info from all nodes in the network somehow. > >> While still not the best solution, as long as the counters > >> don't wrap in the HCA, read/clear is the only way to do what it is I'm > >> trying to do, unless of course someone has a better solution. > >> > > > > Doesn't have the same problem as doing it the PMA way ? Doesn't this > > impact other performance managers ? > > > Good point, but I guess I'm between a rock and a hard place. imho: as > long as the counters don't wrap this problem will never be solved.
It's the IBTA standard (rather than IETF-style counters). I don't think it's going to change. > I'm trying to address a specific monitoring scenario, one which collects > data locally for analysis after a system problem occurs. I discovered > long ago that central management solutions may work fine when trying to > assess the health of many systems, but when something goes wrong with > the network the only data that can tell you what's going wrong can't get > back to the management station over the now broken network. My > philosophy is if you want to continuously collect reliable performance > metrics you need to use minimal system resources to do so and that means > no network communications. I guess that means people need to decide: if > they want to use collectl to gather local IB stats, they have to forego > doing it globally. Guess that's a tradeoff that customers may need to make. In your environment, it sounds like one turns the performance manager off. As the PerfMgr is an unarchitected IBA component, there are no events defined which might help with coordinating this. So either this would need to be vendor-specific, or the two tools will interfere with each other. > So what is the chance of ever seeing wrapping IB counters? Probably > none, right? 8-( > > >> I also > >> realize with 64 bit counters this becomes a non-issue but I'm trying to > >> solve the more general case. > >> > > > > More devices are supporting these and it should be easier to do so with > > IBA 1.2.1 > > > Is there an easy way to tell how wide the counters are via software? Do > any utilities currently report this? Yes, via the PMA it can be done with some extra queries. -- Hal > > -- Hal > > > > > >> comments? flames?
8-) > >> > >> -mark > >> > From swise at opengridcomputing.com Wed Jun 27 08:51:19 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 27 Jun 2007 10:51:19 -0500 Subject: [ofa-general] [PATCH 0/6] iw_cxgb3: Bug Fixes for 2.6.23 Message-ID: <20070627155119.24944.44172.stgit@dell3.ogc.int> Hey Roland, Here are some bug fixes to the iw_cxgb3 driver that I'd like included for 2.6.23. NOTE: Patch 1 requires a firmware interface change, so there is a version bump to 4.3 included in that patch that hits cxgb3. This will likely conflict with a previous version change that is in Jeff's upstream branch. The net is: we need the firmware version bumped to 4.3 with these iw_cxgb3 changes. Thanks, Steve. Shortlog: iw_cxgb3: Streaming -> RDMA mode transition fixes. iw_cxgb3: TERMINATE WRs can hang the tx ofld queue. iw_cxgb3: Don't count neg_adv abort_req_rss messages as real aborts. iw_cxgb3: ctrl-qp init/clear shouldn't set the gen bit. iw_cxgb3: Don't post TID_RELEASE message. iw_cxgb3: Don't abort after failures sending the mpa reply. From swise at opengridcomputing.com Wed Jun 27 08:51:25 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 27 Jun 2007 10:51:25 -0500 Subject: [ofa-general] [PATCH 1/6] iw_cxgb3: Streaming -> RDMA mode transition fixes. In-Reply-To: <20070627155119.24944.44172.stgit@dell3.ogc.int> References: <20070627155119.24944.44172.stgit@dell3.ogc.int> Message-ID: <20070627155124.24944.26940.stgit@dell3.ogc.int> iw_cxgb3: Streaming -> RDMA mode transition fixes. Due to a HW issue, our current scheme to transition the connection from streaming to rdma mode is broken on the passive side.
The firmware and driver now support a new transition scheme for the passive side: - driver posts rdma_init_wr (now including the initial receive seqno) - driver posts last streaming message via TX_DATA message (MPA start response) - uP atomically sends the last streaming message and transitions the tcb to rdma mode. - driver waits for wr_ack indicating the last streaming message was ACKed. NOTE: This change also bumps the required firmware version to 4.3. Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/cxio_hal.c | 2 - drivers/infiniband/hw/cxgb3/cxio_wr.h | 3 + drivers/infiniband/hw/cxgb3/iwch_cm.c | 82 ++++++++++++-------------------- drivers/infiniband/hw/cxgb3/iwch_cm.h | 1 drivers/infiniband/hw/cxgb3/iwch_qp.c | 1 drivers/net/cxgb3/version.h | 2 - 6 files changed, 38 insertions(+), 53 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c index 76049af..215bbe5 100644 --- a/drivers/infiniband/hw/cxgb3/cxio_hal.c +++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c @@ -833,7 +833,7 @@ int cxio_rdma_init(struct cxio_rdev *rde wqe->ird = cpu_to_be32(attr->ird); wqe->qp_dma_addr = cpu_to_be64(attr->qp_dma_addr); wqe->qp_dma_size = cpu_to_be32(attr->qp_dma_size); - wqe->rsvd = 0; + wqe->irs = cpu_to_be32(attr->irs); skb->priority = 0; /* 0=>ToeQ; 1=>CtrlQ */ return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb)); } diff --git a/drivers/infiniband/hw/cxgb3/cxio_wr.h b/drivers/infiniband/hw/cxgb3/cxio_wr.h index ff7290e..c84d4ac 100644 --- a/drivers/infiniband/hw/cxgb3/cxio_wr.h +++ b/drivers/infiniband/hw/cxgb3/cxio_wr.h @@ -294,6 +294,7 @@ struct t3_rdma_init_attr { u64 qp_dma_addr; u32 qp_dma_size; u32 flags; + u32 irs; }; struct t3_rdma_init_wr { @@ -314,7 +315,7 @@ struct t3_rdma_init_wr { __be32 ird; __be64 qp_dma_addr; /* 7 */ __be32 qp_dma_size; /* 8 */ - u32 rsvd; + u32 irs; }; struct t3_genbit { diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c index b2faff5..7b8d5aa 
100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cm.c +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c @@ -515,7 +515,7 @@ static void send_mpa_req(struct iwch_ep req->len = htonl(len); req->param = htonl(V_TX_PORT(ep->l2t->smt_idx) | V_TX_SNDBUF(snd_win>>15)); - req->flags = htonl(F_TX_IMM_ACK|F_TX_INIT); + req->flags = htonl(F_TX_INIT); req->sndseq = htonl(ep->snd_seq); BUG_ON(ep->mpa_skb); ep->mpa_skb = skb; @@ -566,7 +566,7 @@ static int send_mpa_reject(struct iwch_e req->len = htonl(mpalen); req->param = htonl(V_TX_PORT(ep->l2t->smt_idx) | V_TX_SNDBUF(snd_win>>15)); - req->flags = htonl(F_TX_IMM_ACK|F_TX_INIT); + req->flags = htonl(F_TX_INIT); req->sndseq = htonl(ep->snd_seq); BUG_ON(ep->mpa_skb); ep->mpa_skb = skb; @@ -618,7 +618,7 @@ static int send_mpa_reply(struct iwch_ep req->len = htonl(len); req->param = htonl(V_TX_PORT(ep->l2t->smt_idx) | V_TX_SNDBUF(snd_win>>15)); - req->flags = htonl(F_TX_MORE | F_TX_IMM_ACK | F_TX_INIT); + req->flags = htonl(F_TX_INIT); req->sndseq = htonl(ep->snd_seq); ep->mpa_skb = skb; state_set(&ep->com, MPA_REP_SENT); @@ -641,6 +641,7 @@ static int act_establish(struct t3cdev * cxgb3_insert_tid(ep->com.tdev, &t3c_client, ep, tid); ep->snd_seq = ntohl(req->snd_isn); + ep->rcv_seq = ntohl(req->rcv_isn); set_emss(ep, ntohs(req->tcp_opt)); @@ -1023,6 +1024,9 @@ static int rx_data(struct t3cdev *tdev, skb_pull(skb, sizeof(*hdr)); skb_trim(skb, dlen); + ep->rcv_seq += dlen; + BUG_ON(ep->rcv_seq != (ntohl(hdr->seq) + dlen)); + switch (state_read(&ep->com)) { case MPA_REQ_SENT: process_mpa_reply(ep, skb); @@ -1060,7 +1064,6 @@ static int tx_ack(struct t3cdev *tdev, s struct iwch_ep *ep = ctx; struct cpl_wr_ack *hdr = cplhdr(skb); unsigned int credits = ntohs(hdr->credits); - enum iwch_qp_attr_mask mask; PDBG("%s ep %p credits %u\n", __FUNCTION__, ep, credits); @@ -1072,30 +1075,6 @@ static int tx_ack(struct t3cdev *tdev, s ep->mpa_skb = NULL; dst_confirm(ep->dst); if (state_read(&ep->com) == MPA_REP_SENT) { - struct iwch_qp_attributes attrs; - 
- /* bind QP to EP and move to RTS */ - attrs.mpa_attr = ep->mpa_attr; - attrs.max_ird = ep->ord; - attrs.max_ord = ep->ord; - attrs.llp_stream_handle = ep; - attrs.next_state = IWCH_QP_STATE_RTS; - - /* bind QP and TID with INIT_WR */ - mask = IWCH_QP_ATTR_NEXT_STATE | - IWCH_QP_ATTR_LLP_STREAM_HANDLE | - IWCH_QP_ATTR_MPA_ATTR | - IWCH_QP_ATTR_MAX_IRD | - IWCH_QP_ATTR_MAX_ORD; - - ep->com.rpl_err = iwch_modify_qp(ep->com.qp->rhp, - ep->com.qp, mask, &attrs, 1); - - if (!ep->com.rpl_err) { - state_set(&ep->com, FPDU_MODE); - established_upcall(ep); - } - ep->com.rpl_done = 1; PDBG("waking up ep %p\n", ep); wake_up(&ep->com.waitq); @@ -1378,6 +1357,7 @@ static int pass_establish(struct t3cdev PDBG("%s ep %p\n", __FUNCTION__, ep); ep->snd_seq = ntohl(req->snd_isn); + ep->rcv_seq = ntohl(req->rcv_isn); set_emss(ep, ntohs(req->tcp_opt)); @@ -1732,10 +1712,8 @@ int iwch_accept_cr(struct iw_cm_id *cm_i struct iwch_qp *qp = get_qhp(h, conn_param->qpn); PDBG("%s ep %p tid %u\n", __FUNCTION__, ep, ep->hwtid); - if (state_read(&ep->com) == DEAD) { - put_ep(&ep->com); + if (state_read(&ep->com) == DEAD) return -ECONNRESET; - } BUG_ON(state_read(&ep->com) != MPA_REQ_RCVD); BUG_ON(!qp); @@ -1755,17 +1733,8 @@ int iwch_accept_cr(struct iw_cm_id *cm_i ep->ird = conn_param->ird; ep->ord = conn_param->ord; PDBG("%s %d ird %d ord %d\n", __FUNCTION__, __LINE__, ep->ird, ep->ord); + get_ep(&ep->com); - err = send_mpa_reply(ep, conn_param->private_data, - conn_param->private_data_len); - if (err) { - ep->com.cm_id = NULL; - ep->com.qp = NULL; - cm_id->rem_ref(cm_id); - abort_connection(ep, NULL, GFP_KERNEL); - put_ep(&ep->com); - return err; - } /* bind QP to EP and move to RTS */ attrs.mpa_attr = ep->mpa_attr; @@ -1783,16 +1752,29 @@ int iwch_accept_cr(struct iw_cm_id *cm_i err = iwch_modify_qp(ep->com.qp->rhp, ep->com.qp, mask, &attrs, 1); + if (err) + goto err; - if (err) { - ep->com.cm_id = NULL; - ep->com.qp = NULL; - cm_id->rem_ref(cm_id); - abort_connection(ep, NULL, 
GFP_KERNEL); - } else { - state_set(&ep->com, FPDU_MODE); - established_upcall(ep); - } + err = send_mpa_reply(ep, conn_param->private_data, + conn_param->private_data_len); + if (err) + goto err; + + /* wait for wr_ack */ + wait_event(ep->com.waitq, ep->com.rpl_done); + err = ep->com.rpl_err; + if (err) + goto err; + + state_set(&ep->com, FPDU_MODE); + established_upcall(ep); + put_ep(&ep->com); + return 0; +err: + ep->com.cm_id = NULL; + ep->com.qp = NULL; + cm_id->rem_ref(cm_id); + abort_connection(ep, NULL, GFP_KERNEL); put_ep(&ep->com); return err; } diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.h b/drivers/infiniband/hw/cxgb3/iwch_cm.h index 21a388c..6107e7c 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cm.h +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.h @@ -175,6 +175,7 @@ struct iwch_ep { unsigned int atid; u32 hwtid; u32 snd_seq; + u32 rcv_seq; struct l2t_entry *l2t; struct dst_entry *dst; struct sk_buff *mpa_skb; diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c index 714dddb..679b7c1 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c @@ -732,6 +732,7 @@ #endif init_attr.qp_dma_addr = qhp->wq.dma_addr; init_attr.qp_dma_size = (1UL << qhp->wq.size_log2); init_attr.flags = rqes_posted(qhp) ? 
RECVS_POSTED : 0; + init_attr.irs = qhp->ep->rcv_seq; PDBG("%s init_attr.rq_addr 0x%x init_attr.rq_size = %d " "flags 0x%x qpcaps 0x%x\n", __FUNCTION__, init_attr.rq_addr, init_attr.rq_size, diff --git a/drivers/net/cxgb3/version.h b/drivers/net/cxgb3/version.h index b112317..eb508bf 100644 --- a/drivers/net/cxgb3/version.h +++ b/drivers/net/cxgb3/version.h @@ -39,6 +39,6 @@ #define DRV_VERSION "1.0-ko" /* Firmware version */ #define FW_VERSION_MAJOR 4 -#define FW_VERSION_MINOR 0 +#define FW_VERSION_MINOR 3 #define FW_VERSION_MICRO 0 #endif /* __CHELSIO_VERSION_H */ From swise at opengridcomputing.com Wed Jun 27 08:51:30 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 27 Jun 2007 10:51:30 -0500 Subject: [ofa-general] [PATCH 2/6] iw_cxgb3: TERMINATE WRs can hang the tx ofld queue. In-Reply-To: <20070627155119.24944.44172.stgit@dell3.ogc.int> References: <20070627155119.24944.44172.stgit@dell3.ogc.int> Message-ID: <20070627155130.24944.55771.stgit@dell3.ogc.int> iw_cxgb3: TERMINATE WRs can hang the tx ofld queue. Don't set the gen bits nor length bits in the terminate wr. This is done by the LLD driver. Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_qp.c | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c index 679b7c1..dd89b6b 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c @@ -628,9 +628,9 @@ int iwch_post_terminate(struct iwch_qp * /* immediate data starts here. 
*/ term = (struct terminate_message *)wqe->send.sgl; build_term_codes(rsp_msg, &term->layer_etype, &term->ecode); - build_fw_riwrh((void *)wqe, T3_WR_SEND, - T3_COMPLETION_FLAG | T3_NOTIFY_FLAG, 1, - qhp->ep->hwtid, 5); + wqe->send.wrh.op_seop_flags = cpu_to_be32(V_FW_RIWR_OP(T3_WR_SEND) | + V_FW_RIWR_FLAGS(T3_COMPLETION_FLAG | T3_NOTIFY_FLAG)); + wqe->send.wrh.gen_tid_len = cpu_to_be32(V_FW_RIWR_TID(qhp->ep->hwtid)); skb->priority = CPL_PRIORITY_DATA; return cxgb3_ofld_send(qhp->rhp->rdev.t3cdev_p, skb); } From swise at opengridcomputing.com Wed Jun 27 08:51:35 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 27 Jun 2007 10:51:35 -0500 Subject: [ofa-general] [PATCH 3/6] iw_cxgb3: Don't count neg_adv abort_req_rss messages as real aborts. In-Reply-To: <20070627155119.24944.44172.stgit@dell3.ogc.int> References: <20070627155119.24944.44172.stgit@dell3.ogc.int> Message-ID: <20070627155135.24944.44327.stgit@dell3.ogc.int> iw_cxgb3: Don't count neg_adv abort_req_rss messages as real aborts. negative advice messages should _not_ count toward the 2 abort requests needed to indicate an abort request. Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_cm.c | 14 +++++++------- 1 files changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c index 7b8d5aa..4d7c277 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cm.c +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c @@ -1465,6 +1465,13 @@ static int peer_abort(struct t3cdev *tde int ret; int state; + if (is_neg_adv_abort(req->status)) { + PDBG("%s neg_adv_abort ep %p tid %d\n", __FUNCTION__, ep, + ep->hwtid); + t3_l2t_send_event(ep->com.tdev, ep->l2t); + return CPL_RET_BUF_DONE; + } + /* * We get 2 peer aborts from the HW. The first one must * be ignored except for scribbling that we need one more. 
@@ -1474,13 +1481,6 @@ static int peer_abort(struct t3cdev *tde return CPL_RET_BUF_DONE; } - if (is_neg_adv_abort(req->status)) { - PDBG("%s neg_adv_abort ep %p tid %d\n", __FUNCTION__, ep, - ep->hwtid); - t3_l2t_send_event(ep->com.tdev, ep->l2t); - return CPL_RET_BUF_DONE; - } - state = state_read(&ep->com); PDBG("%s ep %p state %u\n", __FUNCTION__, ep, state); switch (state) { From swise at opengridcomputing.com Wed Jun 27 08:51:40 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 27 Jun 2007 10:51:40 -0500 Subject: [ofa-general] [PATCH 4/6] iw_cxgb3: ctrl-qp init/clear shouldn't set the gen bit. In-Reply-To: <20070627155119.24944.44172.stgit@dell3.ogc.int> References: <20070627155119.24944.44172.stgit@dell3.ogc.int> Message-ID: <20070627155140.24944.61647.stgit@dell3.ogc.int> iw_cxgb3: ctrl-qp init/clear shouldn't set the gen bit. Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/cxio_hal.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c index 215bbe5..1518b41 100644 --- a/drivers/infiniband/hw/cxgb3/cxio_hal.c +++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c @@ -144,7 +144,7 @@ static int cxio_hal_clear_qp_ctx(struct } wqe = (struct t3_modify_qp_wr *) skb_put(skb, sizeof(*wqe)); memset(wqe, 0, sizeof(*wqe)); - build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 3, 1, qpid, 7); + build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 3, 0, qpid, 7); wqe->flags = cpu_to_be32(MODQP_WRITE_EC); sge_cmd = qpid << 8 | 3; wqe->sge_cmd = cpu_to_be64(sge_cmd); @@ -548,7 +548,7 @@ static int cxio_hal_init_ctrl_qp(struct V_EC_UP_TOKEN(T3_CTL_QP_TID) | F_EC_VALID)) << 32; wqe = (struct t3_modify_qp_wr *) skb_put(skb, sizeof(*wqe)); memset(wqe, 0, sizeof(*wqe)); - build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 0, 1, + build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 0, 0, T3_CTL_QP_TID, 7); wqe->flags = cpu_to_be32(MODQP_WRITE_EC); sge_cmd = 
(3ULL << 56) | FW_RI_SGEEC_START << 8 | 3; From swise at opengridcomputing.com Wed Jun 27 08:51:45 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 27 Jun 2007 10:51:45 -0500 Subject: [ofa-general] [PATCH 5/6] iw_cxgb3: Don't post TID_RELEASE message. In-Reply-To: <20070627155119.24944.44172.stgit@dell3.ogc.int> References: <20070627155119.24944.44172.stgit@dell3.ogc.int> Message-ID: <20070627155145.24944.64064.stgit@dell3.ogc.int> iw_cxgb3: Don't post TID_RELEASE message. The LLD does this for us in cxgb3_remove_tid(). Also fixed active open failure cases where we shouldn't be releasing the TID as well. Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_cm.c | 13 ++++++++++--- 1 files changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c index 4d7c277..228721f 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cm.c +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c @@ -254,8 +254,6 @@ static void release_ep_resources(struct cxgb3_remove_tid(ep->com.tdev, (void *)ep, ep->hwtid); dst_release(ep->dst); l2t_release(L2DATA(ep->com.tdev), ep->l2t); - if (ep->com.tdev->type == T3B) - release_tid(ep->com.tdev, ep->hwtid, NULL); put_ep(&ep->com); } @@ -1103,6 +1101,15 @@ static int abort_rpl(struct t3cdev *tdev return CPL_RET_BUF_DONE; } +/* + * Return whether a failed active open has allocated a TID + */ +static inline int act_open_has_tid(int status) +{ + return status != CPL_ERR_TCAM_FULL && status != CPL_ERR_CONN_EXIST && + status != CPL_ERR_ARP_MISS; +} + static int act_open_rpl(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) { struct iwch_ep *ep = ctx; @@ -1112,7 +1119,7 @@ static int act_open_rpl(struct t3cdev *t status2errno(rpl->status)); connect_reply_upcall(ep, status2errno(rpl->status)); state_set(&ep->com, DEAD); - if (ep->com.tdev->type == T3B) + if (ep->com.tdev->type == T3B && act_open_has_tid(rpl->status)) release_tid(ep->com.tdev, GET_TID(rpl), NULL); 
cxgb3_free_atid(ep->com.tdev, ep->atid); dst_release(ep->dst); From swise at opengridcomputing.com Wed Jun 27 08:51:50 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 27 Jun 2007 10:51:50 -0500 Subject: [ofa-general] [PATCH 6/6] iw_cxgb3: Don't abort after failures sending the mpa reply. In-Reply-To: <20070627155119.24944.44172.stgit@dell3.ogc.int> References: <20070627155119.24944.44172.stgit@dell3.ogc.int> Message-ID: <20070627155150.24944.36124.stgit@dell3.ogc.int> iw_cxgb3: Don't abort after failures sending the mpa reply. This bug results in an abort request being sent down _after_ the tid has been released. If the tid happens to have been reused, then the subsequent generation of the tid gets incorrectly aborted. The thread running iwch_accept_cr() must not abort a connection if an error is returned after being awakened. If any errors did occur while iwch_accept_cr() is blocked, then the connection has already been aborted on the thread processing the error. Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_cm.c | 1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c index 228721f..3b41dc0 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cm.c +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c @@ -1781,7 +1781,6 @@ err: ep->com.cm_id = NULL; ep->com.qp = NULL; cm_id->rem_ref(cm_id); - abort_connection(ep, NULL, GFP_KERNEL); put_ep(&ep->com); return err; } From arthur.jones at qlogic.com Wed Jun 27 09:07:44 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Wed, 27 Jun 2007 09:07:44 -0700 Subject: [ofa-general] Re: [PATCH 26/28] IB/ipath - print warning if LID not acquired and link ACTIVE within one minute In-Reply-To: <20070627054006.GI10225@obsidianresearch.com> References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> <20070619234303.3794.75856.stgit@bauxite.internal.keyresearch.com>
<20070626222556.GP29798@bauxite.pathscale.com> <20070627054006.GI10225@obsidianresearch.com> Message-ID: <20070627160744.GS29798@bauxite.pathscale.com> hi jason, ... On Tue, Jun 26, 2007 at 11:40:06PM -0600, Jason Gunthorpe wrote: > On Tue, Jun 26, 2007 at 03:25:56PM -0700, Arthur Jones wrote: > > > does this mean that there's a patch pending > > to remove the gazillion link down messages > > in drivers/net? > > These days alot of the ethernet drivers use one of the mii phy general > codes that cause those messages to be printed.. > > The ethernet drivers are a bit of a bad example because there is alot > of variations of the code to monitor the phy state machines so for > consistency with the general mii stuff they have to print the message > on their own. :| ok, thanks for info... so then, what kind of mii like infrastructure can i use to print out a message when i expect a LID and i don't get one? i didn't see anything in the IB code, did i miss something? arthur From arthur.jones at qlogic.com Wed Jun 27 10:02:42 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Wed, 27 Jun 2007 10:02:42 -0700 Subject: [ofa-general] Re: [PATCH 24/28] IB/ipath - ipath_poll fixups and enhancements In-Reply-To: References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> <20070619234252.3794.18229.stgit@bauxite.internal.keyresearch.com> <20070621152312.GA14817@bauxite.pathscale.com> Message-ID: <20070627170242.GT29798@bauxite.pathscale.com> hi roland, ... On Thu, Jun 21, 2007 at 11:14:23AM -0700, Roland Dreier wrote: > > the port_rcvhdrttail_kvaddr is the kernel virtual address > > allocated in coherent memory where the header queue is updated > > by the chip. we use volatile to make sure the compiler does > > not use stale data... > > OK, fair enough, although it seems you may be missing some memory > barriers to make sure you don't run into the CPU reordering accesses > to the head/tail pointers. 
i had a quick look at the patch and the surrounding code and i did not catch the problem. can you be a little more specific about the suspect code? thanks... arthur From Mark.Seger at hp.com Wed Jun 27 10:07:26 2007 From: Mark.Seger at hp.com (Mark Seger) Date: Wed, 27 Jun 2007 13:07:26 -0400 Subject: [ofa-general] IB performance stats (revisited) In-Reply-To: <1182957688.28870.83013.camel@hal.voltaire.com> References: <46826370.4090602@hp.com> <1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com> <46827BA0.6070008@hp.com> <1182957688.28870.83013.camel@hal.voltaire.com> Message-ID: <4682994E.1020209@hp.com> >The performance managers deal with the counter stickiness (by resetting >them when they think they need to). They typically export their data >although this is not specified by IBA so it is in a vendor proprietary >manner. > > so I guess these guys are poor citizens as well... the real issue as I see it then means nobody can trust the data if randon tools randomly reset the counters. a real shame... -mark From Mark.Seger at hp.com Wed Jun 27 10:10:40 2007 From: Mark.Seger at hp.com (Mark Seger) Date: Wed, 27 Jun 2007 13:10:40 -0400 Subject: [ofa-general] IB performance stats (revisited) In-Reply-To: <1182958191.28870.83536.camel@hal.voltaire.com> References: <46826370.4090602@hp.com> <1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com> <1182958191.28870.83536.camel@hal.voltaire.com> Message-ID: <46829A10.1030202@hp.com> >>>There's now an experimental performance manager which has been discussed >>>on the list. The performance data collected can be accessed. >>> >>> >>> >>alas, since I use this tool on commercial systems, I can't run it >>against experimental code. perhaps when the experimental becomes real I >>can. >> >> > >It should be in the OFED 1.3 timeframe. Also, there are vendor >Performance Managers too. > > that would be good. 
if I can detect the 1.3 stack I can change my monitoring accordingly >> I'll try to find the notes in the archives. >> >> > >I can send you this if you can't find it > > that would be great if you can easily lay your hands on it. >>Good point, but I guess I'm between a rock and a hard place. imho: as >>long as the counters don't wrap this problem will never be solved. >> >> > >It's the IBTA standard (rather than IETF style counters). I don't think >it's going to change. > > yeah, but it also makes one wonder why non-wrapping counters were chosen when the IETF proved years ago that one needs wrapping counter to allow concurrent access to them. sigh... -mark From halr at voltaire.com Wed Jun 27 10:12:18 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Jun 2007 13:12:18 -0400 Subject: [ofa-general] IB performance stats (revisited) In-Reply-To: <4682994E.1020209@hp.com> References: <46826370.4090602@hp.com> <1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com> <46827BA0.6070008@hp.com> <1182957688.28870.83013.camel@hal.voltaire.com> <4682994E.1020209@hp.com> Message-ID: <1182964334.28870.90291.camel@hal.voltaire.com> On Wed, 2007-06-27 at 13:07, Mark Seger wrote: > >The performance managers deal with the counter stickiness (by resetting > >them when they think they need to). They typically export their data > >although this is not specified by IBA so it is in a vendor proprietary > >manner. > > > > > so I guess these guys are poor citizens as well... Not sure what you mean. > the real issue as I see it then means nobody can trust the data if > randon tools randomly reset the counters. a real shame... I consider this to be a real rather than random app for this. Guess it depends on what one considers random. 
-- Hal > -mark > > From Mark.Seger at hp.com Wed Jun 27 10:24:36 2007 From: Mark.Seger at hp.com (Mark Seger) Date: Wed, 27 Jun 2007 13:24:36 -0400 Subject: [ofa-general] IB performance stats (revisited) In-Reply-To: <1182964334.28870.90291.camel@hal.voltaire.com> References: <46826370.4090602@hp.com> <1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com> <46827BA0.6070008@hp.com> <1182957688.28870.83013.camel@hal.voltaire.com> <4682994E.1020209@hp.com> <1182964334.28870.90291.camel@hal.voltaire.com> Message-ID: <46829D54.2040300@hp.com> Hal Rosenstock wrote: >On Wed, 2007-06-27 at 13:07, Mark Seger wrote: > > >>>The performance managers deal with the counter stickiness (by resetting >>>them when they think they need to). They typically export their data >>>although this is not specified by IBA so it is in a vendor proprietary >>>manner. >>> >>> >>> >>> >>so I guess these guys are poor citizens as well... >> >> > >Not sure what you mean. > > I consider it poor form to zero counters out from under someone else who might be in the middle of trying to read them, and thought that's what you meant when you said what I was doing was "Yes, it is _bad_ if there are essentially multiple performance managers resetting the counters." I am most definitely guilty as charged and trying real hard to get out from under, which is why I suggested a module that exports wrapping counters to /proc. Then, as long as ALL utilities rely on those numbers, the module can reset them all it likes and nobody interferes with anyone else, since there is only one program doing that. >>the real issue as I see it then means nobody can trust the data if >>randon tools randomly reset the counters. a real shame... >> >> > >I consider this to be a real rather than random app for this. Guess it >depends on what one considers random.
> > I used the term 'random' loosely, but my point is as long as anyone can reset the counters and you never know if it's happening or not, you'll get bogus data and I'm trying to find a way to get around it. -mark >-- Hal > > > >>-mark >> >> >> >> From jgunthorpe at obsidianresearch.com Wed Jun 27 10:37:50 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Wed, 27 Jun 2007 11:37:50 -0600 Subject: [ofa-general] Re: [PATCH] management: uint -> unsigned replacement In-Reply-To: <1182942065.28870.65696.camel@hal.voltaire.com> References: <20070626102045.GS15343@mellanox.co.il> <1182862966.10379.425353.camel@hal.voltaire.com> <20070626132457.GA29602@mellanox.co.il> <20070626180157.GM25653@sashak.voltaire.com> <1182885534.28870.527.camel@hal.voltaire.com> <20070627053327.GH10225@obsidianresearch.com> <1182942065.28870.65696.camel@hal.voltaire.com> Message-ID: <20070627173750.GN32050@obsidianresearch.com> On Wed, Jun 27, 2007 at 07:01:09AM -0400, Hal Rosenstock wrote: > > (and I'd advocate using -std=gnu99, but I never compile with > > VC++ :P). > > > > 'gcc -ansi -D_POSIX_SOURCE_' as a minimum is also pretty good. > > How about: > > gcc -Wall -D_XOPEN_SOURCE=600 I'd recommend -D_POSIX_C_SOURCE=200112 as the 'highest' setting for portable code. This should reflect IEEE 1003.1-2004 (aka SUSv3). Most of the XPG specific stuff is not as easy to get good documentation on, IMHO.
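Jason's recommendation amounts to defining one feature-test macro before any system header is included. A minimal sketch, assuming a POSIX system such as glibc on Linux. The helper names below are illustrative only and not part of any management code. The compile-time check relies on `_POSIX_VERSION` from `<unistd.h>`. The macro is conventionally written as the long constant `200112L`; the bare `200112` compares equal in the preprocessor.

```c
/* Select the POSIX.1-2001 (SUSv3) namespace. This must precede every
 * system header; it is equivalent to compiling with
 * -D_POSIX_C_SOURCE=200112L. */
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <unistd.h>

/* <unistd.h> defines _POSIX_VERSION; refuse to build against headers
 * that advertise an older revision than the one requested. */
#if !defined(_POSIX_VERSION) || _POSIX_VERSION < 200112L
#error "POSIX.1-2001 (200112L) or newer headers are required"
#endif

/* Hypothetical helper: revision advertised by the headers at compile time. */
static long posix_compile_version(void)
{
    return (long)_POSIX_VERSION;
}

/* Hypothetical helper: revision reported by the C library at run time
 * (sysconf returns -1 if the value is unavailable). */
static long posix_runtime_version(void)
{
    return sysconf(_SC_VERSION);
}

/* Abort early if the running system is older than the build target. */
static void require_posix_2001(void)
{
    assert(posix_runtime_version() >= 200112L);
}
```

Putting the macro in the source rather than on the command line keeps the requirement with the code, but either form works as long as it appears before the first `#include`.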
Jason From halr at voltaire.com Wed Jun 27 10:48:04 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Jun 2007 13:48:04 -0400 Subject: [ofa-general] IB performance stats (revisited) In-Reply-To: <46829D54.2040300@hp.com> References: <46826370.4090602@hp.com> <1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com> <46827BA0.6070008@hp.com> <1182957688.28870.83013.camel@hal.voltaire.com> <4682994E.1020209@hp.com> <1182964334.28870.90291.camel@hal.voltaire.com> <46829D54.2040300@hp.com> Message-ID: <1182966482.28870.92686.camel@hal.voltaire.com> On Wed, 2007-06-27 at 13:24, Mark Seger wrote: > Hal Rosenstock wrote: > > >On Wed, 2007-06-27 at 13:07, Mark Seger wrote: > > > > > >>>The performance managers deal with the counter stickiness (by resetting > >>>them when they think they need to). They typically export their data > >>>although this is not specified by IBA so it is in a vendor proprietary > >>>manner. > >>> > >>> > >>> > >>> > >>so I guess these guys are poor citizens as well... > >> > >> > > > >Not sure what you mean. > > > > > I consider it poor form to zero counters out from someone else who might > be in the middle of trying to read them and though that's what you mean > when you said why I was doing was "Yes, it is _bad_ if there are > essentially multiple performance managers resetting the counters." I am > most definately guilty as charged and trying real hard to get out from > under which is why I suggested a module that exports wrapping counters > to /proc. Then, as long as ALL utilities rely on those numbers, the > module can reset them all likes and nobody interfers with each other > since there is only one program doing that. Another approach would be to have the PMA inform the kernel that the counters were reset (perhaps including the values prior to the reset) so that these could be factored into the local set of counters. There is nothing in the spec that precludes this although it has not been implemented this way. 
Then there wouldn't be a reason for a local manager to have to play these games. It would mean that there would need to be a performance manager running in the subnet which may not be acceptable for some installations; not sure. > >>the real issue as I see it then means nobody can trust the data if > >>randon tools randomly reset the counters. a real shame... > >> > >> > > > >I consider this to be a real rather than random app for this. Guess it > >depends on what one considers random. > > > > > I used the term 'random' loosely, but my point is as long as anyone can > reset the counters and you never know if it's happening or not, you'll > get bogus data Agreed. > and I'm trying to find a way to get around it. Understood. -- Hal > -mark > > >-- Hal > > > > > > > >>-mark > >> > >> > >> > >> > From swise at opengridcomputing.com Wed Jun 27 11:12:16 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 27 Jun 2007 13:12:16 -0500 Subject: [ofa-general] Open Fabrics iWARP Driver for Chesio T3 card In-Reply-To: References: Message-ID: <4682A880.1030001@opengridcomputing.com> Hi David, Answers below: david elsen wrote: > Can someone please let me know: > > 1. What is the latest Open Fabrics Driver for the Chesio T3 cards? > The latest chelsio rdma driver is in the ofed-1.2 "gold" release. That driver requires firmware from chelsio that is included in their latest software kit: cxgb3toe-1.0.104.tar.gz. Contact chelsio to get this. I'll probably be pulling in a patch series for ofed-1.2 to update the ofed low level driver, but for now, please use the kit from Chelsio. I suggest you install OFED-1.2.tgz and then the cxgb3toe-1.0.104 kit on top of ofed. This will install the latest low level driver (used by the rdma driver in the ofed release) and the latest 4.3.0 firmware. > 2. Is there any documentation there on The Open Fabrics website to install the iWARP driver for the T3 card?
> There is a chelsio cxgb3 release note file included in the ofed-1.2 documentation package. > 3. Is there any documentation describing how to set the iWARP and > Network interface for the T3 cards? > Same release note file. Hope this helps. Steve. From eitan at mellanox.co.il Wed Jun 27 11:23:38 2007 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 27 Jun 2007 21:23:38 +0300 Subject: [ofa-general] IB performance stats (revisited) References: <46826370.4090602@hp.com><1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com><46827BA0.6070008@hp.com><1182957688.28870.83013.camel@hal.voltaire.com><4682994E.1020209@hp.com> <1182964334.28870.90291.camel@hal.voltaire.com> Message-ID: <6C2C79E72C305246B504CBA17B5500C901CAD7B4@mtlexch01.mtl.com> > > > > > > > > > so I guess these guys are poor citizens as well... > > Not sure what you mean. > > > the real issue as I see it then means nobody can trust the data if > > randon tools randomly reset the counters. a real shame... In IBADM ibmon we worked around this issue by inspecting the fact the counter value decreases without the ibmon knowledge. > > I consider this to be a real rather than random app for this. > Guess it depends on what one considers random. 
> > -- Hal > > > -mark > > > > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From jgunthorpe at obsidianresearch.com Wed Jun 27 11:22:36 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Wed, 27 Jun 2007 12:22:36 -0600 Subject: [ofa-general] IB performance stats (revisited) In-Reply-To: <1182966482.28870.92686.camel@hal.voltaire.com> References: <46826370.4090602@hp.com> <1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com> <46827BA0.6070008@hp.com> <1182957688.28870.83013.camel@hal.voltaire.com> <4682994E.1020209@hp.com> <1182964334.28870.90291.camel@hal.voltaire.com> <46829D54.2040300@hp.com> <1182966482.28870.92686.camel@hal.voltaire.com> Message-ID: <20070627182236.GO32050@obsidianresearch.com> On Wed, Jun 27, 2007 at 01:48:04PM -0400, Hal Rosenstock wrote: > Another approach would be to have the PMA inform the kernel that the > counters were reset (perhaps including the values prior to the reset) so > that these could be factored into the local set of counters. There is > nothing in the spec that precludes this although it has not been > implemented this way. Then there would't be a reason for a local manager > to have to play these games. It would mean that there would need to be a > performance manager running in the subnet which may not be acceptable > for some installations; not sure. If you are going to play those sorts of games I think it would be better to just effectively disable the PMA in the mellanox firmware and do the following: - The kernel periodically fetches the performance stats and aggregates them into a 64-bit wrapping counter.
The kernel sends PMA mads into the mellanox firmware to read and reset the counters - The new 64 bit stats are exported via sysfs/proc/whatever as wrapping counters - When a PMA packet comes in the kernel services it rather than passing it on to the chip firmware. Hopefully in future we could encourage new firmware/silicon to support exporting non-wrapping 64 bit counters to the OS so this ugly mess wouldn't be needed. FWIW, I agree with Mark that the current locally accessible counters that are exactly the same as PMA mad values are virtually useless. Jason From eitan at mellanox.co.il Wed Jun 27 11:23:41 2007 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 27 Jun 2007 21:23:41 +0300 Subject: [ofa-general] IB performance stats (revisited) References: <46826370.4090602@hp.com><1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com><46827BA0.6070008@hp.com><1182957688.28870.83013.camel@hal.voltaire.com><4682994E.1020209@hp.com> <1182964334.28870.90291.camel@hal.voltaire.com> Message-ID: <6C2C79E72C305246B504CBA17B5500C901CAD7B7@mtlexch01.mtl.com> In the last months it is the second time I hear people complaining that the current monitoring solution in OFA is integrated with OpenSM. These people do not use OpenSM but do use OFED. Another drawback is that no naming is provided and the reporting uses GUIDs. I also can't hold myself from saying again I think you are going to hit the wall with the concept of doing the PMA from a single node. Eitan Zahavi Senior Engineering Director, Software Architect Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O.
Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: general-bounces at lists.openfabrics.org > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of > Hal Rosenstock > Sent: Wednesday, June 27, 2007 8:12 PM > To: Mark Seger > Cc: Finn, Ed; general at lists.openfabrics.org > Subject: Re: [ofa-general] IB performance stats (revisited) > > On Wed, 2007-06-27 at 13:07, Mark Seger wrote: > > >The performance managers deal with the counter stickiness (by > > >resetting them when they think they need to). They > typically export > > >their data although this is not specified by IBA so it is > in a vendor > > >proprietary manner. > > > > > > > > so I guess these guys are poor citizens as well... > > Not sure what you mean. > > > the real issue as I see it then means nobody can trust the data if > > randon tools randomly reset the counters. a real shame... > > I consider this to be a real rather than random app for this. > Guess it depends on what one considers random. > > -- Hal > > > -mark > From rdreier at cisco.com Wed Jun 27 12:08:39 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 27 Jun 2007 12:08:39 -0700 Subject: [ofa-general] Re: The low level driver of mlx4 kmalloc 0 bytes in QP creation In-Reply-To: <46821FDA.5030900@dev.mellanox.co.il> (Dotan Barak's message of "Wed, 27 Jun 2007 11:29:14 +0300") References: <46821FDA.5030900@dev.mellanox.co.il> Message-ID: > If one creates a QP with 0 WR in the RQ in the kernel level, the low > level driver of the mlx4 > will kmalloc 0 bytes (for the WR IDs of the RQ). > (for example, the IPoIB CM creates such a QP) > > Is this is an error?
The consensus seems to be that kmalloc(0) is OK, although various 2.6.22-rc kernels printed big tracebacks when it happens. I think getting rid of the kmalloc(0) in mlx4 would make the code more complicated for no real gain. - R. From rdreier at cisco.com Wed Jun 27 12:13:54 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 27 Jun 2007 12:13:54 -0700 Subject: [ofa-general] Re: [PATCH 24/28] IB/ipath - ipath_poll fixups and enhancements In-Reply-To: <20070627170242.GT29798@bauxite.pathscale.com> (Arthur Jones's message of "Wed, 27 Jun 2007 10:02:42 -0700") References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> <20070619234252.3794.18229.stgit@bauxite.internal.keyresearch.com> <20070621152312.GA14817@bauxite.pathscale.com> <20070627170242.GT29798@bauxite.pathscale.com> Message-ID: > > OK, fair enough, although it seems you may be missing some memory > > barriers to make sure you don't run into the CPU reordering accesses > > to the head/tail pointers. > > i had a quick look at the patch and the surrounding > code and i did not catch the problem. can you be a > little more specific about the suspect code? I'm not sure there's a bug there. But the patch in question does > + tail = *(volatile u64 *)pd->port_rcvhdrtail_kvaddr; with no memory ordering. The volatile makes sure the compiler puts that read where you wrote it, but there's no guarantee that the CPU executes it anywhere remotely close to where it is in the code. Later on you have > + if (tail != head || > + test_bit(IPATH_PORT_WAITING_RCV, &pd->int_flag)) { etc., and the CPU might speculate those test far ahead of actually reading the port_rcvhdrttail_kvaddr value, which means you might end up executing code based on a guess about tail != head that is not true at the time it speculates the branch, but by the time it does get to actually check its speculation, the guess has become true. Just something to think about... 
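Roland's point about `volatile` can be illustrated with a userspace C11 analogue. This is a sketch under stated assumptions, not ipath code: `struct hdr_queue`, `queue_init()`, `produce()` and `consume_all()` are hypothetical, a CPU-side producer stands in for the chip, and C11 acquire/release atomics stand in for the kernel's `rmb()`/`wmb()`. Memory written by a device via DMA needs the kernel barrier primitives rather than C11 atomics; the ordering requirement is the same.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define QUEUE_LEN 64   /* hypothetical ring size */

/* Hypothetical stand-in for a receive header queue whose tail index is
 * advanced by a producer (the chip, in the driver's case). */
struct hdr_queue {
    uint64_t entries[QUEUE_LEN];
    _Atomic uint64_t tail;   /* written only by the producer */
    uint64_t head;           /* written only by the consumer */
};

static void queue_init(struct hdr_queue *q)
{
    q->head = 0;
    atomic_init(&q->tail, 0);
}

/* Producer: write the entry first, then publish it by advancing tail
 * with release ordering, so the entry store cannot move past it. */
static void produce(struct hdr_queue *q, uint64_t value)
{
    uint64_t t = atomic_load_explicit(&q->tail, memory_order_relaxed);
    q->entries[t % QUEUE_LEN] = value;
    atomic_store_explicit(&q->tail, t + 1, memory_order_release);
}

/* Consumer: the acquire load plays the role of "read tail, then rmb()".
 * Entry reads (and branches on head != tail) cannot be satisfied before
 * the tail read. A plain volatile read constrains only the compiler,
 * not the CPU. Returns the number of entries consumed. */
static int consume_all(struct hdr_queue *q, uint64_t *sum)
{
    uint64_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    int n = 0;

    while (q->head != tail) {
        *sum += q->entries[q->head % QUEUE_LEN];
        q->head++;
        n++;
    }
    return n;
}
```

Pairing the acquire on the consumer side with a release on the producer side is what makes the entries visible no later than the tail value that covers them.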
From rdreier at cisco.com Wed Jun 27 12:14:44 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 27 Jun 2007 12:14:44 -0700 Subject: [ofa-general] Re: [PATCH 0/6] iw_cxgb3: Bug Fixes for 2.6.23 In-Reply-To: <20070627155119.24944.44172.stgit@dell3.ogc.int> (Steve Wise's message of "Wed, 27 Jun 2007 10:51:19 -0500") References: <20070627155119.24944.44172.stgit@dell3.ogc.int> Message-ID: > Here are some bug fixes to the iw_cxgb3 driver that I'd like included > for 2.6.23. NOTE: Patch 1 requires a firmware interface change, so > there is a version bump to 4.3 included in that patch that hits cxgb3. > This will likely conflict with a previous version change that is in > Jeff's upstream branch. The net is: we need the firmware version bumped > to 4.3 with these iw_cxgb3 changes. OK, I'll probably pull this into my tree and hold off on asking Linus to pull until after he pulls Jeff's net driver tree. Once that happens I'll fix up any conflicts and ask Linus to pull. From swise at opengridcomputing.com Wed Jun 27 12:31:55 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 27 Jun 2007 14:31:55 -0500 Subject: [ofa-general] Re: [PATCH 0/6] iw_cxgb3: Bug Fixes for 2.6.23 In-Reply-To: References: <20070627155119.24944.44172.stgit@dell3.ogc.int> Message-ID: <4682BB2B.7030002@opengridcomputing.com> Roland Dreier wrote: > > Here are some bug fixes to the iw_cxgb3 driver that I'd like included > > for 2.6.23. NOTE: Patch 1 requires a firmware interface change, so > > there is a version bump to 4.3 included in that patch that hits cxgb3. > > This will likely conflict with a previous version change that is in > > Jeff's upstream branch. The net is: we need the firmware version bumped > > to 4.3 with these iw_cxgb3 changes. > > OK, I'll probably pull this into my tree and hold off on asking Linus > to pull until after he pulls Jeff's net driver tree. Once that > happens I'll fix up any conflicts and ask Linus to pull. Sounds good. Thanks, Steve. 
From arthur.jones at qlogic.com Wed Jun 27 13:10:18 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Wed, 27 Jun 2007 13:10:18 -0700 Subject: [ofa-general] Re: [PATCH 24/28] IB/ipath - ipath_poll fixups and enhancements In-Reply-To: References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> <20070619234252.3794.18229.stgit@bauxite.internal.keyresearch.com> <20070621152312.GA14817@bauxite.pathscale.com> <20070627170242.GT29798@bauxite.pathscale.com> Message-ID: <20070627201018.GY29798@bauxite.pathscale.com> hi roland, ... On Wed, Jun 27, 2007 at 12:13:54PM -0700, Roland Dreier wrote: > > > OK, fair enough, although it seems you may be missing some memory > > > barriers to make sure you don't run into the CPU reordering accesses > > > to the head/tail pointers. > > > > i had a quick look at the patch and the surrounding > > code and i did not catch the problem. can you be a > > little more specific about the suspect code? > > I'm not sure there's a bug there. But the patch in question does > > > + tail = *(volatile u64 *)pd->port_rcvhdrtail_kvaddr; > > with no memory ordering. The volatile makes sure the compiler puts > that read where you wrote it, but there's no guarantee that the CPU > executes it anywhere remotely close to where it is in the code. Later > on you have agreed. > > + if (tail != head || > > + test_bit(IPATH_PORT_WAITING_RCV, &pd->int_flag)) { > > etc., and the CPU might speculate those test far ahead of actually > reading the port_rcvhdrttail_kvaddr value, which means you might end > up executing code based on a guess about tail != head that is not true > at the time it speculates the branch, but by the time it does get to > actually check its speculation, the guess has become true. i agree that the &pd->int_flag result could be valid before tail has become valid and hence when waiting for tail to be valid we're out of order wrt the int_flag load. 
but this logic is completely async to the head != tail test, so the out-of-order result there can not hurt us... arthur From halr at voltaire.com Wed Jun 27 14:02:09 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Jun 2007 17:02:09 -0400 Subject: [ofa-general] IB performance stats (revisited) In-Reply-To: <6C2C79E72C305246B504CBA17B5500C901CAD7B4@mtlexch01.mtl.com> References: <46826370.4090602@hp.com> <1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com> <46827BA0.6070008@hp.com><1182957688.28870.83013.camel@hal.voltaire.com> <4682994E.1020209@hp.com> <1182964334.28870.90291.camel@hal.voltaire.com> <6C2C79E72C305246B504CBA17B5500C901CAD7B4@mtlexch01.mtl.com> Message-ID: <1182978125.28870.105782.camel@hal.voltaire.com> On Wed, 2007-06-27 at 14:23, Eitan Zahavi wrote: > > > > > > > > > > > > > so I guess these guys are poor citizens as well... > > > > Not sure what you mean. > > > > > the real issue as I see it then means nobody can trust the data if > > > randon tools randomly reset the counters. a real shame... > > In IBADM ibmon we worked around this issue by inspecting the fact the > counter value decreases without the ibmon knowledge. That is detected in the current PerfMgr as well: it is an "out of band" clear. The question is the loss of data accuracy from the last snapshot to the new lower values. -- Hal > > > > I consider this to be a real rather than random app for this. > > Guess it depends on what one considers random. 
> > > > -- Hal > > > > > -mark > > From halr at voltaire.com Wed Jun 27 14:08:18 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Jun 2007 17:08:18 -0400 Subject: [ofa-general] IB performance stats (revisited) In-Reply-To: <6C2C79E72C305246B504CBA17B5500C901CAD7B7@mtlexch01.mtl.com> References: <46826370.4090602@hp.com> <1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com> <46827BA0.6070008@hp.com><1182957688.28870.83013.camel@hal.voltaire.com> <4682994E.1020209@hp.com> <1182964334.28870.90291.camel@hal.voltaire.com> <6C2C79E72C305246B504CBA17B5500C901CAD7B7@mtlexch01.mtl.com> Message-ID: <1182978496.28870.106214.camel@hal.voltaire.com> On Wed, 2007-06-27 at 14:23, Eitan Zahavi wrote: > In the last months it is the second time I hear people complaining the > current monitoring solution in OFA is integrated with OpenSM. I must have missed this both times (didn't see this in Mark's post) and the statement itself is somewhat inaccurate as well. > These people do not use OpenSM but do use OFED. I'm not sure I'm following what you mean here. If you mean that some people want to run PerfMgr without the SM/SA aspects (so that they can run a vendor based SM), that is the next thing we are adding to the implementation. > Another drawback if that > no naming is provided and the reporting uses GUIDs. Naming is provided via NodeDescription. > I also can't hold myself from saying again I think you are going to hit > the wall with the concept of doing the PMA from a single node. If you are referring to the fact that the PerfMgr is currently not distributed, that will be done as has been stated before.
-- Hal > Eitan Zahavi > Senior Engineering Director, Software Architect > Mellanox Technologies LTD > Tel:+972-4-9097208 > Fax:+972-4-9593245 > P.O. Box 586 Yokneam 20692 ISRAEL > > > > > -----Original Message----- > > From: general-bounces at lists.openfabrics.org > > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of > > Hal Rosenstock > > Sent: Wednesday, June 27, 2007 8:12 PM > > To: Mark Seger > > Cc: Finn, Ed; general at lists.openfabrics.org > > Subject: Re: [ofa-general] IB performance stats (revisited) > > > > On Wed, 2007-06-27 at 13:07, Mark Seger wrote: > > > >The performance managers deal with the counter stickiness (by > > > >resetting them when they think they need to). They > > typically export > > > >their data although this is not specified by IBA so it is > > in a vendor > > > >proprietary manner. > > > > > > > > > > > so I guess these guys are poor citizens as well... > > > > Not sure what you mean. > > > > > the real issue as I see it then means nobody can trust the data if > > > randon tools randomly reset the counters. a real shame... > > > > I consider this to be a real rather than random app for this. > > Guess it depends on what one considers random. 
> > > > -- Hal > > > > > -mark > > From halr at voltaire.com Wed Jun 27 14:13:40 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Jun 2007 17:13:40 -0400 Subject: [ofa-general] IB performance stats (revisited) In-Reply-To: <20070627182236.GO32050@obsidianresearch.com> References: <46826370.4090602@hp.com> <1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com> <46827BA0.6070008@hp.com> <1182957688.28870.83013.camel@hal.voltaire.com> <4682994E.1020209@hp.com> <1182964334.28870.90291.camel@hal.voltaire.com> <46829D54.2040300@hp.com> <1182966482.28870.92686.camel@hal.voltaire.com> <20070627182236.GO32050@obsidianresearch.com> Message-ID: <1182978803.28870.106563.camel@hal.voltaire.com> On Wed, 2007-06-27 at 14:22, Jason Gunthorpe wrote: > On Wed, Jun 27, 2007 at 01:48:04PM -0400, Hal Rosenstock wrote: > > > Another approach would be to have the PMA inform the kernel that the > > counters were reset (perhaps including the values prior to the reset) so > > that these could be factored into the local set of counters. There is > > nothing in the spec that precludes this although it has not been > > implemented this way. Then there would't be a reason for a local manager > > to have to play these games. It would mean that there would need to be a > > performance manager running in the subnet which may not be acceptable > > for some installations; not sure. > > If you are going to play those sorts of games I think it would better > to just effectively disable the PMA in the mellanox firmware and do > the following: > > - The kernel periodically fetches the performance stats and aggregates > them into a 64 wrapping counter.
The kernel sends PMA mads into the > mellanox firmware to read and reset the counters > - The new 64 bit stats are exported via sysfs/proc/whatever as > wrapping counters > - When a PMA packet comes in the kernel services it rather than > passing it on to the chip firmware. In this way, both 32 and 64 bit counters could be presented by the PMA but how would it know when a counter has maxed out in terms of the PMA and how would a remote clear be handled? -- Hal > Hopefully in future we could encourage new firmware/sillicon to > support exporting non-wrapping 64 bit counters to the OS so this ugly > mess wouldn't be needed. > > FWIW, I agree with Mark that the current locally accessible counters > that are exactly the same as PMA mad values are virtually useless.. > > Jason From jgunthorpe at obsidianresearch.com Wed Jun 27 14:26:51 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Wed, 27 Jun 2007 15:26:51 -0600 Subject: [ofa-general] IB performance stats (revisited) In-Reply-To: <1182978803.28870.106563.camel@hal.voltaire.com> References: <1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com> <46827BA0.6070008@hp.com> <1182957688.28870.83013.camel@hal.voltaire.com> <4682994E.1020209@hp.com> <1182964334.28870.90291.camel@hal.voltaire.com> <46829D54.2040300@hp.com> <1182966482.28870.92686.camel@hal.voltaire.com> <20070627182236.GO32050@obsidianresearch.com> <1182978803.28870.106563.camel@hal.voltaire.com> Message-ID: <20070627212651.GQ32050@obsidianresearch.com> On Wed, Jun 27, 2007 at 05:13:40PM -0400, Hal Rosenstock wrote: > > - The kernel periodically fetches the performance stats and aggregates > > them into a 64 wrapping counter. The kernel sends PMA mads into the > > mellanox firmware to read and reset the counters > > - The new 64 bit stats are exported via sysfs/proc/whatever as > > wrapping counters > > - When a PMA packet comes in the kernel services it rather than > > passing it on to the chip firmware.
> > In this way, both 32 and 64 bit counters could be presented by the PMA > but how would it know when a counter has maxed out in terms of the > PMA and how would a remote clear be handled ? Each time the counter is cleared the kernel would store the 64 bit value as the 'last PMA counter'. Then the calculation is just if ((current - stored) >= saturation) return saturation; return current - stored; After 2**64 counts the saturation computation will stop working. It would take 24 years of constant maxed out data transfer for a 12x QDR link to wrap a 64 bit dword byte counter. A nice side benefit would be that linux drivers could present a consistent PMA interface with new extended 64 bit counters even with older hardware. Jason From Mark.Seger at hp.com Wed Jun 27 14:37:19 2007 From: Mark.Seger at hp.com (Mark Seger) Date: Wed, 27 Jun 2007 17:37:19 -0400 Subject: [ofa-general] IB performance stats (revisited) In-Reply-To: <20070627212651.GQ32050@obsidianresearch.com> References: <1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com> <46827BA0.6070008@hp.com> <1182957688.28870.83013.camel@hal.voltaire.com> <4682994E.1020209@hp.com> <1182964334.28870.90291.camel@hal.voltaire.com> <46829D54.2040300@hp.com> <1182966482.28870.92686.camel@hal.voltaire.com> <20070627182236.GO32050@obsidianresearch.com> <1182978803.28870.106563.camel@hal.voltaire.com> <20070627212651.GQ32050@obsidianresearch.com> Message-ID: <4682D88F.9040806@hp.com> Jason Gunthorpe wrote: > On Wed, Jun 27, 2007 at 05:13:40PM -0400, Hal Rosenstock wrote: > > >>> - The kernel periodically fetches the performance stats and aggregates >>> them into a 64 wrapping counter. The kernel sends PMA mads into the >>> mellanox firmware to read and reset the counters >>> - The new 64 bit stats are exported via sysfs/proc/whatever as >>> wrapping counters >>> - When a PMA packet comes in the kernel services it rather than >>> passing it on to the chip firmware.
> >>> >> In this way, both 32 and 64 bit counters could be presented by the PMA >> but how would it know when a counter has maxed out in terms of the >> PMA and how would a remote clear be handled ? >> > > Each time the counter is cleared the kernel would store the 64 bit > value as the 'last PMA counter'. Then the calculation is just > > if ((current - stored) >= saturation) > return saturation; > return current - stored; > > After 2**64 counts the saturation computation will stop working. It > would take 24 years of constant maxed out data transfer for a 12x QDR > link to wrap a 64 bit dword byte counter. > > A nice side benefit would be that linux drivers could present a > consistent PMA interface with new extended 64 bit counters even with > older hardware. > I agree for 64 bit counters but for 32 bit ones it gets a little more complicated because they can max out in under a minute! Since it's tough to decide when a counter has maxed out you therefore HAVE to clear it every time! This means your monitoring utility will need to examine the /proc counters within that 'max-out' window or the counters will latch on you. If you wait too long to look you're screwed and now we're back to the fact that the counters don't wrap. what I'd like to hear is the sense of the community whether or not something like this would be acceptable. if it is, that means nobody is allowed to clear counters on their own AND that the single source for counter information then becomes /proc.
-mark From halr at voltaire.com Wed Jun 27 14:44:36 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Jun 2007 17:44:36 -0400 Subject: [ofa-general] IB performance stats (revisited) In-Reply-To: <20070627212651.GQ32050@obsidianresearch.com> References: <1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com> <46827BA0.6070008@hp.com> <1182957688.28870.83013.camel@hal.voltaire.com> <4682994E.1020209@hp.com> <1182964334.28870.90291.camel@hal.voltaire.com> <46829D54.2040300@hp.com> <1182966482.28870.92686.camel@hal.voltaire.com> <20070627182236.GO32050@obsidianresearch.com> <1182978803.28870.106563.camel@hal.voltaire.com> <20070627212651.GQ32050@obsidianresearch.com> Message-ID: <1182980675.28870.108616.camel@hal.voltaire.com> On Wed, 2007-06-27 at 17:26, Jason Gunthorpe wrote: > On Wed, Jun 27, 2007 at 05:13:40PM -0400, Hal Rosenstock wrote: > > > > - The kernel periodically fetches the performance stats and aggregates > > > them into a 64 wrapping counter. The kernel sends PMA mads into the > > > mellanox firmware to read and reset the counters > > > - The new 64 bit stats are exported via sysfs/proc/whatever as > > > wrapping counters > > > - When a PMA packet comes in the kernel services it rather than > > > passing it on to the chip firmware. > > > > In this way, both 32 and 64 bit counters could be presented by the PMA > > but how would it know when a counter has maxed out in terms of the > > PMA and how would a remote clear be handled ? > > Each time the counter is cleared So it doesn't matter whether the clear is local (from Linux) or remote (from IB), right ? > the kernel would store the 64 bit > value as the 'last PMA counter'. Then the calculation is just > > if ((current - stored) >= saturation) > return saturation; > return current - stored; > > After 2**64 counts the saturation computation will stop working. It > would take 24 years of constant maxed out data transfer for a 12x QDR > link to wrap a 64 bit dword byte counter.
Is that even for the 4 octet counts ? (I didn't calculate this out). > A nice side benefit would be that linux drivers could present a > consistent PMA interface with new extended 64 bit counters even with > older hardware. Indeed. The question may now be how to get from where we are today to this model. -- Hal > Jason From rick.jones2 at hp.com Wed Jun 27 14:49:37 2007 From: rick.jones2 at hp.com (Rick Jones) Date: Wed, 27 Jun 2007 14:49:37 -0700 Subject: [ofa-general] IB performance stats (revisited) In-Reply-To: <20070627212651.GQ32050@obsidianresearch.com> References: <1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com> <46827BA0.6070008@hp.com> <1182957688.28870.83013.camel@hal.voltaire.com> <4682994E.1020209@hp.com> <1182964334.28870.90291.camel@hal.voltaire.com> <46829D54.2040300@hp.com> <1182966482.28870.92686.camel@hal.voltaire.com> <20070627182236.GO32050@obsidianresearch.com> <1182978803.28870.106563.camel@hal.voltaire.com> <20070627212651.GQ32050@obsidianresearch.com> Message-ID: <4682DB71.5080504@hp.com> > After 2**64 counts the saturation computation will stop working. It > would take 24 years of constant maxed out data transfer for a 12x QDR > link to wrap a 64 bit dword byte counter. Drifting a bit, and perhaps not properly interpreting some of the TLAs, but I suspect that if we go back oh 20ish years or so, we could find similar calculations being put forth to show how very long a 32-bit counter would last :) Perhaps it isn't too too early to start talking about > 64 bit counters...
rick jones From halr at voltaire.com Wed Jun 27 14:49:32 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Jun 2007 17:49:32 -0400 Subject: [ofa-general] IB performance stats (revisited) In-Reply-To: <4682D88F.9040806@hp.com> References: <1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com> <46827BA0.6070008@hp.com> <1182957688.28870.83013.camel@hal.voltaire.com> <4682994E.1020209@hp.com> <1182964334.28870.90291.camel@hal.voltaire.com> <46829D54.2040300@hp.com> <1182966482.28870.92686.camel@hal.voltaire.com> <20070627182236.GO32050@obsidianresearch.com> <1182978803.28870.106563.camel@hal.voltaire.com> <20070627212651.GQ32050@obsidianresearch.com> <4682D88F.9040806@hp.com> Message-ID: <1182980966.28870.108877.camel@hal.voltaire.com> On Wed, 2007-06-27 at 17:37, Mark Seger wrote: > Jason Gunthorpe wrote: > > On Wed, Jun 27, 2007 at 05:13:40PM -0400, Hal Rosenstock wrote: > > > > > >>> - The kernel periodically fetches the performance stats and aggregates > >>> them into a 64 wrapping counter. The kernel sends PMA mads into the > >>> mellanox firmware to read and reset the counters > >>> - The new 64 bit stats are exported via sysfs/proc/whatever as > >>> wrapping counters > >>> - When a PMA packet comes in the kernel services it rather than > >>> passing it on to the chip firmware. > >>> > >> In this way, both 32 and 64 bit counters could be presented by the PMA > >> but how would it know when a counter has maxed out in terms of the > >> PMA and how would a remote clear be handled ? > >> > > > > Each time the counter is cleared the kernel would store the 64 bit > > value as the 'last PMA counter'. Then the calculation is just > > > > if ((current - stored) >= saturation) > > return saturation; > > return current - stored; > > > > After 2**64 counts the saturation computation will stop working. It > > would take 24 years of constant maxed out data transfer for a 12x QDR > > link to wrap a 64 bit dword byte counter.
> > > > A nice side benefit would be that linux drivers could present a > > consistent PMA interface with new extended 64 bit counters even with > > older hardware. > > > I agree for 64 bit counters but for 32 bit ones it gets a little more > complicated because they can max out in under a minute! Since it's > tough to decide when a counter has maxed out you therefore HAVE to clear > it every time! This means your monitoring utility will need to examine > the /proc counters within that 'max-out' window or the counters will > latch on > you. If you wait too long to look you're screwed and now we're back to > the fact that the counters don't wrap. > > what I'd like to hear is the sense of the community whether or not > something like this would be acceptable. if it is, that means nobody is > allowed to clear counters on their own Per the IBA spec, I don't think you can legislate this away. IB supports a standard way to remotely clear counters (and the various Performance Managers or other similar tools utilize this clearing feature). -- Hal > AND that the single source for counter information then becomes /proc. > > -mark > > From DavidRobb at comsci.co.uk Wed Jun 27 15:16:09 2007 From: DavidRobb at comsci.co.uk (David Robb) Date: Wed, 27 Jun 2007 23:16:09 +0100 Subject: [ofa-general] Infiniband Problems In-Reply-To: <467AD9FB.1030508@comsci.co.uk> References: <467ACAD6.8000304@comsci.co.uk> <467AD385.3040500@comsci.co.uk> <467AD9FB.1030508@comsci.co.uk> Message-ID: <4682E1A9.3070203@comsci.co.uk> David Robb wrote: > > Roland Dreier wrote: >> > Quite possibly, we are using an IBV_QPT_RC transport type. The code >> > simply adds another work request with ibv_post_srq_recv(...) after >> > each packet is processed. Am I correct in thinking it should start >> out >> > with a stack of work requests in case another packet arrives before >> > the current one has been processed? >> >> That seems a lot more sensible to me.
Have now set things up as suggested and am getting a very healthy transfer rate with minimal latencies. :-) >> >> > Sorry, I meant to look up in my source code which call was failing >> but >> > forgot to paste it into the question. Yes, I can map 2GB of memory >> but >> > the call to ibv_create_qp() fails with REJ >> >> Not sure what you mean ... ibv_create_qp() just returns a pointer or >> NULL. What does it mean to "fail with REJ?" >> > OK. I need to rerun this test tomorrow to determine exactly where and > how this test is failing. The end result is that the QP creation fails > with a REJ. From what I remember, I get a CM event IB_CM_REJ_RECEIVED > and the remote node is not even aware that anything has tried to connect. > Thanks for staying with me on this one. Finally tracked this one down to a problem in our App software. It was caused by a race condition between our Master instructing a Slave to initialise and register its service name and ID with the SA. The master would then attempt to create a QP with the slave; this would fail with a CM REJ event with reason code INVALID_SERVICE_ID. I guess that specifying a larger memory region was enough to increase the timing such that the SA was unaware of the slave node when creating the QP. Anyway, a re-jig of our code has now made this more robust and faster to create all the connections. >> > That's reassuring. Are there any performance penalties for mapping a >> > larger region than a smaller region? >> >> Not really beyond the general cost of using more memory rather than >> less. >> Thanks for your help. David Robb.
From jgunthorpe at obsidianresearch.com Wed Jun 27 15:46:05 2007 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Wed, 27 Jun 2007 16:46:05 -0600 Subject: [ofa-general] IB performance stats (revisited) In-Reply-To: <1182980675.28870.108616.camel@hal.voltaire.com> References: <46827BA0.6070008@hp.com> <1182957688.28870.83013.camel@hal.voltaire.com> <4682994E.1020209@hp.com> <1182964334.28870.90291.camel@hal.voltaire.com> <46829D54.2040300@hp.com> <1182966482.28870.92686.camel@hal.voltaire.com> <20070627182236.GO32050@obsidianresearch.com> <1182978803.28870.106563.camel@hal.voltaire.com> <20070627212651.GQ32050@obsidianresearch.com> <1182980675.28870.108616.camel@hal.voltaire.com> Message-ID: <20070627224605.GS32050@obsidianresearch.com> On Wed, Jun 27, 2007 at 05:44:36PM -0400, Hal Rosenstock wrote: > On Wed, 2007-06-27 at 17:26, Jason Gunthorpe wrote: > > On Wed, Jun 27, 2007 at 05:13:40PM -0400, Hal Rosenstock wrote: > > > > > > - The kernel periodically fetches the performance stats and aggregates > > > > them into a 64 wrapping counter. The kernel sends PMA mads into the > > > > mellanox firmware to read and reset the counters > > > > - The new 64 bit stats are exported via sysfs/proc/whatever as > > > > wrapping counters > > > > - When a PMA packet comes in the kernel services it rather than > > > > passing it on to the chip firmware. > > > > > > In this way, both 32 and 64 bit counters could be presented by the PMA > > > but how would it know when a counter has maxed out in terms of the > > > PMA and how would a remote clear be handled ? > > > > Each time the counter is cleared > > So it doesn't matter whether the clear is local (from Linux) or remote > (from IB), right ? > > > the kernel would store the 64 bit > > value as the 'last PMA counter'. Then the calculation is just
Then the calculation is just > > > > if ((current - stored) >= saturation) > > return saturation; > > return current - stored; > > > > After 2**64 counts the saturation computation will stop working. It > > would take 24 years of constant maxed out data transfer for a 12x QDR > > link to wrap a 64 bit dword byte counter. > > Is that even for the 4 octet counts ? (I didn't calculate this out). Okay, I think a few details of this idea are being missed here.. The 64 bit non-saturating counter is internal to the Linux kernel and is exported by sysfs/proc/netlink/whatever. Someday if we feel necessary we could make it a 128 bit counter without affecting any of the APIs, wire protocols/etc. 64 bits seems to be the common counter size for other linux network performance counts today. Using that 64 bit counter we can emulate the current IBA PMA specifications and have it saturate at 32 bits. This means we can co-opt the PMA interface to the chip's firwmare to extract the counters and provide a new PMA in the Linux kernel that supports: 1) non-saturating 64 bit counters in proc/etc for userspace ** This could be used by a SNMP module to export them off the node, or by any number of local utilities. 2) saturating 32 bit counters for IBA PM MADs 3) saturating 64 bit counters for new IBA PM MADs All this would work with at least mellanox and qlogic hardware. In future we'd want hardware to provide direct access to non-saturating 32 or 64 bit counters to avoid the mess with speaking PMA to the chip firmware. The 24 years I talked about before is how long it would take for the algorithm I described to improperly report a non-saturated value if no PMA counter clears were done. With a timer and an additional flag you could make it perfect.. 
By my math a 32 bit dword counter will reach saturation on a 12xQDR link in 1.4 seconds and a 4xSDR will be in 17s. Actually, I see I was off, I was counting bits not bytes; it will take 192 years, not 24, to improperly report non-saturation at 100 gigabits (!) > The question may now be how to get from where we are today to this > model. Someone has to code it ;> The qlogic driver already has a lot of a PMA in it, so factoring that to common code and requiring a new data collection callback from the drivers seems like a reasonable start.. -- Jason Gunthorpe (780)4406067x832 Chief Technology Officer, Obsidian Research Corp Edmonton, Canada From rvm at obsidianresearch.com Wed Jun 27 16:23:55 2007 From: rvm at obsidianresearch.com (Rolf Manderscheid) Date: Wed, 27 Jun 2007 17:23:55 -0600 Subject: [ofa-general] Re: [PATCH] IB/mthca: initialise GRH:HopLimit when building MLX headers In-Reply-To: References: Message-ID: <4682F18B.4060008@obsidianresearch.com> Roland Dreier wrote: > thanks, applied. I also added the following patch, since I think mlx4 > has the same bug. If you happen to have any ConnectX cards available, > can you check this works too? > I just tried the same test on ConnectX using your for-2.6.23 branch (where this patch has already been applied) and it works fine. Rolf From gsadasiv7 at gmail.com Wed Jun 27 16:28:55 2007 From: gsadasiv7 at gmail.com (Ganesh Sadasivan) Date: Wed, 27 Jun 2007 16:28:55 -0700 Subject: [ofa-general] Re: [PATCH RFC] sharing userspace IB objects In-Reply-To: References: <20070625130604.GH15343@mellanox.co.il> <20070626070641.GM15343@mellanox.co.il> Message-ID: <532b813a0706271628s70e17b6cv70b81fdedc442743@mail.gmail.com> One advantage of having shared objects is to be able to preserve IB connections across process restarts. If the traffic is not very high and the buffers are in shared memory (which I think they should be), then it can save connection setup and message recovery time.
Shouldn't the protocol to create and destroy and pass the various IB objects around be decided by the specific application rather than the library trying to solve this problem? Thanks Ganesh On 6/26/07, Roland Dreier wrote: > > > This is not directly related to SRC: this is an effort > > to make it possible to share QPs, CQ etc across processes > > in the same way as they can be currently shared across threads. > > So assuming that we want multiple processes to post to > > the same QP, how can we support this? > > This looks like a lot of work for an unknown gain. Who is going to > really use this? ie is it worth the trouble? > > > > - Given that everything shared is in shared memory, > > > > I think we should try and keep shared memory usage to a minimum. > > For example, in mthca mr object just needs a key: we could > > keep it in non-shared memory, just pass the key around > > and save on shared memory usage. > > This comment made me realize there are a few more problems here. What > happens if I do ibv_reg_mr() in one process, pass the MR to another > process, and then do ibv_dereg_mr() in the second process? What about > if someone registers a region in shared memory -- are there any > fork/copy-on-write issues with that? I think there are probably bugs > in the locked_vm accounting in the kernel right now -- it doesn't take > into account the possibility of passing context fds from one process > to another. > > In general what do you think the rules for destroying objects should > be? What if process A creates a QP, passes it to process B, and then > process A dies? Should the QP still be usable? Should process B be > able to destroy it? What if process A is still alive -- should > process B be able to destroy the QP? > > > We need to share file descriptors too. Is there a way to pass these > > around besides unix domain sockets? > > I guess we need this to be able to re-mmap doorbell pages etc, right? > I wonder if there's a better way around that...
maybe extending the > kernel interface so that unrelated processes can share a context, eg > by putting contexts in a filesystem or something like that. > > > But are you sure we want to break API for all users just to add > > a new capability for a minority that wants shared memory support? > > Yes, you're right... better to be backward compatible and have a new > API for shared stuff. > > - R. > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Wed Jun 27 19:50:02 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 27 Jun 2007 19:50:02 -0700 Subject: [ofa-general] Re: [PATCH 26/28] IB/ipath - print warning if LID not acquired and link ACTIVE within one minute In-Reply-To: <20070626222556.GP29798@bauxite.pathscale.com> (Arthur Jones's message of "Tue, 26 Jun 2007 15:25:56 -0700") References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> <20070619234303.3794.75856.stgit@bauxite.internal.keyresearch.com> <20070626222556.GP29798@bauxite.pathscale.com> Message-ID: > anyway, do we want it in the IB midlayer? i'd > definitely like it somewhere, user space is a bit > cumbersome for such a simple check... not sure... I don't see that much use in the message myself. From rdreier at cisco.com Wed Jun 27 19:54:29 2007 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 27 Jun 2007 19:54:29 -0700 Subject: [ewg] Re: [ofa-general] Toward next OFED release (1.3) In-Reply-To: <4681EF2D.3010002@voltaire.com> (Or Gerlitz's message of "Wed, 27 Jun 2007 08:01:33 +0300") References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com> <20070626172130.GB26637@minantech.com> <20070626175343.GB5951@sgi.com> <468166D4.20204@mellanox.co.il> <4681EF2D.3010002@voltaire.com> Message-ID: > Note that not only do OFED 1.1 and 1.2 include kernel drivers > which are not upstream, some of them (eg SDP, RDS) never passed any > --review-- cycle at the relevant mailing lists > (openib,netdev,lkml). Now, for OFED 1.3 there's a suggestion to add > rNFS which was also never reviewed. Good point.
I'm actually less concerned about entirely new modules than about patches to existing modules, because I think it's pretty easy for someone to understand, "oh, that module isn't in Linus's kernel yet, so if I switch to a vanilla kernel I don't have it." On the other hand, if we sneak fixes and changes into OFED that don't go upstream, then I think users and developers may waste a lot of time debugging things that someone else debugged already. With that said, perhaps it is a good idea to be stricter about getting things in the upstream kernel. For example, maybe we should make the rule that a module cannot be called "GA" for OFED if it is not merged upstream -- everything not upstream is automatically a "technology preview." This actually protects users if a module has to change when it is merged. - R. From dotanb at dev.mellanox.co.il Thu Jun 28 00:03:55 2007 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Thu, 28 Jun 2007 10:03:55 +0300 Subject: [ofa-general] Re: The low level driver of mlx4 kmalloc 0 bytes in QP creation In-Reply-To: References: <46821FDA.5030900@dev.mellanox.co.il> Message-ID: <46835D5B.9060903@dev.mellanox.co.il> Roland Dreier wrote: > The consensus seems to be that kmalloc(0) is OK, although various > 2.6.22-rc kernels printed big tracebacks when it happens. I think > getting rid of the kmalloc(0) in mlx4 would make the code more > complicated for no real gain. > Good enough for me. thanks Dotan From eitan at mellanox.co.il Thu Jun 28 00:24:59 2007 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 28 Jun 2007 10:24:59 +0300 Subject: [ofa-general] IB performance stats (revisited) References: <46826370.4090602@hp.com> <1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com> <46827BA0.6070008@hp.com><1182957688.28870.83013.camel@hal.voltaire.com> <4682994E.1020209@hp.com> <1182964334.28870.90291.camel@hal.voltaire.com> <6C2C79E72C305246B504CBA17B5500C901CAD7B7@mtlexch01.mtl.com> <1182978496.28870.106214.camel@hal.voltaire.com> Message-ID: <6C2C79E72C305246B504CBA17B5500C901CAD914@mtlexch01.mtl.com> > On Wed, 2007-06-27 at 14:23, Eitan Zahavi wrote: > > In the last months it is the second time I hear people > complaining the > > current monitoring solution in OFA is integrated with OpenSM. > > I must have missed this both times (didn't see this in Mark's > post) and the statement itself is somewhat inaccurate as well. Private talks - I hope they will speak up for themselves now... > > > These people do not use OpenSM but do use OFED. > > I'm not sure I'm following what you mean here.
> > If you mean that some people want to run PerfMgr without the > SM/SA aspects (so that they can run a vendor based SM), that > is the next thing we are adding to the implementation. Exactly. OK, when is that coming? > > > Another drawback is that > > no naming is provided and the reporting uses GUIDs. > > Naming is provided via NodeDescription. This might be good for hosts but does not cover switches ... > > > I also can't hold myself from saying again I think you are going to > > hit the wall with the concept of doing the PMA from a single node. > > If you are referring to the fact that the PerfMgr is currently not > distributed, that will be done as has been stated before. Good. When is it expected? Will it be OFED 1.3? Thanks > > -- Hal > > > Eitan Zahavi > > Senior Engineering Director, Software Architect Mellanox > Technologies > > LTD > > Tel:+972-4-9097208 > > Fax:+972-4-9593245 > > P.O. Box 586 Yokneam 20692 ISRAEL > > > > > > > > > -----Original Message----- > > > From: general-bounces at lists.openfabrics.org > > > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Hal > > > Rosenstock > > > Sent: Wednesday, June 27, 2007 8:12 PM > > > To: Mark Seger > > > Cc: Finn, Ed; general at lists.openfabrics.org > > > Subject: Re: [ofa-general] IB performance stats (revisited) > > > > > > On Wed, 2007-06-27 at 13:07, Mark Seger wrote: > > > > >The performance managers deal with the counter stickiness (by > > > > >resetting them when they think they need to). They > > > typically export > > > > >their data although this is not specified by IBA so it is > > > in a vendor > > > > >proprietary manner. > > > > > > > > > > > > > > so I guess these guys are poor citizens as well... > > > > > > Not sure what you mean. > > > > > > > the real issue as I see it then means nobody can trust > the data if > > > > random tools randomly reset the counters. a real shame... > > > > > > I consider this to be a real rather than random app for this.
> > > Guess it depends on what one considers random. > > > > > > -- Hal > > > > > > > -mark > > > > > > > > > > > > > > _______________________________________________ > > > general mailing list > > > general at lists.openfabrics.org > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > > > To unsubscribe, please visit > > > http://openib.org/mailman/listinfo/openib-general > > > > > From monil at voltaire.com Thu Jun 28 01:05:09 2007 From: monil at voltaire.com (Moni Levy) Date: Thu, 28 Jun 2007 11:05:09 +0300 Subject: [ewg] Re: [ofa-general] Re: development process post ofed-1.2 gold. In-Reply-To: <4681370A.5050306@opengridcomputing.com> References: <4680305D.9030701@opengridcomputing.com> <4680F1C8.3020207@mellanox.co.il> <4681370A.5050306@opengridcomputing.com> Message-ID: <6a122cc00706280105r1dc02108x466da2262f833e10@mail.gmail.com> Weekly builds, please. -- Moni On 6/26/07, Steve Wise wrote: > Scott Weitzenkamp (sweitzen) wrote: > >> My suggestion is that we keep the ofed_1_2 branch alive, thus > >> new fixes > >> should be applied to the repository. > >> In this way we will be able to do a stable release when we decide. > >> Another question is regarding the daily build - I don't think we need > >> them any more. We can do a weekly build, or run build in case of need > >> (new patches submitted). What other people think about this? > > > > Weekly and on-demand builds sound OK to me. 
> > > > Scott > > ditto > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > From monil at voltaire.com Thu Jun 28 01:07:58 2007 From: monil at voltaire.com (Moni Levy) Date: Thu, 28 Jun 2007 11:07:58 +0300 Subject: [ofa-general] [PATCH/RFC] IB/mthca: Remove MSI support In-Reply-To: References: Message-ID: <6a122cc00706280107q3e84e7b7i29184c6c4a604f83@mail.gmail.com> On 6/27/07, Roland Dreier wrote: > Is there any point in having MSI support in mthca, given that the > hardware also does MSI-X, which is much more useful? Who might be the potential user of MSI today? Maybe someone using old chip sets not supporting MSI-X ? --Moni
From vlad at lists.openfabrics.org Thu Jun 28 02:44:07 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Thu, 28 Jun 2007 02:44:07 -0700 (PDT) Subject: [ofa-general] ofa_1_2_c_kernel 20070628-0200 daily build status Message-ID: <20070628094408.2F5CBE608F0@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/~vlad/ofed_kernel.git git_branch: ofed_kernel Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-mlx4-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.15 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.19 Passed on ia64 with linux-2.6.12 Passed on powerpc with linux-2.6.18 Passed on ia64 with linux-2.6.13 Passed on x86_64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on x86_64 with linux-2.6.12 Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.20 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.18 Passed on x86_64 with linux-2.6.16 Passed on ppc64 with linux-2.6.12 Passed on ia64 with linux-2.6.15 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.19 Passed on ia64 with linux-2.6.16 Passed on powerpc with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ppc64 with linux-2.6.15 Passed on x86_64 with linux-2.6.17 Passed on ppc64 with linux-2.6.16 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.17 Passed on powerpc with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on powerpc with linux-2.6.13 Passed on x86_64 with linux-2.6.13 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.14 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on powerpc with linux-2.6.12 Passed on ppc64 with linux-2.6.13 Passed on ia64 with linux-2.6.21.1 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.14 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ppc64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-34.ELsmp Failed: From pnlai at galactic.com.hk Thu Jun 28 02:49:56 2007 From: pnlai at galactic.com.hk (PN Lai) Date: Thu, 28 Jun 2007 17:49:56 +0800 Subject: [ofa-general] SRP Failover In-Reply-To: References: <000301c7b7d7$236b3a70$6a41af50$@com.hk> Message-ID: <001301c7b969$b0966760$11c33620$@com.hk> I use RHEL, it works very fine. Thanks. I have another question. I tried with a normal server (without RAID controller) to simulate the storage and it cannot be recognized by multipath. 
Does it mean that I can't use a normal server (without a RAID controller) to simulate the storage, since the WWID used by multipath seems to be generated by the RAID controller? Thanks again. PN From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com] Sent: Tuesday, June 26, 2007 11:58 PM To: PN Lai; general at lists.openfabrics.org; Scott Weitzenkamp (sweitzen) Subject: RE: [ofa-general] SRP Failover You need to configure Device Mapper Multipath or some other multipathing software to get HA. What OS are you running? Steps for RHEL are: 1) Edit /etc/multipath.conf and comment out the devnode_blacklist (RHEL4) or blacklist (RHEL5) entry. 2) Run "chkconfig multipathd on". 3) Reboot. 4) After reboot, /dev/mapper should be populated with multipath block device entries. 5) You can run "multipath -l" to view the multipath status. Steps for SLES10 are similar: 1) Run "chkconfig boot.multipath on". 2) Run "chkconfig multipathd on". 3) Reboot. 4) After reboot, /dev/mapper should be populated with multipath block device entries. 5) You can run "multipath -l" to view the multipath status. Use the /dev/mapper block devices, not the /dev/sd* block devices. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems _____ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of PN Lai Sent: Tuesday, June 26, 2007 2:48 AM To: general at lists.openfabrics.org Subject: [ofa-general] SRP Failover Hi all, I'm testing the SRP HA functions, but I have some questions. I use 2 IB cables to connect the initiator and 1 IB cable to connect to the storage. I installed OFED-1.2 and enabled "SRP_LOAD=yes" and "SRPHA_ENABLE=yes" in openib.conf. After reboot, it discovers 2 targets, /dev/sdbX and /dev/sdcX. However, when I check /var/log/srp_daemon.log, it shows: ..
26/05/07 17:42:57 : bad MAD status (110) from lid 257 26/05/07 17:43:30 : No response to inform info registration 26/05/07 17:43:30 : Fail to register to traps, maybe there is no opensm running on fabric .. But opensm is running on both machines. I don't know whether this is normal, or whether it should discover only a single target. Now, my question is: if I mount /dev/sdbX, write data to it, and then remove one of the initiator cables, how will /dev/sdcX replace /dev/sdbX so that I can continue writing data? Do I need to configure any extra files? Thanks for any reply. PN From Koen.SEGERS at VRT.BE Thu Jun 28 02:51:33 2007 From: Koen.SEGERS at VRT.BE (SEGERS Koen) Date: Thu, 28 Jun 2007 11:51:33 +0200 Subject: [ofa-general] Open Fabrics iWARP Driver for Chesio T3 card References: <4682A880.1030001@opengridcomputing.com> Message-ID: What is the benefit of using the iWARP driver? Do you offload the traffic coming from the cluster directly to the Chelsio card (RDMA directly to Chelsio)? Would it be beneficial to have the iWARP driver installed on nodes that communicate with clients over IP and with other servers (of its cluster) over IB? We are now using SDP as an intercluster protocol, but in the future we are probably going to use VERBS for it. Can we read the documentation on a website somewhere? Regards, Koen Segers ________________________________ From: general-bounces at lists.openfabrics.org on behalf of Steve Wise Sent: Wed 27-6-2007 20:12 To: david elsen CC: general at lists.openfabrics.org Subject: Re: [ofa-general] Open Fabrics iWARP Driver for Chesio T3 card Hi David, Answers below: david elsen wrote: > Can someone please let me know: > > 1. What is the latest Open Fabrics Driver for the Chesio T3 cards? > The latest Chelsio RDMA driver is in the ofed-1.2 "gold" release.
That driver requires firmware from Chelsio that is included in their latest software kit: cxgb3toe-1.0.104.tar.gz. Contact Chelsio to get this. I'll probably be pulling in a patch series for ofed-1.2 to update the ofed low-level driver, but for now, please use the kit from Chelsio. I suggest you install OFED-1.2.tgz and then the cxgb3toe-1.0.104 kit on top of ofed. This will install the latest low-level driver (used by the rdma driver in the ofed release) and the latest 4.3.0 firmware. > 2. Is there any documentation there on The Open Fabrics website to > install the iWARP driver for the T3 card? > There is a Chelsio cxgb3 release note file included in the ofed-1.2 documentation package. > 3. Is there any documentation describing how to set the iWARP and > Network interface for the T3 cards? > Same release note file. Hope this helps. Steve. _______________________________________________ general mailing list general at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general *** Disclaimer *** Vlaamse Radio- en Televisieomroep, Auguste Reyerslaan 52, 1043 Brussel, public-law limited company, VAT BE 0244.142.664, Register of Legal Entities (RPR) Brussel, http://www.vrt.be/disclaimer From swise at opengridcomputing.com Thu Jun 28 06:55:40 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 28 Jun 2007 08:55:40 -0500 Subject: [ofa-general] Open Fabrics iWARP Driver for Chesio T3 card In-Reply-To: References: <4682A880.1030001@opengridcomputing.com> Message-ID: <4683BDDC.5010309@opengridcomputing.com> SEGERS Koen wrote: > What is the benefit of using the iWARP driver? Do you offload the > traffic coming from the cluster directly to the chelsio card (RDMA > directly to Chelsio)? > iWARP is a suite of standard protocols that implement RDMA over a TCP or SCTP connection.
The devices that support iWARP usually implement all of these protocols (including TCP/IP/Ethernet) in hardware. The device drivers for these devices plug into the Linux/OFA RDMA core and support the Linux/OFA RDMA verbs, which are mostly common to IB and iWARP. So think of it as an RDMA transport that uses standard Ethernet and IP technology. There is no wire-level interoperability between IB and iWARP: they are different L1-L4 protocol stacks below the RDMA API. But _above_ the RDMA API, a single application can use the Linux RDMA Verbs interface, and you can deploy that same application over both IB and iWARP networks. Application/middleware examples include MPI, iSCSI/iSER, and NFS-RDMA. > Would it be beneficial to have the iWARP driver installed on nodes that > communicate with clients over IP and with other servers (of its cluster) > over IB? We are now using SDP as an intercluster protocol, but in the > future we are probably going to use VERBS for it. > I'm not sure how you would use it in your setup; I don't know enough about your cluster architecture to say whether it would help you. You might contact the iWARP providers directly to find out whether their solutions can help you. Also, these devices typically support other technologies that might be helpful for you. > Can we read the documentation on a website somewhere? > The iWARP protocols are IETF Internet-Drafts and RFCs that can be found at http://www.ietf.org/html.charters/rddp-charter.html More information on RDMA over TCP/IP is available at http://www.rdmaconsortium.org/home Hope this helps. Steve.
From halr at voltaire.com Thu Jun 28 06:55:43 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Jun 2007 09:55:43 -0400 Subject: [ofa-general] IB performance stats (revisited) In-Reply-To: <6C2C79E72C305246B504CBA17B5500C901CAD914@mtlexch01.mtl.com> References: <46826370.4090602@hp.com> <1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com> <46827BA0.6070008@hp.com><1182957688.28870.83013.camel@hal.voltaire.com> <4682994E.1020209@hp.com> <1182964334.28870.90291.camel@hal.voltaire.com> <6C2C79E72C305246B504CBA17B5500C901CAD7B7@mtlexch01.mtl.com> <1182978496.28870.106214.camel@hal.voltaire.com> <6C2C79E72C305246B504CBA17B5500C901CAD914@mtlexch01.mtl.com> Message-ID: <1183038915.28870.174235.camel@hal.voltaire.com> On Thu, 2007-06-28 at 03:24, Eitan Zahavi wrote: > > On Wed, 2007-06-27 at 14:23, Eitan Zahavi wrote: > > > In the last months it is the second time I hear people > > complaining the > > > current monitoring solution in OFA is integrated with OpenSM. > > > > I must have missed this both times (didn't see this in Mark's > > post) and the statement itself is somewhat inaccurate as well. > Private talks - I hope they will speak up for themselves now... Please encourage them to do so. > > > These people do not use OpenSM but do use OFED. > > > > I'm not sure I'm following what you mean here. > > > > If you mean that some people want to run PerfMgr without the > > SM/SA aspects (so that they can run a vendor based SM), that > > is the next thing we are adding to the implementation. > Exactly. OK when is that coming? Should be part of OFED 1.3. > > > Another drawback if that > > > no naming is provided and the reporting uses GUIDs. > > > > Naming is provided via NodeDescription. > This might be good for hosts but is not covering switches ... switch map has been used for this with some other diag tools. Not sure if this is the approach to be used here but that would be consistent. 
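Hal's switch-map suggestion (resolving switch GUIDs to readable names from a side file, since switches are not well served by NodeDescription) can be sketched roughly as follows. The line format assumed here, `0x<guid> "<name>"` with `#` comment lines, is modeled on the map files some OFED diag tools read, but treat it as an assumption; `struct guid_name`, `parse_map_line()` and `lookup_guid()` are hypothetical names, not part of any OFED library.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory map entry: one node GUID and its display name. */
struct guid_name {
    uint64_t guid;
    char name[64];
};

/*
 * Parse one map line of the assumed form
 *     0x0008f10400411f56 "core switch 1"
 * Returns 1 and fills *out on success; returns 0 for blank lines,
 * '#' comments, and lines that do not match the format.
 */
static int parse_map_line(const char *line, struct guid_name *out)
{
    const char *name_start, *name_end;
    char *end;
    size_t len;

    while (*line == ' ' || *line == '\t')
        line++;
    if (*line == '\0' || *line == '\n' || *line == '#')
        return 0;

    /* Base 0 lets strtoull accept the "0x" prefix. */
    out->guid = strtoull(line, &end, 0);
    if (end == line)
        return 0;

    name_start = strchr(end, '"');
    if (name_start == NULL)
        return 0;
    name_end = strchr(name_start + 1, '"');
    if (name_end == NULL)
        return 0;

    /* Copy the quoted name, truncating if it does not fit. */
    len = (size_t)(name_end - name_start - 1);
    if (len >= sizeof(out->name))
        len = sizeof(out->name) - 1;
    memcpy(out->name, name_start + 1, len);
    out->name[len] = '\0';
    return 1;
}

/* Return the mapped name, or NULL so the caller can fall back to the GUID. */
static const char *lookup_guid(const struct guid_name *map, size_t n,
                               uint64_t guid)
{
    for (size_t i = 0; i < n; i++)
        if (map[i].guid == guid)
            return map[i].name;
    return NULL;
}
```

A report generator would print the `lookup_guid()` result when it is non-NULL and the raw GUID otherwise, so an incomplete map degrades to today's GUID-only output rather than failing.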
> > > I also can't hold myself from saying again I think you are going to > > > hit the wall with the concept of doing the PMA from a single node. > > > > If you are referring to the fact the PerMgr is currently not > > distributed, that will be done as has been stated before. > Good. When is it expected? Will it be OFED 1.3? Not sure yet; it's the next major thing after making PerfMgr run without the SM/SA included. Don't have an OFED 1.3 functionality freeze date yet to work against. -- Hal > Thanks > > > > -- Hal > > > > > Eitan Zahavi > > > Senior Engineering Director, Software Architect Mellanox > > Technologies > > > LTD > > > Tel:+972-4-9097208 > > > Fax:+972-4-9593245 > > > P.O. Box 586 Yokneam 20692 ISRAEL > > > > > > > > > > > > > -----Original Message----- > > > > From: general-bounces at lists.openfabrics.org > > > > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Hal > > > > Rosenstock > > > > Sent: Wednesday, June 27, 2007 8:12 PM > > > > To: Mark Seger > > > > Cc: Finn, Ed; general at lists.openfabrics.org > > > > Subject: Re: [ofa-general] IB performance stats (revisited) > > > > > > > > On Wed, 2007-06-27 at 13:07, Mark Seger wrote: > > > > > >The performance managers deal with the counter stickiness (by > > > > > >resetting them when they think they need to). They > > > > typically export > > > > > >their data although this is not specified by IBA so it is > > > > in a vendor > > > > > >proprietary manner. > > > > > > > > > > > > > > > > > so I guess these guys are poor citizens as well... > > > > > > > > Not sure what you mean. > > > > > > > > > the real issue as I see it then means nobody can trust > > the data if > > > > > randon tools randomly reset the counters. a real shame... > > > > > > > > I consider this to be a real rather than random app for this. > > > > Guess it depends on what one considers random. 
> > > > > > > > -- Hal > > > > > > > > > -mark > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > general mailing list > > > > general at lists.openfabrics.org > > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > > > > > To unsubscribe, please visit > > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > From halr at voltaire.com Thu Jun 28 07:04:17 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Jun 2007 10:04:17 -0400 Subject: [ofa-general] Re: [PATCH] management: uint -> unsigned replacement In-Reply-To: <1182889307.28870.4809.camel@hal.voltaire.com> References: <20070626102045.GS15343@mellanox.co.il> <1182889307.28870.4809.camel@hal.voltaire.com> Message-ID: <1183038984.28870.174322.camel@hal.voltaire.com> On Tue, 2007-06-26 at 16:21, Hal Rosenstock wrote: > On Tue, 2007-06-26 at 06:20, Michael S. Tsirkin wrote: > > Some management headers use uint type which (on my system) is described as "old > > compatibility name for C type". This type might not defined e.g. if > > __STRICT_ANSI__ is set, so it is best to avoid its usage at least in headers. > > Replace by unsigned in all headers. > > > > Signed-off-by: Michael S. Tsirkin > > Thanks. Applied (to master only so far but it does seem since a goal of > OFED 1.2 is to support SLES 10 that is should be provided there as well. > That will be forthcoming.) I've now made these changes to my ofed_1_2 branch of my management git tree on the OFA server. I'll release updated libraries shortly with these updated headers. -- Hal > Also, I am working on updating the management library sources similarly > although I don't see an imperative to move those changes to OFED 1.2. 
> > -- Hal > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From rdreier at cisco.com Thu Jun 28 07:37:17 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 28 Jun 2007 07:37:17 -0700 Subject: [ofa-general] [PATCH/RFC] IB/mthca: Remove MSI support In-Reply-To: <6a122cc00706280107q3e84e7b7i29184c6c4a604f83@mail.gmail.com> (Moni Levy's message of "Thu, 28 Jun 2007 11:07:58 +0300") References: <6a122cc00706280107q3e84e7b7i29184c6c4a604f83@mail.gmail.com> Message-ID: > Who might be the potential user of MSI today? Maybe someone using old > chip sets not supporting MSI-X ? How could a chipset support MSI but not at least one MSI-X message? Do you actually know of any chipset for which some version of the Linux kernel makes MSI but not MSI-X work? - R. From jackm at dev.mellanox.co.il Thu Jun 28 08:25:48 2007 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Thu, 28 Jun 2007 18:25:48 +0300 Subject: [ofa-general] [PATCH] net-mlx4: set 64-byte cacheline size for x86_64 and PPC64 to enable WQE prefetching Message-ID: <200706281825.48125.jackm@dev.mellanox.co.il> Set cacheline size to 64 for x86_64 and PPC64 architectures. This will enable WQE prefetching for these architectures. 
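The fw.c change in this patch writes a single byte into the INIT_HCA command mailbox, and the existing code clears a bit in a big-endian flags word. The sketch below models that layout with an ordinary byte array so the offsets can be checked in user space. The `INIT_HCA_*` constants are copied from the patch; the helper names and the runtime `set_cacheline` switch are illustrative assumptions (the patch decides at compile time with `#if defined(__x86_64__) || defined(__PPC64__)`), and this is not driver code.

```c
#include <stdint.h>
#include <string.h>

/* Constants copied from the fw.c patch in this message. */
#define INIT_HCA_IN_SIZE               0x200
#define INIT_HCA_VERSION_OFFSET        0x000
#define INIT_HCA_VERSION               2
#define INIT_HCA_CACHELINE_SZ_OFFSET   0x0e
#define INIT_HCA_64_BYTE_CACHELINE_SZ  0x40
#define INIT_HCA_FLAGS_OFFSET          0x014

/* Mailbox words are big-endian, which is why the driver uses cpu_to_be32(). */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Hypothetical helper: fill the header bytes the patch touches. */
static void init_hca_header(uint8_t *buf, int set_cacheline)
{
    memset(buf, 0, INIT_HCA_IN_SIZE);
    buf[INIT_HCA_VERSION_OFFSET] = INIT_HCA_VERSION;
    if (set_cacheline)  /* the patch makes this choice at compile time */
        buf[INIT_HCA_CACHELINE_SZ_OFFSET] = INIT_HCA_64_BYTE_CACHELINE_SZ;
}

/* Byte-level equivalent of:
 *     *(inbox + INIT_HCA_FLAGS_OFFSET / 4) &= ~cpu_to_be32(1 << bit);      */
static void clear_flags_bit(uint8_t *buf, unsigned int bit)
{
    uint32_t flags = get_be32(buf + INIT_HCA_FLAGS_OFFSET);

    flags &= ~(UINT32_C(1) << bit);
    put_be32(buf + INIT_HCA_FLAGS_OFFSET, flags);
}
```

Because the flags word is stored big-endian, bit 1 of the word at offset 0x14 lives in the last of its four bytes (offset 0x17), independent of host byte order.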
Signed-off-by: Jack Morgenstein Index: connectx_kernel/drivers/net/mlx4/fw.c =================================================================== --- connectx_kernel.orig/drivers/net/mlx4/fw.c 2007-06-28 15:34:07.000000000 +0300 +++ connectx_kernel/drivers/net/mlx4/fw.c 2007-06-28 15:48:39.000000000 +0300 @@ -655,6 +655,8 @@ #define INIT_HCA_IN_SIZE 0x200 #define INIT_HCA_VERSION_OFFSET 0x000 #define INIT_HCA_VERSION 2 +#define INIT_HCA_CACHELINE_SZ_OFFSET 0x0e +#define INIT_HCA_64_BYTE_CACHELINE_SZ 0x40 #define INIT_HCA_FLAGS_OFFSET 0x014 #define INIT_HCA_QPC_OFFSET 0x020 #define INIT_HCA_QPC_BASE_OFFSET (INIT_HCA_QPC_OFFSET + 0x10) @@ -691,6 +693,9 @@ memset(inbox, 0, INIT_HCA_IN_SIZE); *((u8 *) mailbox->buf + INIT_HCA_VERSION_OFFSET) = INIT_HCA_VERSION; +#if defined(__x86_64__) || defined(__PPC64__) + *((u8 *) mailbox->buf + INIT_HCA_CACHELINE_SZ_OFFSET) = INIT_HCA_64_BYTE_CACHELINE_SZ; +#endif #if defined(__LITTLE_ENDIAN) *(inbox + INIT_HCA_FLAGS_OFFSET / 4) &= ~cpu_to_be32(1 << 1); From arthur.jones at qlogic.com Thu Jun 28 08:42:09 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Thu, 28 Jun 2007 08:42:09 -0700 Subject: [ofa-general] Re: [PATCH 26/28] IB/ipath - print warning if LID not acquired and link ACTIVE within one minute In-Reply-To: References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> <20070619234303.3794.75856.stgit@bauxite.internal.keyresearch.com> <20070626222556.GP29798@bauxite.pathscale.com> Message-ID: <20070628154209.GB26887@bauxite.pathscale.com> hi roland, ... On Wed, Jun 27, 2007 at 07:50:02PM -0700, Roland Dreier wrote: > > anyway, do we want it in the IB midlayer? i'd > > definitely like it somewhere, user space is a bit > > cumbersome for a such a simple check... > > not sure... I don't see that much use in the message myself. ok, no problem, i'll take out the lid check and keep the interrupt check and test and resend... 
arthur From afriedle at open-mpi.org Thu Jun 28 08:46:30 2007 From: afriedle at open-mpi.org (Andrew Friedley) Date: Thu, 28 Jun 2007 08:46:30 -0700 Subject: [ofa-general] Limited number of multicasts groups that can be joined? In-Reply-To: <46699A6D.4070300@open-mpi.org> References: <46699A6D.4070300@open-mpi.org> Message-ID: <4683D7D6.50402@open-mpi.org> Some updates on this problem. The code I'm using to test/produce this behavior is an MPI program. MPI is used for convenience of job startup and collection of results. The actual test/benchmark is using straight RDMA CM & ibverbs. What I'm doing is timing how long it takes to join and bring up a multicast group with varying number of processes and existing groups. One rank joins with a '0' address to get a real address, MPI_Bcast's that address to the other ranks, which then join the group. Meanwhile the root rank is repeatedly sending a small ping message to the group. Every other rank times from when they call rdma_join_multicast() to the join event arrival, and to when they first receive a message on that group. Once completed, the process repeats N times, leaving all the groups joined. I'm now running OFED v1.2, and behavior has not changed due to this, though I've noticed some other cases. First -- If I have not been using anything multicast on the network for a while, I'm able to join a total of 4 groups with my benchmark. After this, running it any number of times, I can join 14 groups as described below. Now the more interesting part. I'm now able to run on a 128 node machine using open SM running on a node (before, I was running on an 8 node machine which I'm told is running the Cisco SM on a Topspin switch). On this machine, if I run my benchmark with two processes per node (instead of one, i.e. mpirun -np 16 with 8 nodes), I'm able to join > 750 groups simultaneously from one QP on each process. To make this stranger, I can join only 4 groups running the same thing on the 8-node machine. 
While doing so I noticed that the time from calling rdma_join_multicast() to the event arrival stayed fairly constant (in the .001sec range), while the time from the join call to actually receiving messages on the group steadily increased from around .1 secs to around 2.7 secs with 750+ groups. Furthermore, this time does not drop back to .1 secs if I stop the benchmark and run it (or any of my other multicast code) again. This is understandable within a single program run, but the fact that behavior persists across runs concerns me -- feels like a bug, but I don't have much concrete here. Sorry for the long email -- I'm trying to provide as much detail as possible so this can get fixed. I'm really not sure where to start looking on my own, so even some hints on where the problem(s) might lie would be useful. Andrew Andrew Friedley wrote: > I've run into a problem where it appears that I cannot join more than 14 > multicast groups from a single HCA. I'm using the RDMA CM UD/multicast > interface from an OFED v1.2 nightly build, and using a '0' address when > joining to have the SM allocate an unused address. The first 14 > rdma_join_multicast() calls succeed, a MULTICAST_JOIN event comes > through for each of them and everything works. But the 15th call to > rdma_join_multicast() returns -1 and sets errno to 99, 'Cannot assign > requested address'. > > Note that I'm using a single QP per process to do all the joins. Things > get weirder if I run two instances of my program on the same node -- as > soon the total between the two instances is 14, neither instance can > join any more groups. Also, right now my code hangs when this happens > -- if I kill off one of the two instances and run a third instance > (while leaving the other hung, holding some number of groups), the third > instance is not able to join ANY groups. The behavior resets when I > kill all instances. 
> > Two instances running on separate nodes (on the same network) do not > appear to interfere with each other like described above; they do still > error out on the 15th join. > > This feels like a bug to me; though regardless this limit is WAY too > low. Any ideas what might be going on, or how I can work around it? > > Andrew > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general From swise at opengridcomputing.com Thu Jun 28 10:34:21 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 28 Jun 2007 12:34:21 -0500 Subject: [ofa-general] [PATCH RFC] - iw_cxgb3 debug - turn debug logging on/off at runtime Message-ID: <4683F11D.2030403@opengridcomputing.com> This is a request for comments. What do folks think about using a sysfs file to turn debug logging on/off at runtime for an rdma driver? Is a /proc entry better? Thanks, Steve. ------------------- > commit 5441877cbe5bb8bc56bbc5bd77e4551aa8a219b0 > Author: Steve Wise > Date: Wed May 9 10:09:03 2007 -0500 > > Debug/Trace fixes. > > - Add sysfs entry to turn debug trace on/off. You still need to compile > the driver to turn all the debug code on, but once compiled, you can turn > on the logging via: > echo 1 > /sys/class/infiniband/cxgb3/debug > > Eventually I'll clean up the logging so that we can always leave this > code compiled in. But for now, it's way too verbose to always compile in.
> > - Fixed bug in cxio_dump_rqt > > Signed-off-by: Steve Wise > > diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c b/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c > index d6b6c97..76d2951 100644 > --- a/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c > +++ b/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c > @@ -133,7 +133,7 @@ void cxio_dump_wce(struct t3_cqe *wce) > } > } > > -void cxio_dump_rqt(struct cxio_rdev *rdev, u32 hwtid, int nents) > +void cxio_dump_rqt(struct cxio_rdev *rdev, u32 rqt_addr, int nents) > { > struct ch_mem_range *m; > int size = nents * 64; > @@ -146,7 +146,7 @@ void cxio_dump_rqt(struct cxio_rdev *rde > return; > } > m->mem_id = MEM_PMRX; > - m->addr = ((hwtid)<<10) + rdev->rnic_info.rqt_base; > + m->addr = rqt_addr; > m->len = size; > PDBG("%s RQT addr 0x%x len %d\n", __FUNCTION__, m->addr, m->len); > rc = rdev->t3cdev_p->ctl(rdev->t3cdev_p, RDMA_GET_MEM, m); > diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c > index ce05db5..425536c 100644 > --- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c > +++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c > @@ -45,6 +45,10 @@ #include "sge_defs.h" > static struct cxio_rdev *rdev_tbl[T3_MAX_NUM_RNIC]; > static cxio_hal_ev_callback_func_t cxio_ev_cb = NULL; > > +#ifdef DEBUG > +unsigned int cxio_debug; > +#endif > + > static inline struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name) > { > int i; > diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h b/drivers/infiniband/hw/cxgb3/core/cxio_hal.h > index 1553bda..12ee689 100644 > --- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h > +++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.h > @@ -186,7 +186,6 @@ int cxio_poll_cq(struct t3_wq *wq, struc > u8 *cqe_flushed, u64 *cookie, u32 *credit); > > #define MOD "iw_cxgb3: " > -#define PDBG(fmt, args...) 
pr_debug(MOD fmt, ## args) > > #ifdef DEBUG > void cxio_dump_tpt(struct cxio_rdev *rev, u32 stag); > @@ -195,6 +194,15 @@ void cxio_dump_wqe(union t3_wr *wqe); > void cxio_dump_wce(struct t3_cqe *wce); > void cxio_dump_rqt(struct cxio_rdev *rdev, u32 hwtid, int nents); > void cxio_dump_tcb(struct cxio_rdev *rdev, u32 hwtid); > + > +extern unsigned int cxio_debug; > + > +#define PDBG(fmt, args...) { \ > + if (cxio_debug) \ > + printk(MOD fmt, ## args); \ > +} > +#else > +#define PDBG(fmt, arg...) do { ; } while (0) > #endif > > #endif > diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c > index b0f7218..33c9e59 100644 > --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c > +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c > @@ -1057,16 +1057,42 @@ static ssize_t show_board(struct class_d > dev->rdev.rnic_info.pdev->device); > } > > +#ifdef DEBUG > +static ssize_t show_debug(struct class_device *cdev, char *buf) > +{ > + return sprintf(buf, "cxio_debug=%d\n", cxio_debug); > +} > + > +static ssize_t iwch_set_debug(struct class_device *cdev, const char *buf, size_t count) > +{ > + unsigned dbg; > + > + sscanf(buf, "%u", &dbg); > + if (dbg > 1) > + return -EINVAL; > + cxio_debug = dbg; > + printk(KERN_INFO MOD "cxio_debug=%d\n", cxio_debug); > + return count; > +} > +#endif > + > static CLASS_DEVICE_ATTR(hw_rev, S_IRUGO, show_rev, NULL); > static CLASS_DEVICE_ATTR(fw_ver, S_IRUGO, show_fw_ver, NULL); > static CLASS_DEVICE_ATTR(hca_type, S_IRUGO, show_hca, NULL); > static CLASS_DEVICE_ATTR(board_id, S_IRUGO, show_board, NULL); > > +#ifdef DEBUG > +static CLASS_DEVICE_ATTR(debug, S_IRUGO|S_IWUGO, show_debug, iwch_set_debug); > +#endif > + > static struct class_device_attribute *iwch_class_attributes[] = { > &class_device_attr_hw_rev, > &class_device_attr_fw_ver, > &class_device_attr_hca_type, > - &class_device_attr_board_id > + &class_device_attr_board_id, > +#ifdef DEBUG > + &class_device_attr_debug, > +#endif > }; > > 
int iwch_register_device(struct iwch_dev *dev) From rdreier at cisco.com Thu Jun 28 13:45:57 2007 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 28 Jun 2007 13:45:57 -0700 Subject: [ofa-general] Re: [PATCH RFC] - iw_cxgb3 debug - turn debug logging on/off at runtime In-Reply-To: <4683F11D.2030403@opengridcomputing.com> (Steve Wise's message of "Thu, 28 Jun 2007 12:34:21 -0500") References: <4683F11D.2030403@opengridcomputing.com> Message-ID: > What do folks think about using a sysfs file to turn debug logging > on/off at runtime for an rdma driver? How about just making it a module parameter with writable permissions, so it can be set in /sys/module and you don't even have to write any attribute parsing code or anything like that. > Is a /proc entry better? No, /proc is only for process-related stuff. - R. From mshefty at ichips.intel.com Thu Jun 28 14:08:13 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 28 Jun 2007 14:08:13 -0700 Subject: [ofa-general] [Bug 667] In-Reply-To: <46812A86.9000505@opengridcomputing.com> References: <20070626142643.GC29602@mellanox.co.il> <20070626143735.GD29602@mellanox.co.il> <46812A86.9000505@opengridcomputing.com> Message-ID: <4684233D.7020406@ichips.intel.com> Steve Wise wrote: > I think the bug is in rping_bind_client(). If addr resolution fails via > a ADDR_ERROR event, then rping_bind_client() wakes up and mistakenly > returns variable 'ret' which is zero. It should return non-zero in this > case. I attached a patch to the bug report to fix rping_bind_client(). Please let me know if this fixes the problem for you. - Sean From mshefty at ichips.intel.com Thu Jun 28 14:23:02 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 28 Jun 2007 14:23:02 -0700 Subject: [ofa-general] Limited number of multicasts groups that can be joined? 
In-Reply-To: <4683D7D6.50402@open-mpi.org> References: <46699A6D.4070300@open-mpi.org> <4683D7D6.50402@open-mpi.org> Message-ID: <468426B6.3060602@ichips.intel.com> > Now the more interesting part. I'm now able to run on a 128 node > machine using open SM running on a node (before, I was running on an 8 > node machine which I'm told is running the Cisco SM on a Topspin > switch). On this machine, if I run my benchmark with two processes per > node (instead of one, i.e. mpirun -np 16 with 8 nodes), I'm able to join > > 750 groups simultaneously from one QP on each process. To make this > stranger, I can join only 4 groups running the same thing on the 8-node > machine. Are the switches and HCAs in the two setups the same? If you run the same SM on both clusters, do you see the same results? > While doing so I noticed that the time from calling > rdma_join_multicast() to the event arrival stayed fairly constant (in > the .001sec range), while the time from the join call to actually > receiving messages on the group steadily increased from around .1 secs > to around 2.7 secs with 750+ groups. Furthermore, this time does not > drop back to .1 secs if I stop the benchmark and run it (or any of my > other multicast code) again. This is understandable within a single > program run, but the fact that behavior persists across runs concerns me > -- feels like a bug, but I don't have much concrete here. Even after all nodes leave all multicast groups, I don't believe that there's a requirement for the SA to reprogram the switches immediately. So if the switches or the configuration of the switches are part of the problem, I can imagine seeing issues between runs. When rdma_join_multicast() reports the join event, it means either: the SA has been notified of the join request, or, if the port has already joined the group, that a reference count on the group has been incremented. The SA may still require time to program the switch forwarding tables.
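The two cases described above can be modelled in a few lines. The sketch below is a hypothetical, simplified per-port table (`model_join`, `struct mcast_group` and `MAX_GROUPS` are invented for illustration and are not librdmacm or SA code): only the first join of a group on a port generates a request to the SA, later joins just increment a reference count, and neither path says anything about when the switch forwarding tables have actually been programmed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_GROUPS 16 /* hypothetical table size */

/* Hypothetical per-port bookkeeping: one entry per joined multicast group. */
struct mcast_group {
    char mgid[40];  /* group identifier, kept as text for simplicity */
    int refcount;   /* 0 means the slot is unused */
};

static struct mcast_group groups[MAX_GROUPS];
static int sa_requests; /* joins that actually had to be sent to the SA */

/*
 * Model of a join: returns the group's reference count after the call,
 * or -1 if the table is full. Either successful case would produce a
 * join event; neither implies the fabric is already forwarding traffic.
 */
static int model_join(const char *mgid)
{
    struct mcast_group *free_slot = NULL;

    assert(mgid != NULL);
    for (int i = 0; i < MAX_GROUPS; i++) {
        if (groups[i].refcount > 0 && strcmp(groups[i].mgid, mgid) == 0)
            return ++groups[i].refcount; /* already joined: count only */
        if (groups[i].refcount == 0 && free_slot == NULL)
            free_slot = &groups[i];
    }
    if (free_slot == NULL)
        return -1;

    snprintf(free_slot->mgid, sizeof(free_slot->mgid), "%s", mgid);
    free_slot->refcount = 1;
    sa_requests++; /* first join on this port: the SA is notified */
    return 1;
}
```

Measuring when data actually starts arriving on the group, as in the benchmark, is a separate question from when this bookkeeping completes.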
- Sean From halr at voltaire.com Thu Jun 28 14:33:11 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Jun 2007 17:33:11 -0400 Subject: [ofa-general] Limited number of multicasts groups that can be joined? In-Reply-To: <468426B6.3060602@ichips.intel.com> References: <46699A6D.4070300@open-mpi.org> <4683D7D6.50402@open-mpi.org> <468426B6.3060602@ichips.intel.com> Message-ID: <1183066380.28870.204762.camel@hal.voltaire.com> On Thu, 2007-06-28 at 17:23, Sean Hefty wrote: > > Now the more interesting part. I'm now able to run on a 128 node > > machine using open SM running on a node (before, I was running on an 8 > > node machine which I'm told is running the Cisco SM on a Topspin > > switch). On this machine, if I run my benchmark with two processes per > > node (instead of one, i.e. mpirun -np 16 with 8 nodes), I'm able to join > > > 750 groups simultaneously from one QP on each process. To make this > > stranger, I can join only 4 groups running the same thing on the 8-node > > machine. > > Are the switches and HCAs in the two setups the same? If you run the > same SM on both clusters, do you see the same results? > > > While doing so I noticed that the time from calling > > rdma_join_multicast() to the event arrival stayed fairly constant (in > > the .001sec range), while the time from the join call to actually > > receiving messages on the group steadily increased from around .1 secs > > to around 2.7 secs with 750+ groups. Furthermore, this time does not > > drop back to .1 secs if I stop the benchmark and run it (or any of my > > other multicast code) again. This is understandable within a single > > program run, but the fact that behavior persists across runs concerns me > > -- feels like a bug, but I don't have much concrete here. > > Even after all nodes leave all multicast groups, I don't believe that > there's a requirement for the SA to reprogram the switches immediately. Right, that is allowed to be "lazy". 
Nit: it's the SM rather than SA that reprograms the switches, but the SA multicast leave is what initiates this process. -- Hal > So if the switches or the configuration of the switches are part of > the problem, I can imagine seeing issues between runs. > > When rdma_join_multicast() reports the join event, it means either: the > SA has been notified of the join request, or, if the port has already > joined the group, that a reference count on the group has been > incremented. The SA may still require time to program the switch > forwarding tables. > > - Sean From sean.hefty at intel.com Thu Jun 28 16:22:34 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 28 Jun 2007 16:22:34 -0700 Subject: [ofa-general] [PATCH] 2.6.23 ib/ipath: return correct PortGUID in NodeInfo Message-ID: <000401c7b9db$35ad6180$3c98070a@amr.corp.intel.com> Return the PortGUID of the correct port when responding to a NodeInfo query. Returning the SystemImageGUID causes issues when there are multiple HCAs in a single system. Signed-off-by: Sean Hefty --- FYI - this patch will be included in my git pull request for 2.6.23 as well.
drivers/infiniband/hw/ipath/ipath_mad.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_mad.c b/drivers/infiniband/hw/ipath/ipath_mad.c index 25908b0..3aec0b6 100644 --- a/drivers/infiniband/hw/ipath/ipath_mad.c +++ b/drivers/infiniband/hw/ipath/ipath_mad.c @@ -103,7 +103,7 @@ static int recv_subn_get_nodeinfo(struct ib_smp *smp, /* This is already in network order */ nip->sys_guid = to_idev(ibdev)->sys_image_guid; nip->node_guid = dd->ipath_guid; - nip->port_guid = nip->sys_guid; + nip->port_guid = nip->ipath_guid; nip->partition_cap = cpu_to_be16(ipath_get_npkeys(dd)); nip->device_id = cpu_to_be16(dd->ipath_deviceid); majrev = dd->ipath_majrev; From swise at opengridcomputing.com Thu Jun 28 17:05:56 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 28 Jun 2007 19:05:56 -0500 Subject: [ofa-general] Re: [PATCH RFC] - iw_cxgb3 debug - turn debug logging on/off at runtime In-Reply-To: References: <4683F11D.2030403@opengridcomputing.com> Message-ID: <46844CE4.9080103@opengridcomputing.com> Roland Dreier wrote: > > What do folks think about using a sysfs file to turn debug logging > > on/off at runtime for an rdma driver? > > How about just making it a module parameter with writable permissions, > so it can be set in /sys/module and you don't even have to write any > attribute parsing code or anything like that. > Duh! I didn't know about /sys/module. That sounds like what I want. Thanks. > > Is a /proc entry better? > > No, /proc is only for process-related stuff. > > - R.
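For reference, here is a minimal userspace sketch of the pattern discussed in this thread: one global flag gates a logging macro, and a small setter accepts only 0 or 1, much like `iwch_set_debug()` in the RFC. Apart from `cxio_debug` and `PDBG`, the names are hypothetical. Two details are deliberate design choices: the macro body is wrapped in `do { } while (0)` so that `PDBG(...);` is a single statement that is safe inside an unbraced `if`/`else`, and the setter checks the `sscanf()` return value so unparsable input is rejected instead of reading an uninitialized value. In the kernel, Roland's suggestion would replace the setter with a writable module parameter, for example `module_param(cxio_debug, uint, 0644);`, which appears under `/sys/module/<module>/parameters/`.

```c
#include <assert.h>
#include <stdio.h>

/* Runtime switch, as in the RFC: 0 = quiet, 1 = log. */
static unsigned int cxio_debug;

/* Hypothetical counter so the effect of the switch can be observed. */
static int pdbg_emitted;

/*
 * do { } while (0) makes the expansion a single statement, so
 * "if (x) PDBG(...); else ..." compiles as intended. The first
 * argument must be a string literal so it concatenates with the prefix.
 */
#define PDBG(...)                                      \
    do {                                               \
        if (cxio_debug) {                              \
            fprintf(stderr, "iw_cxgb3: " __VA_ARGS__); \
            pdbg_emitted++;                            \
        }                                              \
    } while (0)

/* Hypothetical store handler: accepts "0" or "1", returns 0 or -1. */
static int set_debug(const char *buf)
{
    unsigned int v;

    assert(buf != NULL);
    if (sscanf(buf, "%u", &v) != 1 || v > 1)
        return -1; /* reject garbage and out-of-range values */
    cxio_debug = v;
    return 0;
}

/* Example caller using PDBG inside an unbraced if/else. */
static void report(int ok)
{
    if (!ok)
        PDBG("operation failed\n");
    else
        PDBG("operation ok\n");
}
```

With a module parameter, the equivalent of `set_debug("1")` is simply writing `1` to the parameter file in sysfs; the kernel does the parsing.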
> From sean.hefty at intel.com Thu Jun 28 17:11:17 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 28 Jun 2007 17:11:17 -0700 Subject: [ofa-general] [GIT PULL] please pull rdma-dev.git for 2.6.23 Message-ID: <000801c7b9e2$03dfe220$3c98070a@amr.corp.intel.com> Roland, Please pull: git://git.openfabrics.org/~shefty/rdma-dev.git for-roland for 2.6.23. This will pick up the following patches: Sean Hefty (7): ib/sa: use correct index for default pkey ib/cm: optimize locking ib/cm: include HCA ACK delay in local ACK timeout IB/sa: Add InformInfo/Notice support. IB/sa: Add local SA path record caching. ib/ipath: return correct PortGUID in NodeInfo ib/cm: cm_msgs.h should include ib_cm.h All patches have been previously posted except for the last, which is a one-line change. I believe that all concerns with the local SA have been addressed, but I can repost those patches again if needed. As mentioned in my other email, the change to the ipath driver is included simply for convenience. - Sean From mshefty at ichips.intel.com Thu Jun 28 17:17:14 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 28 Jun 2007 17:17:14 -0700 Subject: [ofa-general] [PATCH] 2.6.23 ib/ipath: return correct PortGUID in NodeInfo In-Reply-To: <000401c7b9db$35ad6180$3c98070a@amr.corp.intel.com> References: <000401c7b9db$35ad6180$3c98070a@amr.corp.intel.com> Message-ID: <46844F8A.2040006@ichips.intel.com> Not sure what happened with this send, but I ended up with attachments and lost QLogic on the To list. I'm guessing that Arthur is the right person to verify this fix(?), so re-sending with him on the To line. - Sean > Return the PortGUID of the correct port when responding to a NodeInfo > query. Returning the SystemImageGUID causes issues when there are > multiple HCAs in a single system. > > Signed-off-by: Sean Hefty > --- > FYI - this patch will be included in my git pull request for 2.6.23 as > well.
> > drivers/infiniband/hw/ipath/ipath_mad.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/drivers/infiniband/hw/ipath/ipath_mad.c > b/drivers/infiniband/hw/ipath/ipath_mad.c > index 25908b0..3aec0b6 100644 > --- a/drivers/infiniband/hw/ipath/ipath_mad.c > +++ b/drivers/infiniband/hw/ipath/ipath_mad.c > @@ -103,7 +103,7 @@ static int recv_subn_get_nodeinfo(struct ib_smp *smp, > /* This is already in network order */ > nip->sys_guid = to_idev(ibdev)->sys_image_guid; > nip->node_guid = dd->ipath_guid; > - nip->port_guid = nip->sys_guid; > + nip->port_guid = nip->ipath_guid; > nip->partition_cap = cpu_to_be16(ipath_get_npkeys(dd)); > nip->device_id = cpu_to_be16(dd->ipath_deviceid); > majrev = dd->ipath_majrev; From arthur.jones at qlogic.com Thu Jun 28 17:22:36 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Thu, 28 Jun 2007 17:22:36 -0700 Subject: [ofa-general] [PATCH] 2.6.23 ib/ipath: return correct PortGUID in NodeInfo In-Reply-To: <46844F8A.2040006@ichips.intel.com> References: <000401c7b9db$35ad6180$3c98070a@amr.corp.intel.com> <46844F8A.2040006@ichips.intel.com> Message-ID: <20070629002236.GA29798@bauxite.pathscale.com> hi sean, yeah, i got it the first time and i've sent it off to the person who can check it out. thanks! arthur On Thu, Jun 28, 2007 at 05:17:14PM -0700, Sean Hefty wrote: > Not sure what happened with this send, but I ended up with attachments > and lost QLogic on the To list. I'm guessing that Arthur is the right > person to verify this fix(?), so re-sending with him on the To line. > > - Sean > > >Return the PortGUID of the correct port when responding to a NodeInfo > >query. Returning the SystemImageGUID causes issues when there are > >multiple HCAs in a single system. > > > >Signed-off-by: Sean Hefty > >--- > >FYI - this patch will be included in my git pull request for 2.6.23 as > >well. 
> > > > drivers/infiniband/hw/ipath/ipath_mad.c | 2 +- > > 1 files changed, 1 insertions(+), 1 deletions(-) > > > >diff --git a/drivers/infiniband/hw/ipath/ipath_mad.c > >b/drivers/infiniband/hw/ipath/ipath_mad.c > >index 25908b0..3aec0b6 100644 > >--- a/drivers/infiniband/hw/ipath/ipath_mad.c > >+++ b/drivers/infiniband/hw/ipath/ipath_mad.c > >@@ -103,7 +103,7 @@ static int recv_subn_get_nodeinfo(struct ib_smp *smp, > > /* This is already in network order */ > > nip->sys_guid = to_idev(ibdev)->sys_image_guid; > > nip->node_guid = dd->ipath_guid; > >- nip->port_guid = nip->sys_guid; > >+ nip->port_guid = nip->ipath_guid; > > nip->partition_cap = cpu_to_be16(ipath_get_npkeys(dd)); > > nip->device_id = cpu_to_be16(dd->ipath_deviceid); > > majrev = dd->ipath_majrev; From arthur.jones at qlogic.com Thu Jun 28 18:15:31 2007 From: arthur.jones at qlogic.com (Arthur Jones) Date: Thu, 28 Jun 2007 18:15:31 -0700 Subject: [ofa-general] [PATCH] 2.6.23 ib/ipath: return correct PortGUID in NodeInfo In-Reply-To: <46844F8A.2040006@ichips.intel.com> References: <000401c7b9db$35ad6180$3c98070a@amr.corp.intel.com> <46844F8A.2040006@ichips.intel.com> Message-ID: <20070629011531.GB28122@bauxite.pathscale.com> hi sean, you did indeed pick out a bug, but the fix is wrong: On Thu, Jun 28, 2007 at 05:17:14PM -0700, Sean Hefty wrote: > [...] > >Return the PortGUID of the correct port when responding to a NodeInfo > >query. Returning the SystemImageGUID causes issues when there are > >multiple HCAs in a single system. > > > >Signed-off-by: Sean Hefty > >--- > >FYI - this patch will be included in my git pull request for 2.6.23 as > >well. 
> > > > drivers/infiniband/hw/ipath/ipath_mad.c | 2 +- > > 1 files changed, 1 insertions(+), 1 deletions(-) > > > >diff --git a/drivers/infiniband/hw/ipath/ipath_mad.c > >b/drivers/infiniband/hw/ipath/ipath_mad.c > >index 25908b0..3aec0b6 100644 > >--- a/drivers/infiniband/hw/ipath/ipath_mad.c > >+++ b/drivers/infiniband/hw/ipath/ipath_mad.c > >@@ -103,7 +103,7 @@ static int recv_subn_get_nodeinfo(struct ib_smp *smp, > > /* This is already in network order */ > > nip->sys_guid = to_idev(ibdev)->sys_image_guid; > > nip->node_guid = dd->ipath_guid; > >- nip->port_guid = nip->sys_guid; > >+ nip->port_guid = nip->ipath_guid; this should be "nip->port_guid = dd->ipath_guid;". this was pointed out by ralph campbell... thanks for the fix, though! arthur From sean.hefty at intel.com Thu Jun 28 19:05:35 2007 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 28 Jun 2007 19:05:35 -0700 Subject: [ofa-general] [PATCH] 2.6.23 ib/ipath: return correct PortGUIDin NodeInfo In-Reply-To: <20070629011531.GB28122@bauxite.pathscale.com> Message-ID: <000401c7b9f1$fb9855b0$34cc180a@amr.corp.intel.com> >this should be "nip->port_guid = dd->ipath_guid;". this >was pointed out by ralph campbell... I had in my tree was wrong... since what I posted won't even compile. I actually tested with the change you listed above. I changed the patch in my tree accordingly. If you're planning on pushing in the fix through your own tree, just let me know, and I'll remove this patch from my tree. 
- Sean From vlad at lists.openfabrics.org Fri Jun 29 02:42:21 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Fri, 29 Jun 2007 02:42:21 -0700 (PDT) Subject: [ofa-general] ofa_1_2_c_kernel 20070629-0200 daily build status Message-ID: <20070629094221.E09BDE6087B@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/~vlad/ofed_kernel.git git_branch: ofed_kernel Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-mlx4-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.15 Passed on powerpc with linux-2.6.18 Passed on powerpc with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on ia64 with linux-2.6.12 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ia64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.12 Passed on ia64 with linux-2.6.15 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on ia64 with linux-2.6.13 Passed on x86_64 with linux-2.6.16 Passed on ia64 with linux-2.6.18 Passed on x86_64 with linux-2.6.15 Passed on x86_64 with linux-2.6.13 Passed on ia64 with linux-2.6.17 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.14 Passed on powerpc with linux-2.6.13 Passed on ppc64 with linux-2.6.12 Passed on ia64 with linux-2.6.16 Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.19 Passed on powerpc with 
linux-2.6.12 Passed on ppc64 with linux-2.6.17 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.16 Passed on powerpc with linux-2.6.16 Passed on powerpc with linux-2.6.14 Passed on ppc64 with linux-2.6.14 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on ppc64 with linux-2.6.13 Passed on ppc64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-34.ELsmp Failed:
From halr at voltaire.com Fri Jun 29 06:37:39 2007 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Jun 2007 09:37:39 -0400 Subject: [ofa-general] [ANNOUNCE] management libraries release Message-ID: <1183124231.28870.268894.camel@hal.voltaire.com> There is a new release of the management libraries which include the ANSIfied header files available in: http://www.openfabrics.org/~halr/ md5sum a5b884775ed069da09ca0b60bfda3239 libibcommon-1.0.4.tar.gz 288b865a0015ac3251cffa011a7633eb libibumad-1.0.6.tar.gz 04a5b6dcd2ee930f44d5715ee013f78b libibmad-1.0.6.tar.gz -- Hal From swise at opengridcomputing.com Fri Jun 29 07:51:13 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 29 Jun 2007 09:51:13 -0500 Subject: [ofa-general] [Bug 667] In-Reply-To: <4684233D.7020406@ichips.intel.com> References: <20070626142643.GC29602@mellanox.co.il> <20070626143735.GD29602@mellanox.co.il> <46812A86.9000505@opengridcomputing.com> <4684233D.7020406@ichips.intel.com> Message-ID: <46851C61.4040200@opengridcomputing.com> Sean Hefty wrote: > Steve Wise wrote: >> I think the bug is in rping_bind_client(). If addr resolution fails >> via a ADDR_ERROR event, then rping_bind_client() wakes up and >> mistakenly returns variable 'ret' which is zero. It should return >> non-zero in this case. > > I attached a patch to the bug report to fix rping_bind_client(). Please > let me know if this fixes the problem for you. > > - Sean That seems to fix the seg fault I see when addr resolution fails. Steve. From ralph.campbell at qlogic.com Fri Jun 29 11:37:56 2007 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Fri, 29 Jun 2007 11:37:56 -0700 Subject: [ofa-general] [PATCH] IB/ipoib - partial error clean up unmaps wrong address Message-ID: <1183142276.18911.337.camel@brick.pathscale.com> If a page can't be allocated for the frag list of a skb, the code to unmap the partially allocated list is off by one.
Say 'frags' equals one, i == 0, and the alloc_page() fails, then the old loop would have unmapped mapping[1] which is uninitialized. The same would happen if the ib_dma_map_page() failed. Signed-off-by: Ralph Campbell diff -r f4233821c831 drivers/infiniband/ulp/ipoib/ipoib_cm.c --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c Thu Jun 28 13:16:47 2007 -0700 +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c Fri Jun 29 11:10:22 2007 -0700 @@ -155,8 +155,8 @@ partial_error: ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_CM_HEAD_SIZE, DMA_FROM_DEVICE); - for (; i >= 0; --i) - ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE); + for (; i > 0; --i) + ib_dma_unmap_single(priv->ca, mapping[i], PAGE_SIZE, DMA_FROM_DEVICE); dev_kfree_skb_any(skb); return NULL; From swise at opengridcomputing.com Fri Jun 29 14:27:52 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 29 Jun 2007 16:27:52 -0500 Subject: [ofa-general] [GIT PULL 00/10] ofed_1_2 - Chelsio Bug Fixes Message-ID: <20070629212752.18132.98709.stgit@dell3.ogc.int> Vlad, The following patches are bug fixes to the rdma and low level chelsio drivers for ofed-1.2. All of these patches are upstream in either 2.6.22 or pending for 2.6.23 and need to be pulled into ofed-1.2. I plan to make these available to chelsio customers either through a series of patches, or a full ofa_kernel tarball. Please pull these from: http://git.openfabrics.org/~swise/ofed_1_2 ofed_1_2 Thanks, Steve. From swise at opengridcomputing.com Fri Jun 29 14:27:57 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 29 Jun 2007 16:27:57 -0500 Subject: [ofa-general] [PATCH 01/10] iw_cxgb3: ctrl-qp init/clear shouldn't set the gen bit. In-Reply-To: <20070629212752.18132.98709.stgit@dell3.ogc.int> References: <20070629212752.18132.98709.stgit@dell3.ogc.int> Message-ID: <20070629212757.18132.38688.stgit@dell3.ogc.int> iw_cxgb3: ctrl-qp init/clear shouldn't set the gen bit. 
Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/core/cxio_hal.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c index 62998d3..9746635 100644 --- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c +++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c @@ -162,7 +162,7 @@ int cxio_hal_clear_qp_ctx(struct cxio_rd } wqe = (struct t3_modify_qp_wr *) skb_put(skb, sizeof(*wqe)); memset(wqe, 0, sizeof(*wqe)); - build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 3, 1, qpid, 7); + build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 3, 0, qpid, 7); wqe->flags = cpu_to_be32(MODQP_WRITE_EC); sge_cmd = qpid << 8 | 3; wqe->sge_cmd = cpu_to_be64(sge_cmd); @@ -566,7 +566,7 @@ static int cxio_hal_init_ctrl_qp(struct V_EC_UP_TOKEN(T3_CTL_QP_TID) | F_EC_VALID)) << 32; wqe = (struct t3_modify_qp_wr *) skb_put(skb, sizeof(*wqe)); memset(wqe, 0, sizeof(*wqe)); - build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 0, 1, + build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 0, 0, T3_CTL_QP_TID, 7); wqe->flags = cpu_to_be32(MODQP_WRITE_EC); sge_cmd = (3ULL << 56) | FW_RI_SGEEC_START << 8 | 3; From swise at opengridcomputing.com Fri Jun 29 14:28:02 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 29 Jun 2007 16:28:02 -0500 Subject: [ofa-general] [PATCH 02/10] iw_cxgb3: Don't post TID_RELEASE message. In-Reply-To: <20070629212752.18132.98709.stgit@dell3.ogc.int> References: <20070629212752.18132.98709.stgit@dell3.ogc.int> Message-ID: <20070629212802.18132.96065.stgit@dell3.ogc.int> iw_cxgb3: Don't post TID_RELEASE message. The LLD does this for us in cxgb3_remove_tid(). Also fixed active open failure cases where we shouldn't be releasing the TID as well. 
Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_cm.c | 13 ++++++++++--- 1 files changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c index a654bd5..1cd03f8 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cm.c +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c @@ -255,8 +255,6 @@ static void release_ep_resources(struct cxgb3_remove_tid(ep->com.tdev, (void *)ep, ep->hwtid); dst_release(ep->dst); l2t_release(L2DATA(ep->com.tdev), ep->l2t); - if (ep->com.tdev->type == T3B) - release_tid(ep->com.tdev, ep->hwtid, NULL); put_ep(&ep->com); } @@ -1102,6 +1100,15 @@ static int abort_rpl(struct t3cdev *tdev return CPL_RET_BUF_DONE; } +/* + * Return whether a failed active open has allocated a TID + */ +static inline int act_open_has_tid(int status) +{ + return status != CPL_ERR_TCAM_FULL && status != CPL_ERR_CONN_EXIST && + status != CPL_ERR_ARP_MISS; +} + static int act_open_rpl(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) { struct iwch_ep *ep = ctx; @@ -1111,7 +1118,7 @@ static int act_open_rpl(struct t3cdev *t status2errno(rpl->status)); connect_reply_upcall(ep, status2errno(rpl->status)); state_set(&ep->com, DEAD); - if (ep->com.tdev->type == T3B) + if (ep->com.tdev->type == T3B && act_open_has_tid(rpl->status)) release_tid(ep->com.tdev, GET_TID(rpl), NULL); cxgb3_free_atid(ep->com.tdev, ep->atid); dst_release(ep->dst); From swise at opengridcomputing.com Fri Jun 29 14:28:07 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 29 Jun 2007 16:28:07 -0500 Subject: [ofa-general] [PATCH 03/10] iw_cxgb3: Don't abort after failures sending the mpa reply. In-Reply-To: <20070629212752.18132.98709.stgit@dell3.ogc.int> References: <20070629212752.18132.98709.stgit@dell3.ogc.int> Message-ID: <20070629212807.18132.70240.stgit@dell3.ogc.int> iw_cxgb3: Don't abort after failures sending the mpa reply. 
This bug results in an abort request being sent down _after_ the tid has been released. If the tid happens to have been reused, then the subsequent generation of the tid gets incorrectly aborted. The thread running iwch_accept_cr() must not abort a connection if an error is returned after being awakened. If any errors did occur while iwch_accept_cr() is blocked, then the connection has already been aborted on the thread processing the error. Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_cm.c | 1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c index 1cd03f8..4175991 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cm.c +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c @@ -1779,7 +1779,6 @@ err: ep->com.cm_id = NULL; ep->com.qp = NULL; cm_id->rem_ref(cm_id); - abort_connection(ep, NULL, GFP_KERNEL); put_ep(&ep->com); return err; } From swise at opengridcomputing.com Fri Jun 29 14:28:12 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 29 Jun 2007 16:28:12 -0500 Subject: [ofa-general] [PATCH 04/10] cxgb3: Bump the required FW version to 4.3. In-Reply-To: <20070629212752.18132.98709.stgit@dell3.ogc.int> References: <20070629212752.18132.98709.stgit@dell3.ogc.int> Message-ID: <20070629212812.18132.57916.stgit@dell3.ogc.int> cxgb3: Bump the required FW version to 4.3.
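A minimal sketch of how a required-firmware constant like this might be enforced. It assumes a hypothetical policy (the major number must match exactly and the running minor.micro must be at least the required one) and hypothetical helpers fw_pack() and fw_version_ok(); the driver's real check may be stricter or looser:

```c
#include <assert.h>
#include <stdint.h>

/* Required version after this patch (mirrors the values in version.h). */
#define FW_VERSION_MAJOR 4
#define FW_VERSION_MINOR 3
#define FW_VERSION_MICRO 0

/* Hypothetical helper: pack major.minor.micro so versions compare with >=. */
static uint32_t fw_pack(unsigned int major, unsigned int minor,
			unsigned int micro)
{
	assert(major < 256 && minor < 256 && micro < 256);
	return (uint32_t)major << 16 | (uint32_t)minor << 8 | micro;
}

/*
 * Hypothetical policy for illustration only: the major number must match
 * exactly and the running minor.micro must be at least the required one.
 */
static int fw_version_ok(unsigned int major, unsigned int minor,
			 unsigned int micro)
{
	if (major != FW_VERSION_MAJOR)
		return 0;
	return fw_pack(major, minor, micro) >=
	       fw_pack(FW_VERSION_MAJOR, FW_VERSION_MINOR, FW_VERSION_MICRO);
}
```

Under this policy, firmware 4.2.0 is rejected once the minimum becomes 4.3.0, while 4.3.0 and later 4.x releases pass.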
Signed-off-by: Steve Wise --- drivers/net/cxgb3/version.h | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/net/cxgb3/version.h b/drivers/net/cxgb3/version.h index 7ef2193..7dcfb40 100644 --- a/drivers/net/cxgb3/version.h +++ b/drivers/net/cxgb3/version.h @@ -39,6 +39,6 @@ #define DRV_VERSION "1.0-ofed" /* Firmware version */ #define FW_VERSION_MAJOR 4 -#define FW_VERSION_MINOR 2 +#define FW_VERSION_MINOR 3 #define FW_VERSION_MICRO 0 #endif /* __CHELSIO_VERSION_H */ From swise at opengridcomputing.com Fri Jun 29 14:28:17 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 29 Jun 2007 16:28:17 -0500 Subject: [ofa-general] [PATCH 05/10] cxgb3 - fix skb->dev dereference In-Reply-To: <20070629212752.18132.98709.stgit@dell3.ogc.int> References: <20070629212752.18132.98709.stgit@dell3.ogc.int> Message-ID: <20070629212817.18132.73785.stgit@dell3.ogc.int> cxgb3 - fix skb->dev dereference eth_type_trans() now sets skb->dev. References to skb->dev should happen after it is called. 
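The ordering rule can be shown with a simplified userspace model. The types fake_dev and fake_skb and the function fake_type_trans() are hypothetical stand-ins, not kernel APIs. fake_type_trans() assigns skb->dev the way the commit message says eth_type_trans() now does, so anything that dereferences skb->dev has to run after it:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for net_device and sk_buff. */
struct fake_dev {
	unsigned long last_rx;
};

struct fake_skb {
	struct fake_dev *dev;
	unsigned short protocol;
};

/* Models eth_type_trans(): the classifier is what assigns skb->dev. */
static unsigned short fake_type_trans(struct fake_skb *skb,
				      struct fake_dev *dev)
{
	skb->dev = dev;
	return 0x0800; /* pretend every frame is IPv4 (ETH_P_IP) */
}

/* Correct order: classify first, then dereference skb->dev. */
static void fake_rx(struct fake_skb *skb, struct fake_dev *port,
		    unsigned long now)
{
	skb->protocol = fake_type_trans(skb, port);
	assert(skb->dev != NULL);
	skb->dev->last_rx = now;
}
```

If the two statements in fake_rx() were swapped, the model would dereference a NULL skb->dev. The patch below makes the same reordering in rx_eth().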
Signed-off-by: Divy Le Ray Signed-off-by: Jeff Garzik --- drivers/net/cxgb3/sge.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/net/cxgb3/sge.c b/drivers/net/cxgb3/sge.c index 027ab2c..090dc1d 100644 --- a/drivers/net/cxgb3/sge.c +++ b/drivers/net/cxgb3/sge.c @@ -1685,8 +1685,8 @@ static void rx_eth(struct adapter *adap, skb_pull(skb, sizeof(*p) + pad); skb->dev = adap->port[p->iff]; - skb->dev->last_rx = jiffies; skb->protocol = eth_type_trans(skb, skb->dev); + skb->dev->last_rx = jiffies; pi = netdev_priv(skb->dev); if (pi->rx_csum_offload && p->csum_valid && p->csum == 0xffff && !p->fragment) { From swise at opengridcomputing.com Fri Jun 29 14:28:22 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 29 Jun 2007 16:28:22 -0500 Subject: [ofa-general] [PATCH 06/10] cxgb3 - fix netpoll handler In-Reply-To: <20070629212752.18132.98709.stgit@dell3.ogc.int> References: <20070629212752.18132.98709.stgit@dell3.ogc.int> Message-ID: <20070629212822.18132.15296.stgit@dell3.ogc.int> cxgb3 - fix netpoll handler Fix netpoll handler to work with line interrupt, MSI and MSI-X.
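This is a simplified model of the per-queue-set polling loop that the patch introduces. It assumes a hypothetical USING_MSIX flag value and two stand-in handlers. With MSI-X, each queue set has its own vector, and the handler's cookie is the queue set. The shared line/MSI handler instead expects the adapter as its cookie. In the real driver, t3_intr_handler() picks the handler; here that choice is hard-coded for illustration:

```c
#include <assert.h>

#define USING_MSIX 0x1 /* hypothetical flag value for this sketch */
#define MAX_QSETS 8

struct qset {
	int polled; /* times this queue set's own vector handler ran */
};

struct adapter {
	unsigned int flags;
	struct qset qs[MAX_QSETS];
	int shared_polls; /* times the shared INTx/MSI handler ran */
};

struct port {
	int first_qset;
	int nqsets;
};

/* Stand-in MSI-X handler: the cookie is the queue set. */
static void msix_handler(void *cookie)
{
	((struct qset *)cookie)->polled++;
}

/* Stand-in shared line/MSI handler: the cookie is the adapter. */
static void shared_handler(void *cookie)
{
	((struct adapter *)cookie)->shared_polls++;
}

/* Poll every queue set owned by the port, passing the expected cookie. */
static void port_netpoll(struct adapter *adap, const struct port *pi)
{
	int qidx;

	assert(pi->first_qset + pi->nqsets <= MAX_QSETS);
	for (qidx = pi->first_qset; qidx < pi->first_qset + pi->nqsets; qidx++) {
		struct qset *qs = &adap->qs[qidx];

		if (adap->flags & USING_MSIX)
			msix_handler(qs);
		else
			shared_handler(adap);
	}
}
```

The old handler polled only a single queue set and always passed the adapter. That cannot be correct under MSI-X, where each vector's handler expects its own queue set as the cookie.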
Signed-off-by: Divy Le Ray Signed-off-by: Jeff Garzik --- drivers/net/cxgb3/cxgb3_main.c | 16 +++++++++++++--- drivers/net/cxgb3/sge.c | 1 - 2 files changed, 13 insertions(+), 4 deletions(-) diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c index 475c428..f8b52dc 100644 --- a/drivers/net/cxgb3/cxgb3_main.c +++ b/drivers/net/cxgb3/cxgb3_main.c @@ -2071,10 +2071,20 @@ #ifdef CONFIG_NET_POLL_CONTROLLER static void cxgb_netpoll(struct net_device *dev) { struct adapter *adapter = dev->priv; - struct sge_qset *qs = dev2qset(dev); + struct port_info *pi = netdev_priv(dev); + int qidx; - t3_intr_handler(adapter, qs->rspq.polling) (adapter->pdev->irq, - adapter); + for (qidx = pi->first_qset; qidx < pi->first_qset + pi->nqsets; qidx++) { + struct sge_qset *qs = &adapter->sge.qs[qidx]; + void *source; + + if (adapter->flags & USING_MSIX) + source = qs; + else + source = adapter; + + t3_intr_handler(adapter, qs->rspq.polling) (0, source); + } } #endif diff --git a/drivers/net/cxgb3/sge.c b/drivers/net/cxgb3/sge.c index 090dc1d..e80b2fd 100644 --- a/drivers/net/cxgb3/sge.c +++ b/drivers/net/cxgb3/sge.c @@ -2212,7 +2212,6 @@ irqreturn_t t3_sge_intr_msix_napi(int ir struct sge_rspq *q = &qs->rspq; spin_lock(&q->lock); - BUG_ON(napi_is_scheduled(qs->netdev)); if (handle_responses(adap, q) < 0) q->unhandled_irqs++; From swise at opengridcomputing.com Fri Jun 29 14:28:27 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 29 Jun 2007 16:28:27 -0500 Subject: [ofa-general] [PATCH 07/10] cxgb3 - Fix direct XAUI support In-Reply-To: <20070629212752.18132.98709.stgit@dell3.ogc.int> References: <20070629212752.18132.98709.stgit@dell3.ogc.int> Message-ID: <20070629212827.18132.5501.stgit@dell3.ogc.int> cxgb3 - Fix direct XAUI support Check all lanes for link status on direct XAUI cards. Don't assume that direct XAUI always uses XGMAC 1. 
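The link check can be sketched as a pure function over the four per-lane status words. This assumes, as the patch's use of F_LOWSIG0 on every STAT register implies, that bit 0 of each word flags low signal on that lane. xaui_link_ok() is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

#define F_LOWSIG0 (1U << 0) /* bit 0, matching S_LOWSIG0 == 0 in regs.h */
#define XAUI_LANES 4

/*
 * The link is up only if no lane reports low signal: OR the per-lane
 * status words together and test the low-signal bit once.
 */
static int xaui_link_ok(const uint32_t stat[], int lanes)
{
	uint32_t status = 0;
	int i;

	assert(lanes > 0 && lanes <= XAUI_LANES);
	for (i = 0; i < lanes; i++)
		status |= stat[i];
	return !(status & F_LOWSIG0);
}
```

Checking only stat[0], as the old code did, would report the link as up even when another lane (say lane 2) has lost signal.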
Signed-off-by: Divy Le Ray Signed-off-by: Jeff Garzik --- drivers/net/cxgb3/ael1002.c | 10 ++++++++-- drivers/net/cxgb3/regs.h | 2 ++ 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/net/cxgb3/ael1002.c b/drivers/net/cxgb3/ael1002.c old mode 100755 new mode 100644 index 73a41e6..ee140e6 --- a/drivers/net/cxgb3/ael1002.c +++ b/drivers/net/cxgb3/ael1002.c @@ -219,7 +219,13 @@ static int xaui_direct_get_link_status(s unsigned int status; status = t3_read_reg(phy->adapter, - XGM_REG(A_XGM_SERDES_STAT0, phy->addr)); + XGM_REG(A_XGM_SERDES_STAT0, phy->addr)) | + t3_read_reg(phy->adapter, + XGM_REG(A_XGM_SERDES_STAT1, phy->addr)) | + t3_read_reg(phy->adapter, + XGM_REG(A_XGM_SERDES_STAT2, phy->addr)) | + t3_read_reg(phy->adapter, + XGM_REG(A_XGM_SERDES_STAT3, phy->addr)); *link_ok = !(status & F_LOWSIG0); } if (speed) @@ -247,5 +253,5 @@ static struct cphy_ops xaui_direct_ops = void t3_xaui_direct_phy_prep(struct cphy *phy, struct adapter *adapter, int phy_addr, const struct mdio_ops *mdio_ops) { - cphy_init(phy, adapter, 1, &xaui_direct_ops, mdio_ops); + cphy_init(phy, adapter, phy_addr, &xaui_direct_ops, mdio_ops); } diff --git a/drivers/net/cxgb3/regs.h b/drivers/net/cxgb3/regs.h index e5a5534..bf9d6be 100644 --- a/drivers/net/cxgb3/regs.h +++ b/drivers/net/cxgb3/regs.h @@ -2128,6 +2128,8 @@ #define V_RESETPLL01(x) ((x) << S_RESETP #define F_RESETPLL01 V_RESETPLL01(1U) #define A_XGM_SERDES_STAT0 0x8f0 +#define A_XGM_SERDES_STAT1 0x8f4 +#define A_XGM_SERDES_STAT2 0x8f8 #define S_LOWSIG0 0 #define V_LOWSIG0(x) ((x) << S_LOWSIG0) From swise at opengridcomputing.com Fri Jun 29 14:28:33 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 29 Jun 2007 16:28:33 -0500 Subject: [ofa-general] [PATCH 08/10] cxgb3 - Stop mac RX when changing MTU In-Reply-To: <20070629212752.18132.98709.stgit@dell3.ogc.int> References: <20070629212752.18132.98709.stgit@dell3.ogc.int> Message-ID: <20070629212832.18132.69614.stgit@dell3.ogc.int> cxgb3 - Stop mac RX 
when changing MTU Rx traffic needs to be halted when the MTU is changed to avoid a potential chip hang. Reset/restore MAC filters around an MTU change. Also fix the pause frame high watermark setting. Signed-off-by: Divy Le Ray Signed-off-by: Jeff Garzik --- drivers/net/cxgb3/regs.h | 4 +++ drivers/net/cxgb3/xgmac.c | 67 ++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 70 insertions(+), 1 deletions(-) diff --git a/drivers/net/cxgb3/regs.h b/drivers/net/cxgb3/regs.h index bf9d6be..020859c 100644 --- a/drivers/net/cxgb3/regs.h +++ b/drivers/net/cxgb3/regs.h @@ -1882,6 +1882,10 @@ #define S_COPYALLFRAMES 0 #define V_COPYALLFRAMES(x) ((x) << S_COPYALLFRAMES) #define F_COPYALLFRAMES V_COPYALLFRAMES(1U) +#define S_DISBCAST 1 +#define V_DISBCAST(x) ((x) << S_DISBCAST) +#define F_DISBCAST V_DISBCAST(1U) + #define A_XGM_RX_HASH_LOW 0x814 #define A_XGM_RX_HASH_HIGH 0x818 diff --git a/drivers/net/cxgb3/xgmac.c b/drivers/net/cxgb3/xgmac.c index a506792..16cadba 100644 --- a/drivers/net/cxgb3/xgmac.c +++ b/drivers/net/cxgb3/xgmac.c @@ -231,6 +231,28 @@ int t3_mac_set_num_ucast(struct cmac *ma return 0; } +static void disable_exact_filters(struct cmac *mac) +{ + unsigned int i, reg = mac->offset + A_XGM_RX_EXACT_MATCH_LOW_1; + + for (i = 0; i < EXACT_ADDR_FILTERS; i++, reg += 8) { + u32 v = t3_read_reg(mac->adapter, reg); + t3_write_reg(mac->adapter, reg, v); + } + t3_read_reg(mac->adapter, A_XGM_RX_EXACT_MATCH_LOW_1); /* flush */ +} + +static void enable_exact_filters(struct cmac *mac) +{ + unsigned int i, reg = mac->offset + A_XGM_RX_EXACT_MATCH_HIGH_1; + + for (i = 0; i < EXACT_ADDR_FILTERS; i++, reg += 8) { + u32 v = t3_read_reg(mac->adapter, reg); + t3_write_reg(mac->adapter, reg, v); + } + t3_read_reg(mac->adapter, A_XGM_RX_EXACT_MATCH_LOW_1); /* flush */ +} + /* Calculate the RX hash filter index of an Ethernet address */ static int hash_hw_addr(const u8 * addr) { @@ -281,6 +303,14 @@ int t3_mac_set_rx_mode(struct cmac *mac, return 0; } +static int 
rx_fifo_hwm(int mtu) +{ + int hwm; + + hwm = max(MAC_RXFIFO_SIZE - 3 * mtu, (MAC_RXFIFO_SIZE * 38) / 100); + return min(hwm, MAC_RXFIFO_SIZE - 8192); +} + int t3_mac_set_mtu(struct cmac *mac, unsigned int mtu) { int hwm, lwm; @@ -306,11 +336,38 @@ int t3_mac_set_mtu(struct cmac *mac, uns lwm = min(3 * (int)mtu, MAC_RXFIFO_SIZE / 4); v = t3_read_reg(adap, A_XGM_RXFIFO_CFG + mac->offset); + if (adap->params.rev == T3_REV_B2 && + (t3_read_reg(adap, A_XGM_RX_CTRL + mac->offset) & F_RXEN)) { + disable_exact_filters(mac); + t3_set_reg_field(adap, A_XGM_RXFIFO_CFG + mac->offset, + F_ENHASHMCAST | F_COPYALLFRAMES, F_DISBCAST); + + /* drain rx FIFO */ + if (t3_wait_op_done(adap, + A_XGM_RX_MAX_PKT_SIZE_ERR_CNT + + mac->offset, + 1 << 31, 1, 20, 5)) { + t3_write_reg(adap, A_XGM_RXFIFO_CFG + mac->offset, v); + enable_exact_filters(mac); + return -EIO; + } + t3_write_reg(adap, A_XGM_RX_MAX_PKT_SIZE + mac->offset, mtu); + enable_exact_filters(mac); + } else + t3_write_reg(adap, A_XGM_RX_MAX_PKT_SIZE + mac->offset, mtu); + + /* + * Adjust the PAUSE frame watermarks. We always set the LWM, and the + * HWM only if flow-control is enabled. 
+ */ + hwm = rx_fifo_hwm(mtu); + lwm = min(3 * (int)mtu, MAC_RXFIFO_SIZE / 4); v &= ~V_RXFIFOPAUSELWM(M_RXFIFOPAUSELWM); v |= V_RXFIFOPAUSELWM(lwm / 8); if (G_RXFIFOPAUSEHWM(v)) v = (v & ~V_RXFIFOPAUSEHWM(M_RXFIFOPAUSEHWM)) | V_RXFIFOPAUSEHWM(hwm / 8); + t3_write_reg(adap, A_XGM_RXFIFO_CFG + mac->offset, v); /* Adjust the TX FIFO threshold based on the MTU */ @@ -329,7 +386,6 @@ int t3_mac_set_mtu(struct cmac *mac, uns (hwm - lwm) * 4 / 8); t3_write_reg(adap, A_XGM_TX_PAUSE_QUANTA + mac->offset, MAC_RXFIFO_SIZE * 4 * 8 / 512); - return 0; } @@ -357,6 +413,15 @@ int t3_mac_set_speed_duplex_fc(struct cm V_PORTSPEED(M_PORTSPEED), val); } + val = t3_read_reg(adap, A_XGM_RXFIFO_CFG + oft); + val &= ~V_RXFIFOPAUSEHWM(M_RXFIFOPAUSEHWM); + if (fc & PAUSE_TX) + val |= V_RXFIFOPAUSEHWM(rx_fifo_hwm( + t3_read_reg(adap, + A_XGM_RX_MAX_PKT_SIZE + + oft)) / 8); + t3_write_reg(adap, A_XGM_RXFIFO_CFG + oft, val); + t3_set_reg_field(adap, A_XGM_TX_CFG + oft, F_TXPAUSEEN, (fc & PAUSE_RX) ? F_TXPAUSEEN : 0); return 0; From swise at opengridcomputing.com Fri Jun 29 14:28:38 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 29 Jun 2007 16:28:38 -0500 Subject: [ofa-general] [PATCH 09/10] cxgb3 - MAC watchdog update In-Reply-To: <20070629212752.18132.98709.stgit@dell3.ogc.int> References: <20070629212752.18132.98709.stgit@dell3.ogc.int> Message-ID: <20070629212838.18132.43384.stgit@dell3.ogc.int> cxgb3 - MAC watchdog update Fix variables initialization and usage in the MAC watchdog. 
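The core of this fix is replacing early returns with jumps to a shared exit. That way, the RX check still runs when TX is healthy, and the saved counter snapshots are refreshed on every path. Below is a minimal sketch with hypothetical names (wd_state, wd_check) and an illustrative escalation threshold; it does not use the driver's actual counters:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes, loosely following the driver's 0/1/2 status. */
enum wd_status { WD_OK = 0, WD_TOGGLE = 1, WD_RESET = 2 };

/* Hypothetical watchdog state: counters seen on the previous run. */
struct wd_state {
	unsigned long tx_frames;
	unsigned long rx_frames;
	int tx_stuck_runs; /* consecutive runs with TX stuck */
};

static enum wd_status wd_check(struct wd_state *s,
			       unsigned long tx_now, int tx_pending,
			       unsigned long rx_now, int rx_pending)
{
	enum wd_status status = WD_OK;

	assert(s != NULL);
	if (tx_now == s->tx_frames && tx_pending) {
		/* TX stuck: toggle first, escalate if it persists. */
		status = ++s->tx_stuck_runs > 4 ? WD_RESET : WD_TOGGLE;
		goto out;
	}
	s->tx_stuck_runs = 0;

	/* rxcheck: TX is healthy, so fall through instead of returning. */
	if (rx_now == s->rx_frames && rx_pending)
		status = WD_RESET;
out:
	/* Single exit: the snapshots are refreshed on every path. */
	s->tx_frames = tx_now;
	s->rx_frames = rx_now;
	return status;
}
```

With early returns, a healthy TX path would skip the RX check entirely and leave stale snapshots behind for the next run to compare against.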
Signed-off-by: Divy Le Ray Signed-off-by: Jeff Garzik --- drivers/net/cxgb3/xgmac.c | 31 +++++++++++++++++++++---------- 1 files changed, 21 insertions(+), 10 deletions(-) diff --git a/drivers/net/cxgb3/xgmac.c b/drivers/net/cxgb3/xgmac.c index 16cadba..b261be1 100644 --- a/drivers/net/cxgb3/xgmac.c +++ b/drivers/net/cxgb3/xgmac.c @@ -501,6 +501,10 @@ int t3b2_mac_watchdog_task(struct cmac * unsigned int rx_xcnt; int status; + status = 0; + tx_xcnt = 1; /* By default tx_xcnt is making progress */ + tx_tcnt = mac->tx_tcnt; /* If tx_mcnt is progressing ignore tx_tcnt */ + rx_xcnt = 1; /* By default rx_xcnt is making progress */ if (tx_mcnt == mac->tx_mcnt) { tx_xcnt = (G_TXSPI4SOPCNT(t3_read_reg(adap, A_XGM_TX_SPI4_SOP_EOP_CNT + @@ -511,37 +515,44 @@ int t3b2_mac_watchdog_task(struct cmac * tx_tcnt = (G_TXDROPCNTCH0RCVD(t3_read_reg(adap, A_TP_PIO_DATA))); } else { - mac->toggle_cnt = 0; - return 0; + goto rxcheck; } } else { mac->toggle_cnt = 0; - return 0; + goto rxcheck; } if (((tx_tcnt != mac->tx_tcnt) && (tx_xcnt == 0) && (mac->tx_xcnt == 0)) || ((mac->tx_mcnt == tx_mcnt) && (tx_xcnt != 0) && (mac->tx_xcnt != 0))) { - if (mac->toggle_cnt > 4) + if (mac->toggle_cnt > 4) { status = 2; - else + goto out; + } else { status = 1; + goto out; + } } else { mac->toggle_cnt = 0; - return 0; + goto rxcheck; } +rxcheck: if (rx_mcnt != mac->rx_mcnt) rx_xcnt = (G_TXSPI4SOPCNT(t3_read_reg(adap, A_XGM_RX_SPI4_SOP_EOP_CNT + mac->offset))); - else - return 0; + else + goto out; - if (mac->rx_mcnt != s->rx_frames && rx_xcnt == 0 && mac->rx_xcnt == 0) + if (mac->rx_mcnt != s->rx_frames && rx_xcnt == 0 && + mac->rx_xcnt == 0) { status = 2; - + goto out; + } + +out: mac->tx_tcnt = tx_tcnt; mac->tx_xcnt = tx_xcnt; mac->tx_mcnt = s->tx_frames; From swise at opengridcomputing.com Fri Jun 29 14:28:43 2007 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 29 Jun 2007 16:28:43 -0500 Subject: [ofa-general] [PATCH 10/10] cxgb3 - fix register to stop bc/mc traffic In-Reply-To: 
<20070629212752.18132.98709.stgit@dell3.ogc.int> References: <20070629212752.18132.98709.stgit@dell3.ogc.int> Message-ID: <20070629212843.18132.30351.stgit@dell3.ogc.int> cxgb3 - fix register to stop bc/mc traffic Use the right register to stop broadcast/multicast traffic. Signed-off-by: Divy Le Ray --- drivers/net/cxgb3/xgmac.c | 8 +++++--- 1 files changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/net/cxgb3/xgmac.c b/drivers/net/cxgb3/xgmac.c index b261be1..c302b1a 100644 --- a/drivers/net/cxgb3/xgmac.c +++ b/drivers/net/cxgb3/xgmac.c @@ -335,11 +335,11 @@ int t3_mac_set_mtu(struct cmac *mac, uns hwm = min(hwm, MAC_RXFIFO_SIZE - 8192); lwm = min(3 * (int)mtu, MAC_RXFIFO_SIZE / 4); - v = t3_read_reg(adap, A_XGM_RXFIFO_CFG + mac->offset); if (adap->params.rev == T3_REV_B2 && (t3_read_reg(adap, A_XGM_RX_CTRL + mac->offset) & F_RXEN)) { disable_exact_filters(mac); - t3_set_reg_field(adap, A_XGM_RXFIFO_CFG + mac->offset, + v = t3_read_reg(adap, A_XGM_RX_CFG + mac->offset); + t3_set_reg_field(adap, A_XGM_RX_CFG + mac->offset, F_ENHASHMCAST | F_COPYALLFRAMES, F_DISBCAST); /* drain rx FIFO */ @@ -347,11 +347,12 @@ int t3_mac_set_mtu(struct cmac *mac, uns A_XGM_RX_MAX_PKT_SIZE_ERR_CNT + mac->offset, 1 << 31, 1, 20, 5)) { - t3_write_reg(adap, A_XGM_RXFIFO_CFG + mac->offset, v); + t3_write_reg(adap, A_XGM_RX_CFG + mac->offset, v); enable_exact_filters(mac); return -EIO; } t3_write_reg(adap, A_XGM_RX_MAX_PKT_SIZE + mac->offset, mtu); + t3_write_reg(adap, A_XGM_RX_CFG + mac->offset, v); enable_exact_filters(mac); } else t3_write_reg(adap, A_XGM_RX_MAX_PKT_SIZE + mac->offset, mtu); @@ -362,6 +363,7 @@ int t3_mac_set_mtu(struct cmac *mac, uns */ hwm = rx_fifo_hwm(mtu); lwm = min(3 * (int)mtu, MAC_RXFIFO_SIZE / 4); + v = t3_read_reg(adap, A_XGM_RXFIFO_CFG + mac->offset); v &= ~V_RXFIFOPAUSELWM(M_RXFIFOPAUSELWM); v |= V_RXFIFOPAUSELWM(lwm / 8); if (G_RXFIFOPAUSEHWM(v)) From sshaw at sgi.com Fri Jun 29 14:42:55 2007 From: sshaw at sgi.com (Scott Shaw) Date: Fri, 29 Jun 
2007 14:42:55 -0700 Subject: [ofa-general] Ofed v1.2rc2 IPoIB Message-ID: <9BEB932202A05B488722B05D2374A1DA0221C3F3@mtv-amer001e--3.americas.sgi.com> Hi, I have a small cluster set up with NFS over an IPoIB device, and I am seeing a high rate of "transmit timed out" errors being logged in /var/log/messages. What could be causing the problem, and is there a fix? I am using a dual-port DDR Mellanox Technologies MT25208 HCA within a DDR IB fabric. /etc/init.d/openibd status reports HCA driver loaded Configured devices: ib0 Currently active devices: ib0 The following OFED modules are loaded: rdma_ucm rdma_cm ib_addr ib_local_sa ib_ipoib ib_ipath ib_mthca ib_uverbs ib_umad ib_sa ib_cm ib_mad ib_core SUSE Linux Enterprise Server 10 (x86_64) VERSION = 10 PATCHLEVEL = 1 Jun 29 15:46:57 service2 kernel: NETDEV WATCHDOG: ib0: transmit timed out Jun 29 15:46:57 service2 kernel: ib0: transmit timeout: latency 1576 msecs Jun 29 15:46:57 service2 kernel: ib0: queue stopped 1, tx_head 6355, tx_tail 6291 Jun 29 15:46:58 service2 kernel: NETDEV WATCHDOG: ib0: transmit timed out Jun 29 15:46:58 service2 kernel: ib0: transmit timeout: latency 2576 msecs Jun 29 15:46:58 service2 kernel: ib0: queue stopped 1, tx_head 6355, tx_tail 6291 Jun 29 15:46:59 service2 kernel: NETDEV WATCHDOG: ib0: transmit timed out Jun 29 15:46:59 service2 kernel: ib0: transmit timeout: latency 3576 msecs Jun 29 15:46:59 service2 kernel: ib0: queue stopped 1, tx_head 6355, tx_tail 6291 Jun 29 15:47:00 service2 kernel: NETDEV WATCHDOG: ib0: transmit timed out Jun 29 15:47:00 service2 kernel: ib0: transmit timeout: latency 4576 msecs Jun 29 15:47:00 service2 kernel: ib0: queue stopped 1, tx_head 6355, tx_tail 6291 Jun 29 15:47:01 service2 kernel: NETDEV WATCHDOG: ib0: transmit timed out Jun 29 15:47:01 service2 kernel: ib0: transmit timeout: latency 5576 msecs Jun 29 15:47:01 service2 kernel: ib0: queue stopped 1, tx_head 6355, tx_tail 6291 Jun 29 15:47:02 service2 kernel: NETDEV WATCHDOG: ib0: transmit timed out Jun 29
15:47:02 service2 kernel: ib0: transmit timeout: latency 6576 msecs Jun 29 15:47:02 service2 kernel: ib0: queue stopped 1, tx_head 6355, tx_tail 6291 Jun 29 15:47:03 service2 kernel: NETDEV WATCHDOG: ib0: transmit timed out Jun 29 15:47:03 service2 kernel: ib0: transmit timeout: latency 7576 msecs Jun 29 15:47:03 service2 kernel: ib0: queue stopped 1, tx_head 6355, tx_tail 6291 TIA! Scott Shaw SILICON GRAPHICS | The Source of Innovation and Discovery Office Ph: 734.437.6397 Cell Ph: 734.564.3832 Email:sshaw at sgi.com http://www.sgi.com From ralph.campbell at qlogic.com Fri Jun 29 14:50:25 2007 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Fri, 29 Jun 2007 14:50:25 -0700 Subject: [ofa-general] IB/madeye - Fix the port number when registering the module Message-ID: <1183153826.18911.342.camel@brick.pathscale.com> The loop for registering the madeye module with ib_mad passes the wrong IB port number and fails to register. Signed-off-by: Ralph Campbell diff -r 55227cf7a002 drivers/infiniband/util/madeye.c --- a/drivers/infiniband/util/madeye.c Fri Jun 29 14:37:00 2007 -0700 +++ b/drivers/infiniband/util/madeye.c Fri Jun 29 14:39:03 2007 -0700 @@ -534,13 +534,13 @@ static void madeye_add_one(struct ib_dev reg_flags = IB_MAD_SNOOP_SEND_COMPLETIONS | IB_MAD_SNOOP_RECVS; for (i = 0; i <= e - s; i++) { - port[i].smi_agent = ib_register_mad_snoop(device, i, + port[i].smi_agent = ib_register_mad_snoop(device, i + s, IB_QPT_SMI, reg_flags, snoop_smi_handler, recv_smi_handler, &port[i]); - port[i].gsi_agent = ib_register_mad_snoop(device, i, + port[i].gsi_agent = ib_register_mad_snoop(device, i + s, IB_QPT_GSI, reg_flags, snoop_gsi_handler, From mshefty at ichips.intel.com Fri Jun 29 15:20:54 2007 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 29 Jun 2007 15:20:54 -0700 Subject: [ofa-general] IB/madeye - Fix the port number when registering the module In-Reply-To: <1183153826.18911.342.camel@brick.pathscale.com> References: 
<1183153826.18911.342.camel@brick.pathscale.com> Message-ID: <468585C6.3020809@ichips.intel.com> Ralph Campbell wrote: > The loop for registering the madeye module with ib_mad > passes the wrong IB port number and fails to register. I'm positive this has been fixed before (probably a lost patch from the move to git). Oh well, thanks - pulled into rdma-dev.git util. > diff -r 55227cf7a002 drivers/infiniband/util/madeye.c > --- a/drivers/infiniband/util/madeye.c Fri Jun 29 14:37:00 2007 -0700 > +++ b/drivers/infiniband/util/madeye.c Fri Jun 29 14:39:03 2007 -0700 I had to fix this up to use ..util/madeye/madeye.c. What tree was this generated against? - Sean From ralph.campbell at qlogic.com Fri Jun 29 16:11:50 2007 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Fri, 29 Jun 2007 16:11:50 -0700 Subject: [ofa-general] IB/madeye - Fix the port number when registering the module In-Reply-To: <468585C6.3020809@ichips.intel.com> References: <1183153826.18911.342.camel@brick.pathscale.com> <468585C6.3020809@ichips.intel.com> Message-ID: <1183158710.18911.348.camel@brick.pathscale.com> On Fri, 2007-06-29 at 15:20 -0700, Sean Hefty wrote: > Ralph Campbell wrote: > > The loop for registering the madeye module with ib_mad > > passes the wrong IB port number and fails to register. > > I'm positive this has been fixed before (probably a lost patch from the > move to git). Oh well, thanks - pulled into rdma-dev.git util. > > > diff -r 55227cf7a002 drivers/infiniband/util/madeye.c > > --- a/drivers/infiniband/util/madeye.c Fri Jun 29 14:37:00 2007 -0700 > > +++ b/drivers/infiniband/util/madeye.c Fri Jun 29 14:39:03 2007 -0700 > > I had to fix this up to use ..util/madeye/madeye.c. What tree was this > generated against? 
> > - Sean I used git://git.openfabrics.org/~vlad/ofed_1_2/.git From ralph.campbell at qlogic.com Fri Jun 29 16:40:50 2007 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Fri, 29 Jun 2007 16:40:50 -0700 Subject: [ofa-general] Re: [PATCH 24/28] IB/ipath - ipath_poll fixups and enhancements In-Reply-To: References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> <20070619234252.3794.18229.stgit@bauxite.internal.keyresearch.com> <20070621152312.GA14817@bauxite.pathscale.com> <20070627170242.GT29798@bauxite.pathscale.com> Message-ID: <1183160450.18911.360.camel@brick.pathscale.com> On Wed, 2007-06-27 at 12:13 -0700, Roland Dreier wrote: > > > OK, fair enough, although it seems you may be missing some memory > > > barriers to make sure you don't run into the CPU reordering accesses > > > to the head/tail pointers. > > > > i had a quick look at the patch and the surrounding > > code and i did not catch the problem. can you be a > > little more specific about the suspect code? > > I'm not sure there's a bug there. But the patch in question does > > > + tail = *(volatile u64 *)pd->port_rcvhdrtail_kvaddr; > > with no memory ordering. The volatile makes sure the compiler puts > that read where you wrote it, but there's no guarantee that the CPU > executes it anywhere remotely close to where it is in the code. Later > on you have > > > + if (tail != head || > > + test_bit(IPATH_PORT_WAITING_RCV, &pd->int_flag)) { > > etc., and the CPU might speculate those test far ahead of actually > reading the port_rcvhdrttail_kvaddr value, which means you might end > up executing code based on a guess about tail != head that is not true > at the time it speculates the branch, but by the time it does get to > actually check its speculation, the guess has become true. > > Just something to think about... Most of the places where the receive header tail is checked is for queue full/non-full so the read barriers aren't needed. 
The one place where we might need a rmb() is in ipath_kreceive() where we check the tail and then read the queue entry. From rdreier at cisco.com Fri Jun 29 17:13:37 2007 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 29 Jun 2007 17:13:37 -0700 Subject: [ofa-general] Re: [PATCH 24/28] IB/ipath - ipath_poll fixups and enhancements In-Reply-To: <1183160450.18911.360.camel@brick.pathscale.com> (Ralph Campbell's message of "Fri, 29 Jun 2007 16:40:50 -0700") References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com> <20070619234252.3794.18229.stgit@bauxite.internal.keyresearch.com> <20070621152312.GA14817@bauxite.pathscale.com> <20070627170242.GT29798@bauxite.pathscale.com> <1183160450.18911.360.camel@brick.pathscale.com> Message-ID: > Most of the places where the receive header tail is checked is > for queue full/non-full so the read barriers aren't needed. > The one place where we might need a rmb() is in ipath_kreceive() > where we check the tail and then read the queue entry. Yes, you almost certainly need a barrier there. You might not hit it in practice but I don't see any reason why a CPU couldn't end up reading, say, an invalid qp value because the entry hadn't been written yet, but then see a value for the tail pointer that was written later. - R.
From vlad at lists.openfabrics.org Sat Jun 30 02:43:34 2007 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky) Date: Sat, 30 Jun 2007 02:43:34 -0700 (PDT) Subject: [ofa-general] ofa_1_2_c_kernel 20070630-0200 daily build status Message-ID: <20070630094334.D827DE60929@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/~vlad/ofed_kernel.git git_branch: ofed_kernel Common build parameters: --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-mlx4-mod --with-core-mod --with-addr_trans-mod --with-rds-mod --with-cxgb3-mod Passed: Passed on i686 with 2.6.15-23-server Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.14 Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.13 Passed on i686 with linux-2.6.12 Passed on i686 with linux-2.6.15 Passed on x86_64 with linux-2.6.20 Passed on ia64 with linux-2.6.19 Passed on ppc64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on powerpc with linux-2.6.14 Passed on ia64 with linux-2.6.12 Passed on ia64 with linux-2.6.15 Passed on ia64 with linux-2.6.13 Passed
on x86_64 with linux-2.6.16 Passed on ppc64 with linux-2.6.16 Passed on ia64 with linux-2.6.16 Passed on powerpc with linux-2.6.13 Passed on ia64 with linux-2.6.14 Passed on ia64 with linux-2.6.17 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.12 Passed on ppc64 with linux-2.6.17 Passed on powerpc with linux-2.6.17 Passed on x86_64 with linux-2.6.21.1 Passed on ppc64 with linux-2.6.12 Passed on x86_64 with linux-2.6.19 Passed on powerpc with linux-2.6.18 Passed on x86_64 with linux-2.6.13 Passed on powerpc with linux-2.6.16 Passed on powerpc with linux-2.6.15 Passed on ppc64 with linux-2.6.15 Passed on ppc64 with linux-2.6.14 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.15 Passed on powerpc with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.13 Passed on x86_64 with linux-2.6.5-7.244-smp Passed on powerpc with linux-2.6.12 Passed on x86_64 with linux-2.6.14 Passed on ia64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.9-22.ELsmp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on ppc64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on ia64 with linux-2.6.16.21-0.8-default Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.9-34.ELsmp Failed: From service at mailservice.virginiacu.org Sat Jun 30 04:41:41 2007 From: service at mailservice.virginiacu.org (Virginia Credit Union) Date: Sat, 30 Jun 2007 13:41:41 +0200 Subject: [ofa-general] Notification Letter #7528 Message-ID: <3f6d3059f80882da576a14967004e26b@localhost.localdomain> An HTML attachment was scrubbed... 
URL: From sashak at voltaire.com Sat Jun 30 14:05:03 2007 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 1 Jul 2007 00:05:03 +0300 Subject: [ofa-general] [PATCH] opensm: use osm_get_node/port_by_guid() funcs Message-ID: <20070630210503.GA14390@sashak.voltaire.com> Similar to osm_get_switch_by_guid() use existing osm_get_node_by_guid() and osm_get_port_by_guid() helper funcs for those objects by guid resolving - this simplifies the flow in many cases. Signed-off-by: Sasha Khapyorsky --- opensm/opensm/osm_drop_mgr.c | 29 +++++++++------------------ opensm/opensm/osm_inform.c | 5 +-- opensm/opensm/osm_lid_mgr.c | 6 +--- opensm/opensm/osm_mcast_mgr.c | 32 ++++++++++-------------------- opensm/opensm/osm_node_desc_rcv.c | 7 +---- opensm/opensm/osm_node_info_rcv.c | 32 +++++++++--------------------- opensm/opensm/osm_perfmgr.c | 8 ++---- opensm/opensm/osm_pkey_rcv.c | 7 +---- opensm/opensm/osm_port.c | 7 +---- opensm/opensm/osm_port_info_rcv.c | 7 +---- opensm/opensm/osm_prtn.c | 5 +-- opensm/opensm/osm_sa_lft_record.c | 5 +-- opensm/opensm/osm_sa_mcmember_record.c | 6 +--- opensm/opensm/osm_sa_mft_record.c | 6 +--- opensm/opensm/osm_sa_multipath_record.c | 7 ++--- opensm/opensm/osm_sa_path_record.c | 17 ++++----------- opensm/opensm/osm_sa_service_record.c | 4 +- opensm/opensm/osm_sa_sw_info_record.c | 5 +-- opensm/opensm/osm_slvl_map_rcv.c | 6 +--- opensm/opensm/osm_sm.c | 12 +++------- opensm/opensm/osm_sm_state_mgr.c | 14 ++---------- opensm/opensm/osm_sminfo_rcv.c | 6 +--- opensm/opensm/osm_state_mgr.c | 13 +++-------- opensm/opensm/osm_sw_info_rcv.c | 6 +--- opensm/opensm/osm_ucast_file.c | 5 +-- opensm/opensm/osm_vl_arb_rcv.c | 7 +---- 26 files changed, 87 insertions(+), 177 deletions(-) diff --git a/opensm/opensm/osm_drop_mgr.c b/opensm/opensm/osm_drop_mgr.c index 9d91b6b..20564cb 100644 --- a/opensm/opensm/osm_drop_mgr.c +++ b/opensm/opensm/osm_drop_mgr.c @@ -144,17 +144,16 @@ drop_mgr_clean_physp( IN const osm_drop_mgr_t* const p_mgr, IN osm_physp_t 
*p_physp) { - cl_qmap_t *p_port_guid_tbl = &p_mgr->p_subn->port_guid_tbl; osm_physp_t *p_remote_physp; osm_port_t* p_remote_port; p_remote_physp = osm_physp_get_remote( p_physp ); if( p_remote_physp && osm_physp_is_valid( p_remote_physp ) ) { - p_remote_port = (osm_port_t*)cl_qmap_get( p_port_guid_tbl, - p_remote_physp->port_guid ); + p_remote_port = osm_get_port_by_guid(p_mgr->p_subn, + p_remote_physp->port_guid ); - if ( p_remote_port != (osm_port_t*)cl_qmap_end( p_port_guid_tbl ) ) + if ( p_remote_port ) { /* Let's check if this is a case of link that is lost (both ports weren't recognized), or a "hiccup" in the subnet - in which case @@ -220,7 +219,6 @@ __osm_drop_mgr_remove_port( osm_port_t *p_port_check; cl_list_t* p_new_ports_list; cl_list_iterator_t cl_list_item; - cl_qmap_t* p_port_guid_tbl; cl_qmap_t* p_sm_guid_tbl; osm_mcm_info_t* p_mcm; osm_mgrp_t* p_mgrp; @@ -261,8 +259,8 @@ __osm_drop_mgr_remove_port( cl_list_item = cl_list_next(cl_list_item); } - p_port_guid_tbl = &p_mgr->p_subn->port_guid_tbl; - p_port_check = (osm_port_t*)cl_qmap_remove( p_port_guid_tbl, port_guid ); + p_port_check = (osm_port_t*)cl_qmap_remove( &p_mgr->p_subn->port_guid_tbl, + port_guid ); if( p_port_check != p_port ) { osm_log( p_mgr->p_log, OSM_LOG_ERROR, @@ -406,11 +404,9 @@ __osm_drop_mgr_process_node( osm_physp_t *p_physp; osm_port_t *p_port; osm_node_t *p_node_check; - cl_qmap_t *p_node_guid_tbl; uint32_t port_num; uint32_t max_ports; ib_net64_t port_guid; - cl_qmap_t* p_port_guid_tbl; boolean_t return_val = FALSE; OSM_LOG_ENTER( p_mgr->p_log, __osm_drop_mgr_process_node ); @@ -424,8 +420,6 @@ __osm_drop_mgr_process_node( Delete all the logical and physical port objects associated with this node. 
*/ - p_port_guid_tbl = &p_mgr->p_subn->port_guid_tbl; - max_ports = osm_node_get_num_physp( p_node ); for( port_num = 0; port_num < max_ports; port_num++ ) { @@ -434,9 +428,9 @@ __osm_drop_mgr_process_node( { port_guid = osm_physp_get_port_guid( p_physp ); - p_port = (osm_port_t*)cl_qmap_get( p_port_guid_tbl, port_guid ); + p_port = osm_get_port_by_guid(p_mgr->p_subn, port_guid ); - if( p_port != (osm_port_t*)cl_qmap_end( p_port_guid_tbl ) ) + if( p_port ) __osm_drop_mgr_remove_port( p_mgr, p_port ); else drop_mgr_clean_physp( p_mgr, p_physp ); @@ -448,8 +442,7 @@ __osm_drop_mgr_process_node( if (p_node->sw) __osm_drop_mgr_remove_switch( p_mgr, p_node ); - p_node_guid_tbl = &p_mgr->p_subn->node_guid_tbl; - p_node_check = (osm_node_t*)cl_qmap_remove( p_node_guid_tbl, + p_node_check = (osm_node_t*)cl_qmap_remove( &p_mgr->p_subn->node_guid_tbl, osm_node_get_node_guid( p_node ) ); if( p_node_check != p_node ) { @@ -476,7 +469,6 @@ __osm_drop_mgr_check_node( ib_net64_t node_guid; osm_physp_t *p_physp; osm_port_t *p_port; - cl_qmap_t* p_port_guid_tbl; ib_net64_t port_guid; OSM_LOG_ENTER( p_mgr->p_log, __osm_drop_mgr_check_node ); @@ -506,7 +498,6 @@ __osm_drop_mgr_check_node( } /* Make sure we have a port object for port zero */ - p_port_guid_tbl = &p_mgr->p_subn->port_guid_tbl; p_physp = osm_node_get_physp_ptr( p_node, 0 ); if ( !osm_physp_is_valid( p_physp ) ) { @@ -521,9 +512,9 @@ __osm_drop_mgr_check_node( port_guid = osm_physp_get_port_guid( p_physp ); - p_port = (osm_port_t*)cl_qmap_get( p_port_guid_tbl, port_guid ); + p_port = osm_get_port_by_guid(p_mgr->p_subn, port_guid ); - if( p_port == (osm_port_t*)cl_qmap_end( p_port_guid_tbl ) ) + if( !p_port ) { osm_log( p_mgr->p_log, OSM_LOG_VERBOSE, "__osm_drop_mgr_check_node: " diff --git a/opensm/opensm/osm_inform.c b/opensm/opensm/osm_inform.c index 63f3bfa..5929382 100644 --- a/opensm/opensm/osm_inform.c +++ b/opensm/opensm/osm_inform.c @@ -589,10 +589,9 @@ __match_notice_to_inf_rec( { source_gid = p_ntc->issuer_gid; 
} - p_src_port = (osm_port_t*)cl_qmap_get( &p_subn->port_guid_tbl, - source_gid.unicast.interface_id ); - if( p_src_port == (osm_port_t*)cl_qmap_end( &(p_subn->port_guid_tbl)) ) + p_src_port = osm_get_port_by_guid(p_subn, source_gid.unicast.interface_id); + if( !p_src_port ) { osm_log( p_log, OSM_LOG_INFO, "__match_notice_to_inf_rec: " diff --git a/opensm/opensm/osm_lid_mgr.c b/opensm/opensm/osm_lid_mgr.c index 8a0d288..f235a02 100644 --- a/opensm/opensm/osm_lid_mgr.c +++ b/opensm/opensm/osm_lid_mgr.c @@ -1289,10 +1289,8 @@ __osm_lid_mgr_process_our_sm_node( /* Acquire our own port object. */ - p_port = (osm_port_t*)cl_qmap_get( &p_mgr->p_subn->port_guid_tbl, - p_mgr->p_subn->sm_port_guid ); - - if( p_port == (osm_port_t*)cl_qmap_end( &p_mgr->p_subn->port_guid_tbl ) ) + p_port = osm_get_port_by_guid(p_mgr->p_subn, p_mgr->p_subn->sm_port_guid); + if( !p_port ) { osm_log( p_mgr->p_log, OSM_LOG_ERROR, "__osm_lid_mgr_process_our_sm_node: ERR 0308: " diff --git a/opensm/opensm/osm_mcast_mgr.c b/opensm/opensm/osm_mcast_mgr.c index 2ecb34e..345dbd4 100644 --- a/opensm/opensm/osm_mcast_mgr.c +++ b/opensm/opensm/osm_mcast_mgr.c @@ -159,12 +159,10 @@ osm_mcast_mgr_compute_avg_hops( const osm_port_t* p_port; const osm_mcm_port_t* p_mcm_port; const cl_qmap_t* p_mcm_tbl; - const cl_qmap_t* p_port_tbl; OSM_LOG_ENTER( p_mgr->p_log, osm_mcast_mgr_compute_avg_hops ); p_mcm_tbl = &p_mgrp->mcm_port_tbl; - p_port_tbl = &p_mgr->p_subn->port_guid_tbl; /* For each member of the multicast group, compute the @@ -178,10 +176,10 @@ osm_mcast_mgr_compute_avg_hops( Acquire the port object for this port guid, then create the new worker object to build the list. 
*/ - p_port = (osm_port_t*)cl_qmap_get( p_port_tbl, - ib_gid_get_guid( &p_mcm_port->port_gid ) ); + p_port = osm_get_port_by_guid(p_mgr->p_subn, + ib_gid_get_guid( &p_mcm_port->port_gid ) ); - if( p_port == (osm_port_t*)cl_qmap_end( p_port_tbl ) ) + if( !p_port ) { osm_log( p_mgr->p_log, OSM_LOG_ERROR, "osm_mcast_mgr_compute_avg_hops: ERR 0A18: " @@ -221,12 +219,10 @@ osm_mcast_mgr_compute_max_hops( const osm_port_t* p_port; const osm_mcm_port_t* p_mcm_port; const cl_qmap_t* p_mcm_tbl; - const cl_qmap_t* p_port_tbl; OSM_LOG_ENTER( p_mgr->p_log, osm_mcast_mgr_compute_max_hops ); p_mcm_tbl = &p_mgrp->mcm_port_tbl; - p_port_tbl = &p_mgr->p_subn->port_guid_tbl; /* For each member of the multicast group, compute the @@ -240,11 +236,10 @@ osm_mcast_mgr_compute_max_hops( Acquire the port object for this port guid, then create the new worker object to build the list. */ - p_port = (osm_port_t*)cl_qmap_get( - p_port_tbl, - ib_gid_get_guid( &p_mcm_port->port_gid ) ); + p_port = osm_get_port_by_guid(p_mgr->p_subn, + ib_gid_get_guid( &p_mcm_port->port_gid )); - if( p_port == (osm_port_t*)cl_qmap_end( p_port_tbl ) ) + if( !p_port ) { osm_log( p_mgr->p_log, OSM_LOG_ERROR, "osm_mcast_mgr_compute_max_hops: ERR 0A1A: " @@ -871,7 +866,6 @@ __osm_mcast_mgr_build_spanning_tree( osm_mgrp_t* const p_mgrp ) { const cl_qmap_t* p_mcm_tbl; - const cl_qmap_t* p_port_tbl; const osm_port_t* p_port; const osm_mcm_port_t* p_mcm_port; uint32_t num_ports; @@ -895,7 +889,6 @@ __osm_mcast_mgr_build_spanning_tree( __osm_mcast_mgr_purge_tree( p_mgr, p_mgrp ); p_mcm_tbl = &p_mgrp->mcm_port_tbl; - p_port_tbl = &p_mgr->p_subn->port_guid_tbl; num_ports = cl_qmap_count( p_mcm_tbl ); if( num_ports == 0 ) { @@ -947,10 +940,9 @@ __osm_mcast_mgr_build_spanning_tree( Acquire the port object for this port guid, then create the new worker object to build the list. 
*/ - p_port = (osm_port_t*)cl_qmap_get( p_port_tbl, - ib_gid_get_guid( &p_mcm_port->port_gid ) ); - - if( p_port == (osm_port_t*)cl_qmap_end( p_port_tbl ) ) + p_port = osm_get_port_by_guid(p_mgr->p_subn, + ib_gid_get_guid( &p_mcm_port->port_gid )); + if( !p_port ) { osm_log( p_mgr->p_log, OSM_LOG_ERROR, "__osm_mcast_mgr_build_spanning_tree: ERR 0A09: " @@ -1091,7 +1083,6 @@ osm_mcast_mgr_process_single( osm_physp_t* p_physp; osm_physp_t* p_remote_physp; osm_node_t* p_remote_node; - cl_qmap_t* p_port_tbl; osm_mcast_tbl_t* p_mcast_tbl; ib_api_status_t status = IB_SUCCESS; @@ -1100,7 +1091,6 @@ osm_mcast_mgr_process_single( CL_ASSERT( mlid ); CL_ASSERT( port_guid ); - p_port_tbl = &p_mgr->p_subn->port_guid_tbl; mlid_ho = cl_ntoh16( mlid ); if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) ) @@ -1115,8 +1105,8 @@ osm_mcast_mgr_process_single( /* Acquire the Port object. */ - p_port = (osm_port_t*)cl_qmap_get( p_port_tbl, port_guid ); - if( p_port == (osm_port_t*)cl_qmap_end( p_port_tbl ) ) + p_port = osm_get_port_by_guid(p_mgr->p_subn, port_guid ); + if( !p_port ) { osm_log( p_mgr->p_log, OSM_LOG_ERROR, "osm_mcast_mgr_process_single: ERR 0A01: " diff --git a/opensm/opensm/osm_node_desc_rcv.c b/opensm/opensm/osm_node_desc_rcv.c index fc96c12..656141d 100644 --- a/opensm/opensm/osm_node_desc_rcv.c +++ b/opensm/opensm/osm_node_desc_rcv.c @@ -143,7 +143,6 @@ osm_nd_rcv_process( { osm_nd_rcv_t *p_rcv = context; osm_madw_t *p_madw = data; - cl_qmap_t *p_guid_tbl; ib_node_desc_t *p_nd; ib_smp_t *p_smp; osm_node_t *p_node; @@ -155,7 +154,6 @@ osm_nd_rcv_process( CL_ASSERT( p_madw ); - p_guid_tbl = &p_rcv->p_subn->node_guid_tbl; p_smp = osm_madw_get_smp_ptr( p_madw ); p_nd = (ib_node_desc_t*)ib_smp_get_payload_ptr( p_smp ); @@ -165,9 +163,8 @@ osm_nd_rcv_process( node_guid = osm_madw_get_nd_context_ptr( p_madw )->node_guid; CL_PLOCK_EXCL_ACQUIRE( p_rcv->p_lock ); - p_node = (osm_node_t*)cl_qmap_get( p_guid_tbl, node_guid ); - - if( p_node == (osm_node_t*)cl_qmap_end( 
p_guid_tbl) ) + p_node = osm_get_node_by_guid(p_rcv->p_subn, node_guid); + if( !p_node ) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, "osm_nd_rcv_process: ERR 0B01: " diff --git a/opensm/opensm/osm_node_info_rcv.c b/opensm/opensm/osm_node_info_rcv.c index 1eca625..b78a4ce 100644 --- a/opensm/opensm/osm_node_info_rcv.c +++ b/opensm/opensm/osm_node_info_rcv.c @@ -76,7 +76,6 @@ __osm_ni_rcv_set_links( const uint8_t port_num, const osm_ni_context_t* const p_ni_context ) { - cl_qmap_t *p_guid_tbl; osm_node_t *p_neighbor_node; osm_node_t *p_old_neighbor_node; uint8_t old_neighbor_port_num; @@ -91,10 +90,9 @@ __osm_ni_rcv_set_links( */ if( p_ni_context->node_guid != 0 ) { - p_guid_tbl = &p_rcv->p_subn->node_guid_tbl; - p_neighbor_node = (osm_node_t*)cl_qmap_get( p_guid_tbl, - p_ni_context->node_guid ); - if( p_neighbor_node == (osm_node_t*)cl_qmap_end( p_guid_tbl ) ) + p_neighbor_node = osm_get_node_by_guid(p_rcv->p_subn, + p_ni_context->node_guid); + if( !p_neighbor_node ) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, "__osm_ni_rcv_set_links: ERR 0D10: " @@ -434,7 +432,6 @@ __osm_ni_rcv_process_existing_ca_or_router( ib_smp_t *p_smp; osm_port_t *p_port; osm_port_t *p_port_check; - cl_qmap_t *p_guid_tbl; osm_madw_context_t context; uint8_t port_num; osm_physp_t *p_physp; @@ -448,7 +445,6 @@ __osm_ni_rcv_process_existing_ca_or_router( p_smp = osm_madw_get_smp_ptr( p_madw ); p_ni = (ib_node_info_t*)ib_smp_get_payload_ptr( p_smp ); port_num = ib_node_info_get_local_port_num( p_ni ); - p_guid_tbl = &p_rcv->p_subn->port_guid_tbl; h_bind = osm_madw_get_bind_handle( p_madw ); /* @@ -456,9 +452,8 @@ __osm_ni_rcv_process_existing_ca_or_router( previously undiscovered port. If so, build the new port object. 
*/ - p_port = (osm_port_t*)cl_qmap_get( p_guid_tbl, p_ni->port_guid ); - - if( p_port == (osm_port_t*)cl_qmap_end( p_guid_tbl ) ) + p_port = osm_get_port_by_guid( p_rcv->p_subn, p_ni->port_guid ); + if( !p_port ) { osm_log( p_rcv->p_log, OSM_LOG_VERBOSE, "__osm_ni_rcv_process_existing_ca_or_router: " @@ -479,7 +474,7 @@ __osm_ni_rcv_process_existing_ca_or_router( /* Add the new port object to the database. */ - p_port_check = (osm_port_t*)cl_qmap_insert( p_guid_tbl, + p_port_check = (osm_port_t*)cl_qmap_insert( &p_rcv->p_subn->port_guid_tbl, p_ni->port_guid, &p_port->map_item ); if( p_port_check != p_port ) { @@ -700,8 +695,6 @@ __osm_ni_rcv_process_new( osm_port_t *p_port_check; osm_router_t *p_rtr = NULL; osm_router_t *p_rtr_check; - cl_qmap_t *p_node_guid_tbl; - cl_qmap_t *p_port_guid_tbl; cl_qmap_t *p_rtr_guid_tbl; ib_node_info_t *p_ni; ib_smp_t *p_smp; @@ -765,8 +758,7 @@ __osm_ni_rcv_process_new( /* Add the new port object to the database. */ - p_port_guid_tbl = &p_rcv->p_subn->port_guid_tbl; - p_port_check = (osm_port_t*)cl_qmap_insert( p_port_guid_tbl, + p_port_check = (osm_port_t*)cl_qmap_insert( &p_rcv->p_subn->port_guid_tbl, p_ni->port_guid, &p_port->map_item ); if( p_port_check != p_port ) @@ -838,8 +830,7 @@ __osm_ni_rcv_process_new( } } - p_node_guid_tbl = &p_rcv->p_subn->node_guid_tbl; - p_node_check = (osm_node_t*)cl_qmap_insert( p_node_guid_tbl, + p_node_check = (osm_node_t*)cl_qmap_insert( &p_rcv->p_subn->node_guid_tbl, p_ni->node_guid, &p_node->map_item ); if( p_node_check != p_node ) @@ -1007,7 +998,6 @@ osm_ni_rcv_process( { osm_ni_rcv_t *p_rcv = context; osm_madw_t *p_madw = data; - cl_qmap_t *p_guid_tbl; ib_node_info_t *p_ni; ib_smp_t *p_smp; osm_node_t *p_node; @@ -1042,8 +1032,6 @@ osm_ni_rcv_process( goto Exit; } - p_guid_tbl = &p_rcv->p_subn->node_guid_tbl; - /* Determine if this node has already been discovered, and process accordingly. 
@@ -1051,11 +1039,11 @@ osm_ni_rcv_process( */ CL_PLOCK_EXCL_ACQUIRE( p_rcv->p_lock ); - p_node = (osm_node_t*)cl_qmap_get( p_guid_tbl, p_ni->node_guid ); + p_node = osm_get_node_by_guid(p_rcv->p_subn, p_ni->node_guid); osm_dump_node_info( p_rcv->p_log, p_ni, OSM_LOG_DEBUG ); - if( p_node == (osm_node_t*)cl_qmap_end(p_guid_tbl) ) + if( !p_node ) { __osm_ni_rcv_process_new( p_rcv, p_madw ); process_new_flag = TRUE; diff --git a/opensm/opensm/osm_perfmgr.c b/opensm/opensm/osm_perfmgr.c index 3780a37..b83bb45 100644 --- a/opensm/opensm/osm_perfmgr.c +++ b/opensm/opensm/osm_perfmgr.c @@ -375,9 +375,8 @@ __osm_perfmgr_query_counters(cl_map_item_t * const p_map_item, void *context ) OSM_LOG_ENTER( pm->log, __osm_pm_query_counters ); cl_plock_acquire(pm->lock); - node = (osm_node_t *)cl_qmap_get(&(pm->subn->node_guid_tbl), - cl_hton64(mon_node->guid)); - if (node == (osm_node_t *)cl_qmap_end(&(pm->subn->node_guid_tbl))) { + node = osm_get_node_by_guid(pm->subn, cl_hton64(mon_node->guid)); + if (!node) { osm_log(pm->log, OSM_LOG_ERROR, "__osm_pm_query_counters: ERR 4C07: Node guid 0x%" PRIx64 " no longer exists so removing from PerfMgr monitoring\n", mon_node->guid); @@ -654,8 +653,7 @@ osm_perfmgr_check_overflow(osm_perfmgr_t *pm, uint64_t node_guid, osm_node_t *p_node = NULL; ib_net16_t lid = 0; cl_plock_acquire(pm->lock); - p_node = (osm_node_t *)cl_qmap_get(&(pm->subn->node_guid_tbl), - cl_hton64(node_guid)); + p_node = osm_get_node_by_guid(pm->subn, cl_hton64(node_guid)); lid = get_lid(p_node, port); cl_plock_release(pm->lock); if (lid == 0) diff --git a/opensm/opensm/osm_pkey_rcv.c b/opensm/opensm/osm_pkey_rcv.c index 67fe067..fae6dd3 100644 --- a/opensm/opensm/osm_pkey_rcv.c +++ b/opensm/opensm/osm_pkey_rcv.c @@ -113,7 +113,6 @@ osm_pkey_rcv_process( { osm_pkey_rcv_t *p_rcv = context; osm_madw_t *p_madw = data; - cl_qmap_t *p_guid_tbl; ib_pkey_table_t *p_pkey_tbl; ib_smp_t *p_smp; osm_port_t *p_port; @@ -141,11 +140,9 @@ osm_pkey_rcv_process( CL_ASSERT( 
p_smp->attr_id == IB_MAD_ATTR_P_KEY_TABLE ); - p_guid_tbl = &p_rcv->p_subn->port_guid_tbl; cl_plock_excl_acquire( p_rcv->p_lock ); - p_port = (osm_port_t*)cl_qmap_get( p_guid_tbl, port_guid ); - - if( p_port == (osm_port_t*)cl_qmap_end( p_guid_tbl) ) + p_port = osm_get_port_by_guid( p_rcv->p_subn, port_guid ); + if( !p_port ) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, "osm_pkey_rcv_process: ERR 4806: " diff --git a/opensm/opensm/osm_port.c b/opensm/opensm/osm_port.c index f092334..97e6031 100644 --- a/opensm/opensm/osm_port.c +++ b/opensm/opensm/osm_port.c @@ -686,7 +686,6 @@ osm_physp_replace_dr_path_with_alternate_dr_path( osm_dr_path_t * p_dr_path; cl_list_t *p_currPortsList; cl_list_t *p_nextPortsList; - cl_qmap_t const *p_port_tbl; osm_port_t *p_port; osm_physp_t *p_physp, *p_remote_physp; ib_net64_t port_guid; @@ -712,14 +711,12 @@ osm_physp_replace_dr_path_with_alternate_dr_path( cl_list_construct( p_nextPortsList ); cl_list_init( p_nextPortsList, 10 ); - p_port_tbl = &p_subn->port_guid_tbl; port_guid = p_subn->sm_port_guid; CL_ASSERT( port_guid ); - p_port = (osm_port_t*)cl_qmap_get( p_port_tbl, port_guid ); - - if( p_port == (osm_port_t*)cl_qmap_end( p_port_tbl ) ) + p_port = osm_get_port_by_guid( p_subn, port_guid ); + if( !p_port ) { osm_log( p_log, OSM_LOG_ERROR, "osm_physp_replace_dr_path_with_alternate_dr_path: ERR 4105: " diff --git a/opensm/opensm/osm_port_info_rcv.c b/opensm/opensm/osm_port_info_rcv.c index c41f984..7d42297 100644 --- a/opensm/opensm/osm_port_info_rcv.c +++ b/opensm/opensm/osm_port_info_rcv.c @@ -627,7 +627,6 @@ osm_pi_rcv_process( { osm_pi_rcv_t *p_rcv = context; osm_madw_t *p_madw = data; - cl_qmap_t *p_guid_tbl; ib_port_info_t *p_pi; ib_smp_t *p_smp; osm_port_t *p_port; @@ -689,11 +688,9 @@ osm_pi_rcv_process( goto Exit; } - p_guid_tbl = &p_rcv->p_subn->port_guid_tbl; CL_PLOCK_EXCL_ACQUIRE( p_rcv->p_lock ); - p_port = (osm_port_t*)cl_qmap_get( p_guid_tbl, port_guid ); - - if( p_port == (osm_port_t*)cl_qmap_end( p_guid_tbl) ) + 
p_port = osm_get_port_by_guid(p_rcv->p_subn, port_guid); + if (!p_port) { CL_PLOCK_RELEASE( p_rcv->p_lock ); osm_log( p_rcv->p_log, OSM_LOG_ERROR, diff --git a/opensm/opensm/osm_prtn.c b/opensm/opensm/osm_prtn.c index 027a5a4..ebf5889 100644 --- a/opensm/opensm/osm_prtn.c +++ b/opensm/opensm/osm_prtn.c @@ -105,14 +105,13 @@ void osm_prtn_delete( ib_api_status_t osm_prtn_add_port(osm_log_t *p_log, osm_subn_t *p_subn, osm_prtn_t *p, ib_net64_t guid, boolean_t full) { - cl_qmap_t *p_port_tbl = &p_subn->port_guid_tbl; ib_api_status_t status = IB_SUCCESS; cl_map_t *p_tbl; osm_port_t *p_port; osm_physp_t *p_physp; - p_port = (osm_port_t *)cl_qmap_get(p_port_tbl, guid); - if (!p_port || p_port == (osm_port_t *)cl_qmap_end(p_port_tbl)) { + p_port = osm_get_port_by_guid(p_subn, guid); + if (!p_port) { osm_log(p_log, OSM_LOG_VERBOSE, "osm_prtn_add_port: " "port 0x%" PRIx64 " not found\n", cl_ntoh64(guid)); diff --git a/opensm/opensm/osm_sa_lft_record.c b/opensm/opensm/osm_sa_lft_record.c index c5cd9ca..4943632 100644 --- a/opensm/opensm/osm_sa_lft_record.c +++ b/opensm/opensm/osm_sa_lft_record.c @@ -194,9 +194,8 @@ __osm_lftr_get_port_by_guid( CL_PLOCK_ACQUIRE(p_rcv->p_lock); - p_port = (osm_port_t *)cl_qmap_get(&p_rcv->p_subn->port_guid_tbl, - port_guid); - if (p_port == (osm_port_t *)cl_qmap_end(&p_rcv->p_subn->port_guid_tbl)) + p_port = osm_get_port_by_guid(p_rcv->p_subn, port_guid); + if (!p_port) { osm_log( p_rcv->p_log, OSM_LOG_DEBUG, "__osm_lftr_get_port_by_guid ERR 4404: " diff --git a/opensm/opensm/osm_sa_mcmember_record.c b/opensm/opensm/osm_sa_mcmember_record.c index 90fe103..82aa0db 100644 --- a/opensm/opensm/osm_sa_mcmember_record.c +++ b/opensm/opensm/osm_sa_mcmember_record.c @@ -1554,10 +1554,8 @@ __osm_mcmr_rcv_join_mgrp( CL_PLOCK_EXCL_ACQUIRE(p_rcv->p_lock); /* make sure the requested port guid is known to the SM */ - p_port = (osm_port_t *)cl_qmap_get(&p_rcv->p_subn->port_guid_tbl, - portguid); - - if (p_port == (osm_port_t 
*)cl_qmap_end(&p_rcv->p_subn->port_guid_tbl)) + p_port = osm_get_port_by_guid(p_rcv->p_subn, portguid); + if (!p_port) { CL_PLOCK_RELEASE( p_rcv->p_lock ); diff --git a/opensm/opensm/osm_sa_mft_record.c b/opensm/opensm/osm_sa_mft_record.c index 7908583..c70cd65 100644 --- a/opensm/opensm/osm_sa_mft_record.c +++ b/opensm/opensm/osm_sa_mft_record.c @@ -198,15 +198,13 @@ __osm_mftr_get_port_by_guid( CL_PLOCK_ACQUIRE(p_rcv->p_lock); - p_port = (osm_port_t *)cl_qmap_get(&p_rcv->p_subn->port_guid_tbl, - port_guid); - if (p_port == (osm_port_t *)cl_qmap_end(&p_rcv->p_subn->port_guid_tbl)) + p_port = osm_get_port_by_guid(p_rcv->p_subn, port_guid); + if (!p_port) { osm_log( p_rcv->p_log, OSM_LOG_DEBUG, "__osm_mftr_get_port_by_guid ERR 4A04: " "Invalid port GUID 0x%016" PRIx64 "\n", port_guid ); - p_port = NULL; } CL_PLOCK_RELEASE(p_rcv->p_lock); diff --git a/opensm/opensm/osm_sa_multipath_record.c b/opensm/opensm/osm_sa_multipath_record.c index 06640d9..27b840d 100644 --- a/opensm/opensm/osm_sa_multipath_record.c +++ b/opensm/opensm/osm_sa_multipath_record.c @@ -1195,10 +1195,9 @@ __osm_mpr_rcv_get_gids( } } - p_port = (osm_port_t *)cl_qmap_get( &p_rcv->p_subn->port_guid_tbl, - gids->unicast.interface_id ); - if ( !p_port || - p_port == (osm_port_t *)cl_qmap_end( &p_rcv->p_subn->port_guid_tbl ) ) { + p_port = osm_get_port_by_guid(p_rcv->p_subn, gids->unicast.interface_id); + if ( !p_port ) + { /* This 'error' is the client's fault (bad gid) so don't enter it as an error in our own log. 
diff --git a/opensm/opensm/osm_sa_path_record.c b/opensm/opensm/osm_sa_path_record.c index 47d9c33..56be25f 100644 --- a/opensm/opensm/osm_sa_path_record.c +++ b/opensm/opensm/osm_sa_path_record.c @@ -1214,12 +1214,9 @@ __osm_pr_rcv_get_end_points( } } - *pp_src_port = (osm_port_t*)cl_qmap_get( - &p_rcv->p_subn->port_guid_tbl, - p_pr->sgid.unicast.interface_id ); - - if( *pp_src_port == (osm_port_t*)cl_qmap_end( - &p_rcv->p_subn->port_guid_tbl ) ) + *pp_src_port = osm_get_port_by_guid(p_rcv->p_subn, + p_pr->sgid.unicast.interface_id ); + if( !*pp_src_port ) { /* This 'error' is the client's fault (bad gid) so @@ -1304,12 +1301,8 @@ __osm_pr_rcv_get_end_points( } } - *pp_dest_port = (osm_port_t*)cl_qmap_get( - &p_rcv->p_subn->port_guid_tbl, - dest_guid ); - - if( *pp_dest_port == (osm_port_t*)cl_qmap_end( - &p_rcv->p_subn->port_guid_tbl ) ) + *pp_dest_port = osm_get_port_by_guid(p_rcv->p_subn, dest_guid); + if( !*pp_dest_port ) { /* This 'error' is the client's fault (bad gid) so diff --git a/opensm/opensm/osm_sa_service_record.c b/opensm/opensm/osm_sa_service_record.c index c0f1057..3f32bd5 100644 --- a/opensm/opensm/osm_sa_service_record.c +++ b/opensm/opensm/osm_sa_service_record.c @@ -200,8 +200,8 @@ __match_service_pkey_with_ports_pkey( if((comp_mask & IB_SR_COMPMASK_SGID) == IB_SR_COMPMASK_SGID) { service_guid = p_service_rec->service_gid.unicast.interface_id; - service_port = (osm_port_t*)cl_qmap_get( &p_rcv->p_subn->port_guid_tbl, service_guid ); - if (service_port == (osm_port_t*)cl_qmap_end( &p_rcv->p_subn->port_guid_tbl )) + service_port = osm_get_port_by_guid(p_rcv->p_subn, service_guid); + if (!service_port) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, "__match_service_pkey_with_ports_pkey: ERR 2405: " diff --git a/opensm/opensm/osm_sa_sw_info_record.c b/opensm/opensm/osm_sa_sw_info_record.c index 94b1ff9..129eeff 100644 --- a/opensm/opensm/osm_sa_sw_info_record.c +++ b/opensm/opensm/osm_sa_sw_info_record.c @@ -187,9 +187,8 @@ __osm_sir_get_port_by_guid( 
CL_PLOCK_ACQUIRE(p_rcv->p_lock); - p_port = (osm_port_t *)cl_qmap_get(&p_rcv->p_subn->port_guid_tbl, - port_guid); - if (p_port == (osm_port_t *)cl_qmap_end(&p_rcv->p_subn->port_guid_tbl)) + p_port = osm_get_port_by_guid(p_rcv->p_subn, port_guid); + if (!p_port) { osm_log( p_rcv->p_log, OSM_LOG_DEBUG, "__osm_sir_get_port_by_guid ERR 5309: " diff --git a/opensm/opensm/osm_slvl_map_rcv.c b/opensm/opensm/osm_slvl_map_rcv.c index 3352627..d601456 100644 --- a/opensm/opensm/osm_slvl_map_rcv.c +++ b/opensm/opensm/osm_slvl_map_rcv.c @@ -126,7 +126,6 @@ osm_slvl_rcv_process( { osm_slvl_rcv_t *p_rcv = context; osm_madw_t *p_madw = p_data; - cl_qmap_t *p_guid_tbl; ib_slvl_table_t *p_slvl_tbl; ib_smp_t *p_smp; osm_port_t *p_port; @@ -152,11 +151,10 @@ osm_slvl_rcv_process( CL_ASSERT( p_smp->attr_id == IB_MAD_ATTR_SLVL_TABLE ); - p_guid_tbl = &p_rcv->p_subn->port_guid_tbl; cl_plock_excl_acquire( p_rcv->p_lock ); - p_port = (osm_port_t*)cl_qmap_get( p_guid_tbl, port_guid ); + p_port = osm_get_port_by_guid( p_rcv->p_subn, port_guid ); - if( p_port == (osm_port_t*)cl_qmap_end( p_guid_tbl) ) + if( !p_port ) { cl_plock_release( p_rcv->p_lock ); osm_log( p_rcv->p_log, OSM_LOG_ERROR, diff --git a/opensm/opensm/osm_sm.c b/opensm/opensm/osm_sm.c index dfe01a4..57851e6 100644 --- a/opensm/opensm/osm_sm.c +++ b/opensm/opensm/osm_sm.c @@ -637,10 +637,8 @@ osm_sm_mcgrp_join( * Acquire the port object for the port joining this group. 
*/ CL_PLOCK_EXCL_ACQUIRE( p_sm->p_lock ); - p_port = ( osm_port_t * ) cl_qmap_get( &p_sm->p_subn->port_guid_tbl, - port_guid ); - if( p_port == - ( osm_port_t * ) cl_qmap_end( &p_sm->p_subn->port_guid_tbl ) ) + p_port = osm_get_port_by_guid( p_sm->p_subn, port_guid ); + if( !p_port ) { CL_PLOCK_RELEASE( p_sm->p_lock ); osm_log( p_sm->p_log, OSM_LOG_ERROR, @@ -761,10 +759,8 @@ osm_sm_mcgrp_leave( */ /* note: p_sm->p_lock is locked by caller, but will be released later this function */ - p_port = ( osm_port_t * ) cl_qmap_get( &p_sm->p_subn->port_guid_tbl, - port_guid ); - if( p_port == - ( osm_port_t * ) cl_qmap_end( &p_sm->p_subn->port_guid_tbl ) ) + p_port = osm_get_port_by_guid( p_sm->p_subn, port_guid ); + if( !p_port ) { CL_PLOCK_RELEASE( p_sm->p_lock ); osm_log( p_sm->p_log, OSM_LOG_ERROR, diff --git a/opensm/opensm/osm_sm_state_mgr.c b/opensm/opensm/osm_sm_state_mgr.c index ccfb8b0..a39ba4c 100644 --- a/opensm/opensm/osm_sm_state_mgr.c +++ b/opensm/opensm/osm_sm_state_mgr.c @@ -168,10 +168,8 @@ __osm_sm_state_mgr_send_local_port_info_req( * update the master_sm_base_lid of the subnet. */ memset( &context, 0, sizeof( context ) ); - p_port = ( osm_port_t * ) cl_qmap_get( &p_sm_mgr->p_subn->port_guid_tbl, - port_guid ); - if( p_port == - ( osm_port_t * ) cl_qmap_end( &p_sm_mgr->p_subn->port_guid_tbl ) ) + p_port = osm_get_port_by_guid(p_sm_mgr->p_subn, port_guid ); + if( !p_port ) { osm_log( p_sm_mgr->p_log, OSM_LOG_ERROR, "__osm_sm_state_mgr_send_local_port_info_req: ERR 3205: " @@ -231,13 +229,7 @@ __osm_sm_state_mgr_send_master_sm_info_req( * SM (according to master_guid) * Send a query of SubnGet(SMInfo) to the subn master_sm_base_lid object. 
*/ - p_port = ( osm_port_t * ) cl_qmap_get( &p_sm_mgr->p_subn->port_guid_tbl, - p_sm_mgr->master_guid ); - if( p_port == - ( osm_port_t * ) cl_qmap_end( &p_sm_mgr->p_subn->port_guid_tbl ) ) - { - p_port = NULL; - } + p_port = osm_get_port_by_guid(p_sm_mgr->p_subn, p_sm_mgr->master_guid); } else { diff --git a/opensm/opensm/osm_sminfo_rcv.c b/opensm/opensm/osm_sminfo_rcv.c index 2be56a5..1489aa3 100644 --- a/opensm/opensm/osm_sminfo_rcv.c +++ b/opensm/opensm/osm_sminfo_rcv.c @@ -562,7 +562,6 @@ __osm_sminfo_rcv_process_get_response( const ib_smp_t* p_smp; const ib_sm_info_t* p_smi; cl_qmap_t* p_sm_tbl; - cl_qmap_t* p_port_tbl; osm_port_t* p_port; ib_net64_t port_guid; osm_remote_sm_t* p_sm; @@ -585,7 +584,6 @@ __osm_sminfo_rcv_process_get_response( p_smi = ib_smp_get_payload_ptr( p_smp ); p_sm_tbl = &p_rcv->p_subn->sm_guid_tbl; - p_port_tbl = &p_rcv->p_subn->port_guid_tbl; port_guid = p_smi->guid; osm_dump_sm_info( p_rcv->p_log, p_smi, OSM_LOG_DEBUG ); @@ -611,8 +609,8 @@ __osm_sminfo_rcv_process_get_response( */ CL_PLOCK_EXCL_ACQUIRE( p_rcv->p_lock ); - p_port = (osm_port_t*)cl_qmap_get( p_port_tbl, port_guid ); - if( p_port == (osm_port_t*)cl_qmap_end( p_port_tbl ) ) + p_port = osm_get_port_by_guid( p_rcv->p_subn, port_guid ); + if( !p_port ) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, "__osm_sminfo_rcv_process_get_response: ERR 2F12: " diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c index 7cf9d20..43317e5 100644 --- a/opensm/opensm/osm_state_mgr.c +++ b/opensm/opensm/osm_state_mgr.c @@ -811,7 +811,6 @@ __osm_state_mgr_is_sm_port_down( ib_net64_t port_guid; osm_port_t *p_port; osm_physp_t *p_physp; - cl_qmap_t *p_tbl; uint8_t state; OSM_LOG_ENTER( p_mgr->p_log, __osm_state_mgr_is_sm_port_down ); @@ -830,13 +829,11 @@ __osm_state_mgr_is_sm_port_down( goto Exit; } - p_tbl = &p_mgr->p_subn->port_guid_tbl; - CL_ASSERT( port_guid ); CL_PLOCK_ACQUIRE( p_mgr->p_lock ); - p_port = ( osm_port_t * ) cl_qmap_get( p_tbl, port_guid ); - if( p_port == ( 
osm_port_t * ) cl_qmap_end( p_tbl ) ) + p_port = osm_get_port_by_guid( p_mgr->p_subn, port_guid ); + if( !p_port ) { osm_log( p_mgr->p_log, OSM_LOG_ERROR, "__osm_state_mgr_is_sm_port_down: ERR 3309: " @@ -879,7 +876,6 @@ __osm_state_mgr_sweep_hop_1( osm_dr_path_t hop_1_path; ib_net64_t port_guid; uint8_t port_num; - cl_qmap_t *p_port_tbl; uint8_t path_array[IB_SUBNET_PATH_HOPS_MAX]; uint8_t num_ports; osm_physp_t *p_ext_physp; @@ -889,7 +885,6 @@ __osm_state_mgr_sweep_hop_1( /* * First, get our own port and node objects. */ - p_port_tbl = &p_mgr->p_subn->port_guid_tbl; port_guid = p_mgr->p_subn->sm_port_guid; CL_ASSERT( port_guid ); @@ -902,8 +897,8 @@ __osm_state_mgr_sweep_hop_1( * continue through the switch. */ p_mgr->p_subn->in_sweep_hop_0 = FALSE; - p_port = ( osm_port_t * ) cl_qmap_get( p_port_tbl, port_guid ); - if( p_port == ( osm_port_t * ) cl_qmap_end( p_port_tbl ) ) + p_port = osm_get_port_by_guid( p_mgr->p_subn, port_guid ); + if( !p_port ) { osm_log( p_mgr->p_log, OSM_LOG_ERROR, "__osm_state_mgr_sweep_hop_1: ERR 3310: " diff --git a/opensm/opensm/osm_sw_info_rcv.c b/opensm/opensm/osm_sw_info_rcv.c index 0043ac4..563c126 100644 --- a/opensm/opensm/osm_sw_info_rcv.c +++ b/opensm/opensm/osm_sw_info_rcv.c @@ -586,7 +586,6 @@ osm_si_rcv_process( { osm_si_rcv_t *p_rcv = context; osm_madw_t *p_madw = data; - cl_qmap_t *p_node_guid_tbl; ib_switch_info_t *p_si; ib_smp_t *p_smp; osm_node_t *p_node; @@ -599,7 +598,6 @@ osm_si_rcv_process( CL_ASSERT( p_madw ); - p_node_guid_tbl = &p_rcv->p_subn->node_guid_tbl; p_smp = osm_madw_get_smp_ptr( p_madw ); p_si = (ib_switch_info_t*)ib_smp_get_payload_ptr( p_smp ); @@ -623,8 +621,8 @@ osm_si_rcv_process( CL_PLOCK_EXCL_ACQUIRE( p_rcv->p_lock ); - p_node = (osm_node_t*)cl_qmap_get( p_node_guid_tbl, node_guid ); - if( p_node == (osm_node_t*)cl_qmap_end( p_node_guid_tbl ) ) + p_node = osm_get_node_by_guid(p_rcv->p_subn, node_guid); + if( !p_node ) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, "osm_si_rcv_process: ERR 3606: " diff 
--git a/opensm/opensm/osm_ucast_file.c b/opensm/opensm/osm_ucast_file.c index 5d9ba01..97be7ea 100644 --- a/opensm/opensm/osm_ucast_file.c +++ b/opensm/opensm/osm_ucast_file.c @@ -63,9 +63,8 @@ static uint16_t remap_lid(osm_opensm_t *p_osm, uint16_t lid, ib_net64_t guid) uint16_t min_lid, max_lid; uint8_t lmc; - p_port = (osm_port_t *)cl_qmap_get(&p_osm->subn.port_guid_tbl, guid); - if (!p_port || - p_port == (osm_port_t *)cl_qmap_end(&p_osm->subn.port_guid_tbl)) { + p_port = osm_get_port_by_guid(&p_osm->subn, guid); + if (!p_port) { osm_log(&p_osm->log, OSM_LOG_VERBOSE, "remap_lid: cannot find port guid 0x%016" PRIx64 " , will use the same lid\n", cl_ntoh64(guid)); diff --git a/opensm/opensm/osm_vl_arb_rcv.c b/opensm/opensm/osm_vl_arb_rcv.c index f36751e..95f7e7d 100644 --- a/opensm/opensm/osm_vl_arb_rcv.c +++ b/opensm/opensm/osm_vl_arb_rcv.c @@ -126,7 +126,6 @@ osm_vla_rcv_process( { osm_vla_rcv_t *p_rcv = context; osm_madw_t *p_madw = data; - cl_qmap_t *p_guid_tbl; ib_vl_arb_table_t *p_vla_tbl; ib_smp_t *p_smp; osm_port_t *p_port; @@ -153,11 +152,9 @@ osm_vla_rcv_process( CL_ASSERT( p_smp->attr_id == IB_MAD_ATTR_VL_ARBITRATION ); - p_guid_tbl = &p_rcv->p_subn->port_guid_tbl; cl_plock_excl_acquire( p_rcv->p_lock ); - p_port = (osm_port_t*)cl_qmap_get( p_guid_tbl, port_guid ); - - if( p_port == (osm_port_t*)cl_qmap_end( p_guid_tbl ) ) + p_port = osm_get_port_by_guid( p_rcv->p_subn, port_guid ); + if( !p_port ) { cl_plock_release( p_rcv->p_lock ); osm_log( p_rcv->p_log, OSM_LOG_ERROR, -- 1.5.2.2.603.g7c851 From mst at dev.mellanox.co.il Sat Jun 30 15:05:30 2007 From: mst at dev.mellanox.co.il (Michael S. 
Tsirkin) Date: Sun, 1 Jul 2007 01:05:30 +0300 Subject: [ofa-general] Re: [PATCH RFC] sharing userspace IB objects In-Reply-To: References: <20070625130604.GH15343@mellanox.co.il> <20070626070641.GM15343@mellanox.co.il> Message-ID: <20070630220530.GB7554@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCH RFC] sharing userspace IB objects > > > This is not directly related to SRC: this is an effort > > to make it possible to share QPs, CQ etc across processes > > in the same way as they can be currently shared across threads. > > So assuming that we want multiple processes to post to > > the same QP, how can we support this? > > This looks like a lot of work for an unknown gain. Who is going to > really use this? ie is it worth the trouble? I think Dror is the best person to answer this. Dror, could you please explain the need for a shared send queue? -- MST From mst at dev.mellanox.co.il Sat Jun 30 15:06:57 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Sun, 1 Jul 2007 01:06:57 +0300 Subject: [ofa-general] Re: [PATCH RFC] sharing userspace IB objects In-Reply-To: References: <20070625130604.GH15343@mellanox.co.il> <20070626070641.GM15343@mellanox.co.il> Message-ID: <20070630220657.GC7554@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCH RFC] sharing userspace IB objects > > > This is not directly related to SRC: this is an effort > > to make it possible to share QPs, CQ etc across processes > > in the same way as they can be currently shared across threads. > > So assuming that we want multiple processes to post to > > the same QP, how can we support this? > > This looks like a lot of work for an unknown gain. Who is going to > really use this? ie is it worth the trouble? I think Dror is the best person to answer this. Dror, could you please explain the need for a shared send queue? -- MST From mst at dev.mellanox.co.il Sat Jun 30 15:08:01 2007 From: mst at dev.mellanox.co.il (Michael S.
Tsirkin) Date: Sun, 1 Jul 2007 01:08:01 +0300 Subject: [ofa-general] Re: [PATCH RFC] sharing userspace IB objects In-Reply-To: <532b813a0706271628s70e17b6cv70b81fdedc442743@mail.gmail.com> References: <20070625130604.GH15343@mellanox.co.il> <20070626070641.GM15343@mellanox.co.il> <532b813a0706271628s70e17b6cv70b81fdedc442743@mail.gmail.com> Message-ID: <20070630220801.GD7554@mellanox.co.il> > Shouldn't the protocol to create and destroy and pass the various > IB objects around be decided by the specific application rather than > the library trying to solve this problem? Yes, I agree. -- MST From mst at dev.mellanox.co.il Sat Jun 30 15:24:19 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Sun, 1 Jul 2007 01:24:19 +0300 Subject: [ofa-general] Re: [PATCH RFC] sharing userspace IB objects In-Reply-To: References: <20070625130604.GH15343@mellanox.co.il> <20070626070641.GM15343@mellanox.co.il> Message-ID: <20070630222419.GE7554@mellanox.co.il> > Quoting Roland Dreier : > Subject: Re: [PATCH RFC] sharing userspace IB objects > > > This is not directly related to SRC: this is an effort > > to make it possible to share QPs, CQ etc across processes > > in the same way as they can be currently shared across threads. > > So assuming that we want multiple processes to post to > > the same QP, how can we support this? > > This looks like a lot of work for an unknown gain. Who is going to > really use this? ie is it worth the trouble? It's a valid question. But let's discuss this separately. Below are my ideas about the implementation questions that you raise. > > > - Given that everything shared is in shared memory, > > > > I think we should try and keep shared memory usage to a minimum. > > For example, in mthca mr object just needs a key: we could > > keep it in non-shared memory, just pass the key around > > and save on shared memory usage. > > This comment made me realize there are a few more problems here. What
What > happens if I do ibv_reg_mr() in one process, pass the MR to another > process, and then do ibv_dereg_mr() in the second process? Generally, I think it would be nice if this could work in the same way as with multiple threads: a single process does destroy, the rest must not use the same object after this, synchronisation it up to the app. But you made me realise that we need an API for non-controlling processes to release the userspace resources without destroying the kernel-level object. > What about > if someone registers a region in shared memory -- are there any > fork/copy-on-write issues with that? This can be done already, can't it? > I think there are probably bugs > in the locked_vm accounting in the kernel right now -- it doesn't take > into account the possibility of passing context fds from one process > to another. Hmm, might be a good idea to fix the bugs anyway, no? > In general what do you think the rules for destroying objects should > be? What if process A creates a QP, passes it to process B, and then > process A dies? Should the QP still be usable? Yes, I think it should - we get this for free since file won't be closed until both die, right? > Should process B be > able to destroy it? What if process A is still alive -- should > process B be able to destroy the QP? I think in practice a single process will do this. My approach generally is: let's have same rules as for multiple threads. > > We need to share file descriptors too. Is there a way to pass these > > around besides unix domain sockets? > > I guess we need this to be able to re-mmap doorbell pages etc, right? > I wonder if there's a better way around that... maybe extending the > kernel interface so that unrelated processes can share a context, eg > by putting contexts in a filesystem or something like that. Hmm, I don't have principal objection, however this would mean we'd have to change kernel-user interface again. 
the proposed API extensions can mostly be done in userspace only. And it seems to me like much more work than just letting the app use unix domain sockets. What are the advantages of this approach? Further, since there is already an existing kernel interface for this, should we be inventing our own? -- MST From mst at dev.mellanox.co.il Sat Jun 30 23:09:54 2007 From: mst at dev.mellanox.co.il (Michael S. Tsirkin) Date: Sun, 1 Jul 2007 09:09:54 +0300 Subject: [ofa-general] Re: [GIT PULL] please pull rdma-dev.git for 2.6.23 In-Reply-To: <000801c7b9e2$03dfe220$3c98070a@amr.corp.intel.com> References: <000801c7b9e2$03dfe220$3c98070a@amr.corp.intel.com> Message-ID: <20070701060953.GG7554@mellanox.co.il> > ib/cm: include HCA ACK delay in local ACK timeout I have not seen this, and an archive search does not turn up anything. > IB/sa: Add InformInfo/Notice support. > IB/sa: Add local SA path record caching. > All patches have been previously posted except for the last, which is a > one line change. There were several bugs in the local SA patches that you posted originally, and the SA cache was enabled by default, which we decided was not a good idea. Could the latest revision of the patches to be pulled be posted to the list, please? -- MST
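In the "sharing userspace IB objects" thread above, MST argues that apps can simply pass file descriptors over unix domain sockets, the existing kernel interface he mentions. The C below is a minimal sketch of that mechanism (POSIX sendmsg()/recvmsg() with SCM_RIGHTS ancillary data). It is not libibverbs code. Assumptions: Linux/POSIX. A pipe stands in for the uverbs context fd. The helpers send_fd, recv_fd and demo_pass_pipe_to_child are hypothetical names. It only hands over the descriptor. As the thread notes, a receiver would still need to rebuild userspace state such as mmapped doorbell pages, and this sketch does not attempt that.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Control buffer for one SCM_RIGHTS fd; the union forces cmsghdr alignment. */
typedef union {
	struct cmsghdr align;
	char buf[CMSG_SPACE(sizeof(int))];
} fd_ctrl_t;

/* Send 'fd' over a connected AF_UNIX socket. One byte of ordinary data
 * carries the ancillary SCM_RIGHTS message. Returns 0 on success. */
static int send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	fd_ctrl_t ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; the kernel installs a new descriptor in this process
 * that refers to the same open file. Returns -1 on failure. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	fd_ctrl_t ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd = -1;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);
	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg && cmsg->cmsg_level == SOL_SOCKET &&
	    cmsg->cmsg_type == SCM_RIGHTS &&
	    cmsg->cmsg_len == CMSG_LEN(sizeof(int)))
		memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* Hypothetical demo: a pipe's write end stands in for a shared object fd.
 * The child closes everything it inherited and obtains the fd only via the
 * socket; the parent closes its own copy right after sending, yet the pipe
 * stays writable because another reference (in flight, then the child's)
 * still exists. Returns bytes read into 'out' (len >= 2), or -1. */
static ssize_t demo_pass_pipe_to_child(char *out, size_t len)
{
	int sv[2], pfd[2], status = 0, sent;
	ssize_t n;
	pid_t pid;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (pipe(pfd) < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	pid = fork();
	if (pid == 0) {                         /* child */
		int fd;
		close(sv[0]);
		close(pfd[0]);
		close(pfd[1]);
		fd = recv_fd(sv[1]);
		_exit(fd >= 0 && write(fd, "hello", 5) == 5 ? 0 : 1);
	}
	close(sv[1]);
	sent = pid > 0 && send_fd(sv[0], pfd[1]) == 0;
	close(pfd[1]);      /* parent's reference dropped */
	close(sv[0]);       /* queued data and fd stay readable by the child */
	n = sent ? read(pfd[0], out, len - 1) : -1;
	out[n > 0 ? n : 0] = '\0';
	if (pid > 0)
		waitpid(pid, &status, 0);
	close(pfd[0]);
	return sent && WIFEXITED(status) && WEXITSTATUS(status) == 0 ? n : -1;
}
```

A caller would run `demo_pass_pipe_to_child(buf, sizeof buf)`. On success it returns 5 and `buf` holds "hello", written by the child through the descriptor it received over the socket. The parent closes its own copy before the child writes, which illustrates the reference counting MST relies on: the underlying file stays open until every process holding it has closed it.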