From frankose at ifi.uio.no  Fri Jun  1 00:34:27 2007
From: frankose at ifi.uio.no (Frank Olaf Sem-Jacobsen)
Date: Fri, 01 Jun 2007 09:34:27 +0200
Subject: [ofa-general] osm_node_get_physp_ptr, port numbers
Message-ID: <465FCC03.5080105@ifi.uio.no>

Hi,

I'm just starting to get my bearings in the opensm code, and there's one
thing I have not been able to figure out yet.  What is the relationship
between the port number parameter given to osm_node_get_physp_ptr and
the actual port number of the switch?  Can I assume that passing, for
instance, port number 6 to osm_node_get_physp_ptr will give me the
port I see as number 6 from outside the switch?

Any clarifications are greatly appreciated.
-- 
Frank Olaf Sem-Jacobsen


From vlad at lists.openfabrics.org  Fri Jun  1 02:40:33 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Fri,  1 Jun 2007 02:40:33 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070601-0200 daily build status
Message-ID: <20070601094033.97E7DE60861@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.15
Passed on powerpc with linux-2.6.18
Passed on powerpc with linux-2.6.19
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.14
Passed on ppc64 with linux-2.6.19
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.18
Passed on x86_64 with linux-2.6.16
Passed on ia64 with linux-2.6.15
Passed on ppc64 with linux-2.6.12
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.17
Passed on ia64 with linux-2.6.16
Passed on ppc64 with linux-2.6.14
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on ppc64 with linux-2.6.17
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.14
Passed on ppc64 with linux-2.6.15
Passed on powerpc with linux-2.6.16
Passed on x86_64 with linux-2.6.15
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.18
Passed on powerpc with linux-2.6.13
Passed on powerpc with linux-2.6.14
Passed on powerpc with linux-2.6.12
Passed on ppc64 with linux-2.6.13
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.15
Passed on ia64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:


From halr at voltaire.com  Fri Jun  1 04:00:41 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 01 Jun 2007 07:00:41 -0400
Subject: [ofa-general] osm_node_get_physp_ptr, port numbers
In-Reply-To: <465FCC03.5080105@ifi.uio.no>
References: <465FCC03.5080105@ifi.uio.no>
Message-ID: <1180695638.7116.237109.camel@hal.voltaire.com>

Hi Frank,

On Fri, 2007-06-01 at 03:34, Frank Olaf Sem-Jacobsen wrote:
> Hi,
> 
> I'm just starting to get my bearings in the opensm code, and there's one
> thing I have not been able to figure out yet.  What is the relationship
> between the port number parameter given to osm_node_get_physp_ptr and
> the actual port number of the switch?  Can I assume that passing, for
> instance, port number 6 to osm_node_get_physp_ptr will give me the
> port I see as number 6 from outside the switch?
> 
> Any clarifications are greatly appreciated.

The port numbers are relative to the individual switch chips as each
switch chip is a separate IB switch node.

If you have a switch box with a single switch chip (generally 24 or 8
port switches), then the mapping is 1:1 between the two.

This is not the case with chassis-based switches, which have numerous
switch chips on separate boards, where some ports go external and others
are internal. There is a grouping function in ibnetdiscover which shows
the external ports for some chassis-based switches, but this mapping
is not yet supported in OpenSM.

What switch(es) are you using?

-- Hal


From halr at voltaire.com  Fri Jun  1 08:14:55 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 01 Jun 2007 11:14:55 -0400
Subject: [ofa-general] Re: [PATCH] opensm/sminfo: mutex cleanup fix
In-Reply-To: <20070531223341.GA23029@sashak.voltaire.com>
References: <20070531204524.GX13193@sashak.voltaire.com>
	<20070531223341.GA23029@sashak.voltaire.com>
Message-ID: <1180710889.7116.253133.camel@hal.voltaire.com>

On Thu, 2007-05-31 at 18:33, Sasha Khapyorsky wrote:
> This fixes mutex cleanups in SMInfo processor.
> 
> Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>

Thanks. Applied.

-- Hal


From swise at opengridcomputing.com  Fri Jun  1 09:00:46 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 01 Jun 2007 11:00:46 -0500
Subject: [ofa-general] problem with mvapich2 over iwarp
Message-ID: <466042AE.4000006@opengridcomputing.com>

Sundeep/Sean,

I'm helping a customer who is trying to run mvapich2 over Chelsio's
RNIC.  They're running a simple program that does an MPI init, 1000
barriers, then a finalize.  They're using ofed-1.2-rc3, mpiexec-0.82,
and mvapich2-0.9.8-p2 (not the mvapich2 from the ofed kit).  Also, they
aren't using mpd to start things up.  They're using pmi, I guess (I'm not
sure what pmi is, but the mpiexec has -comm=pmi).  BTW: I can run the
same program fine on my 8-node cluster using mpd and the ofa mvapich2 code.

On their cluster a 4-node/4-process job hangs in finalize almost always.
When it hangs, one process is always stuck in rdma_destroy_id().

Here's the stack:

(gdb) bt
#0 0x0000003c7cf0ae2b in __lll_mutex_lock_wait () from
/lib64/tls/libpthread.so.0
#1 0x000000000068db20 in ?? ()
#2 0x0000000060040a0a in ?? ()
#3 0x0000003c7cf08800 in pthread_cond_destroy@@GLIBC_2.3.2 () from
/lib64/tls/libpthread.so.0
#4 0x0000002a9579a09c in ucma_destroy_kern_id (fd=0, handle=6871424) at
src/cma.c:403
#5 0x0000002a9579a163 in rdma_destroy_id (id=0x68d980) at src/cma.c:425
#6 0x0000000000423ef9 in ib_finalize_rdma_cm ()
#7 0x00000000004183f6 in MPIDI_CH3I_CM_Finalize ()
#8 0x000000000044b03b in MPIDI_CH3_Finalize ()
#9 0x000000000043169e in MPID_Finalize ()
#10 0x000000000040c3ef in PMPI_Finalize ()
#11 0x0000000000403af4 in main ()
(gdb)

I'm not sure I believe this stack trace fully, because
ucma_destroy_kern_id() doesn't call pthread_cond_destroy().  However,
rdma_destroy_id() does.  So I'm thinking that ucma_destroy_kern_id() has
already been executed, rdma_destroy_id() is freeing the cm_id, and we
get stuck in pthread_cond_destroy() destroying the pthread condition object.

I'm wondering if y'all have ever seen this kind of hang?  I can kill the
process and it exits, so I don't think we're stuck down in the
kernel IWCM or anything.

Any thoughts?

Thanks,

Steve.


From sean.hefty at intel.com  Fri Jun  1 09:17:23 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 1 Jun 2007 09:17:23 -0700
Subject: [ofa-general] RE: problem with mvapich2 over iwarp
In-Reply-To: <466042AE.4000006@opengridcomputing.com>
Message-ID: <000401c7a468$56837bc0$ff0da8c0@amr.corp.intel.com>

>(gdb) bt
>#0 0x0000003c7cf0ae2b in __lll_mutex_lock_wait () from
>/lib64/tls/libpthread.so.0
>#1 0x000000000068db20 in ?? ()
>#2 0x0000000060040a0a in ?? ()
>#3 0x0000003c7cf08800 in pthread_cond_destroy@@GLIBC_2.3.2 () from
>/lib64/tls/libpthread.so.0
>#4 0x0000002a9579a09c in ucma_destroy_kern_id (fd=0, handle=6871424) at
>src/cma.c:403
>#5 0x0000002a9579a163 in rdma_destroy_id (id=0x68d980) at src/cma.c:425
>#6 0x0000000000423ef9 in ib_finalize_rdma_cm ()
>#7 0x00000000004183f6 in MPIDI_CH3I_CM_Finalize ()
>#8 0x000000000044b03b in MPIDI_CH3_Finalize ()
>#9 0x000000000043169e in MPID_Finalize ()
>#10 0x000000000040c3ef in PMPI_Finalize ()
>#11 0x0000000000403af4 in main ()
>(gdb)
>
>I'm not sure I believe this stack trace fully, because
>ucma_destroy_kern_id() doesn't call pthread_cond_destroy().  However,
>rdma_destroy_id() does.  So I'm thinking that ucma_destroy_kern_id() has
>already been executed, rdma_destroy_id() is freeing the cm_id, and we
>get stuck in pthread_cond_destroy() destroying the pthread condition object.
>
>I'm wondering if y'all have ever seen this kind of hang?  I can kill the
>process and it exits, so I don't think we're stuck down in the
>kernel IWCM or anything.
>
>Any thoughts?

I haven't seen any hangs like this, but I will perform a code inspection to see
if any issues can be found.

- Sean


From narravul at cse.ohio-state.edu  Fri Jun  1 10:05:32 2007
From: narravul at cse.ohio-state.edu (Sundeep Narravula)
Date: Fri, 1 Jun 2007 13:05:32 -0400 (EDT)
Subject: [ofa-general] Re: problem with mvapich2 over iwarp
In-Reply-To: <466042AE.4000006@opengridcomputing.com>
Message-ID: <Pine.GSO.4.40.0706011253500.7162-100000@kappa.cse.ohio-state.edu>


Steve,
  We have not seen this hang before, and we are not sure what is happening
at this point. I will look through the code for this behavior.

btw, mvapich2-0.9.8-p2 and the ofa mvapich2 code are identical at this
point.

  --Sundeep.



From swise at opengridcomputing.com  Fri Jun  1 10:29:49 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 01 Jun 2007 12:29:49 -0500
Subject: [ofa-general] Re: problem with mvapich2 over iwarp
In-Reply-To: <000401c7a468$56837bc0$ff0da8c0@amr.corp.intel.com>
References: <000401c7a468$56837bc0$ff0da8c0@amr.corp.intel.com>
Message-ID: <4660578D.2030306@opengridcomputing.com>

Sean Hefty wrote:
>> (gdb) bt
>> #0 0x0000003c7cf0ae2b in __lll_mutex_lock_wait () from
>> /lib64/tls/libpthread.so.0
>> #1 0x000000000068db20 in ?? ()
>> #2 0x0000000060040a0a in ?? ()
>> #3 0x0000003c7cf08800 in pthread_cond_destroy@@GLIBC_2.3.2 () from
>> /lib64/tls/libpthread.so.0
>> #4 0x0000002a9579a09c in ucma_destroy_kern_id (fd=0, handle=6871424) at
>> src/cma.c:403
>> #5 0x0000002a9579a163 in rdma_destroy_id (id=0x68d980) at src/cma.c:425
>> #6 0x0000000000423ef9 in ib_finalize_rdma_cm ()
>> #7 0x00000000004183f6 in MPIDI_CH3I_CM_Finalize ()
>> #8 0x000000000044b03b in MPIDI_CH3_Finalize ()
>> #9 0x000000000043169e in MPID_Finalize ()
>> #10 0x000000000040c3ef in PMPI_Finalize ()
>> #11 0x0000000000403af4 in main ()
>> (gdb)
>>
>> I'm not sure I believe this stack trace fully, because
>> ucma_destroy_kern_id() doesn't call pthread_cond_destroy().  However,
>> rdma_destroy_id() does.  So I'm thinking that ucma_destroy_kern_id() has
>> already been executed, rdma_destroy_id() is freeing the cm_id, and we
>> get stuck in pthread_cond_destroy() destroying the pthread condition object.
>>
>> I'm wondering if y'all have ever seen this kind of hang?  I can kill the
>> process and it exits, so I don't think we're stuck down in the
>> kernel IWCM or anything.
>>
>> Any thoughts?
> 
> I haven't seen any hangs like this, but I will perform a code inspection to see
> if any issues can be found.
> 
> - Sean

Thanks,

Perhaps someone is freeing the cond object twice.  That could cause a 
hang...
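
One way to test the double-free theory is to make the destroy path idempotent and see whether the hang goes away. The following is only a sketch under assumptions: `struct guarded_cond`, `guarded_cond_init()` and `guarded_cond_destroy()` are hypothetical names, not part of librdmacm or mvapich2. POSIX leaves the result of destroying an already-destroyed condition variable undefined, so a hang inside pthread_cond_destroy() would be one plausible outcome.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical wrapper (not librdmacm code): a condition variable plus a
 * flag that records whether it is currently initialized. */
struct guarded_cond {
    pthread_cond_t cond;
    bool live;
};

/* Initialize the condition variable and mark it live on success.
 * Returns the pthread_cond_init() result (0 on success). */
static int guarded_cond_init(struct guarded_cond *gc)
{
    int ret = pthread_cond_init(&gc->cond, NULL);

    gc->live = (ret == 0);
    return ret;
}

/* Destroy the condition variable at most once.
 * Returns 1 if this call destroyed it, 0 if it was already destroyed
 * (the second destroy is skipped), -1 if pthread_cond_destroy() failed. */
static int guarded_cond_destroy(struct guarded_cond *gc)
{
    if (!gc->live)
        return 0;
    if (pthread_cond_destroy(&gc->cond) != 0)
        return -1;
    gc->live = false;
    return 1;
}
```

The flag itself is not protected by a lock, so callers would have to serialize teardown themselves; a return value of 0 from the second call is the signal that some path is trying to destroy the object twice.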


From hanafim.ctr at asc.hpc.mil  Fri Jun  1 12:38:00 2007
From: hanafim.ctr at asc.hpc.mil (MAHMOUD HANAFI)
Date: Fri, 01 Jun 2007 15:38:00 -0400
Subject: [ofa-general] Need OFED1.1 ib_srp  max_hw_sectors_kb help!
In-Reply-To: <004401c7a482$42a5bbd0$7f01a8c0@ddnereo.datadirectnet.com>
References: <004401c7a482$42a5bbd0$7f01a8c0@ddnereo.datadirectnet.com>
Message-ID: <46607598.9020002@asc.hpc.mil>

Some test data for OFED1.1. I am going to run OFED1.2 and IBGOLD.

* 1 LUN per tier (8+1) with a block size of 4096KB
* Using xdd writing to 4 LUNs.
* Using 1 IB host port to 1 IB DDN port.

The 700MB/sec appears to be a host limit, because with 2 IB host ports and 2
DDN IB ports, IO still maxes out at 700MB/sec.

Note that writes level off at a 512KB request size. I verified the IO request lengths on the DDN.

== WRITE DirectIO ==
IO Size (KB)  Throughput (MB/sec)
------------  -------------------
          16               46.711
          32               93.293
          64              185.098
         128              348.522
         256              547.118
         512              671.227
        1024              697.149
        2048              680.645
        4096              692.067
        8192              710.564

== READ DirectIO ==
IO Size (KB)  Throughput (MB/sec)
------------  -------------------
          16               54.856
          32              104.526
          64              191.592
         128              312.586
         256              462.460
         512              471.877
        1024              509.806
        2048              535.050
        4096              543.130
        8192              565.176

Martin W. Schlining III wrote:
> Wonder why it halves the value?  I'll have to try that myself.
> 
> If you are using OFED 1.2, you can also load the ib_srp module with an
> option to increase the size of the scatter gather lists. The default size is
> 12 which is way too small. I don't think this option exists for OFED 1.1. In
> 1.1, you have to modify ib_srp.h and recompile the module ib_srp.o.
> 
> modprobe ib_srp srp_sg_tablesize=256
> 
> Fiber channel drivers also set the max_sect field in their drivers to 65535
> (0xffff) to eliminate any restrictions. Perhaps the same value for SRP will
> help?
> 
> - Martin
> 
> -----Original Message-----
> From: MAHMOUD HANAFI [mailto:hanafim.ctr at asc.hpc.mil] 
> Sent: Friday, June 01, 2007 2:56 PM
> To: Martin W. Schlining III
> Cc: 'MAHMOUD HANAFI'
> Subject: Re: [ofa-general] Need OFED1.1 ib_srp max_hw_sectors_kb help!
> 
> I didn't get an answer on the email list, but I figured it out.
> You can pass the "max_sect=xxx" option during initialization of the srp target.
> If you have upgraded to
> OFED1.2 you can set a line in /etc/srp_daemon.conf "A max_sect=2096" (you
> will need the "A") and then run srp_daemon.sh. The only thing I have noticed
> is that whatever you set in the max_sect value, it always takes 1/2 of the value.
> 
> 
> 
> -Mahmoud
> 
> Martin W. Schlining III wrote:
>>  Hello,
>>
>> Did you ever get an answer to this? I'm a bit curious myself what the 
>> response was.
>>
>> Regards,
>> Martin Schlining
>> Senior Software Engineer
>> mschlining at datadirectnet.com
>>
>> -----Original Message-----
>> From: general-bounces at lists.openfabrics.org
>> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of MAHMOUD 
>> HANAFI
>> Sent: Tuesday, May 29, 2007 6:58 PM
>> To: general at lists.openfabrics.org
>> Subject: [ofa-general] Need OFED1.1 ib_srp max_hw_sectors_kb help!
>>
>> All,
>>
>> I am using OFED1.1 with CISCO HCA/switch and DDN Storage. I am able to
>> load and perform IO to the DDN via the srp driver. But the
>> max_hw_sectors_kb for the device is getting set to 64KB. Has anyone else
>> seen this issue? The same host and storage with fiber channel doesn't have
>> this problem. It sets max_hw_sectors_kb correctly to 4096KB.
>>
>> Thanks,
>> --
>> Mahmoud Hanafi
>> Senior System Administrator
>> ASC/MSRC
>> www.asc.hpc.mil
>> 2435 5th Street
>> WPAFB, OHIO 45433
>> (937) 255-1536
>>
> 
> --
> Mahmoud Hanafi
> Senior System Administrator
> ASC/MSRC
> www.asc.hpc.mil
> 2435 5th Street
> WPAFB, OHIO 45433
> (937) 255-1536
> 
> 
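
On the halving noted in the quoted messages: a likely explanation, assuming max_sect is counted in 512-byte sectors while max_hw_sectors_kb is reported in kilobytes, is that the two numbers describe the same limit in different units. The small C sketch below only illustrates that arithmetic; `sectors_to_kb()` and `kb_to_sectors()` are hypothetical helper names, not functions from ib_srp or the block layer.

```c
#include <stdint.h>

#define SECTOR_BYTES 512u   /* assumed sector size behind max_sect */

/* Convert a count of 512-byte sectors to kilobytes (1 KB = 1024 bytes).
 * The 64-bit intermediate keeps the multiplication from overflowing. */
static uint32_t sectors_to_kb(uint32_t sectors)
{
    return (uint32_t)(((uint64_t)sectors * SECTOR_BYTES) / 1024u);
}

/* Inverse direction: how many 512-byte sectors make up `kb` kilobytes.
 * Only meaningful while kb * 2 still fits in 32 bits. */
static uint32_t kb_to_sectors(uint32_t kb)
{
    return kb * (1024u / SECTOR_BYTES);
}
```

Under that assumption, max_sect=2096 would show up as 1048 KB, and reaching the 4096KB seen with fiber channel would need max_sect=8192.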

-- 
Mahmoud Hanafi
Senior System Administrator
ASC/MSRC
www.asc.hpc.mil
2435 5th Street
WPAFB, OHIO 45433
(937) 255-1536


From vlad at lists.openfabrics.org  Sat Jun  2 02:39:52 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Sat,  2 Jun 2007 02:39:52 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070602-0200 daily build status
Message-ID: <20070602093952.475A8E6085F@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.13
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.12
Passed on powerpc with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.17
Passed on ppc64 with linux-2.6.19
Passed on ia64 with linux-2.6.16
Passed on ppc64 with linux-2.6.18
Passed on x86_64 with linux-2.6.12
Passed on powerpc with linux-2.6.13
Passed on x86_64 with linux-2.6.13
Passed on ppc64 with linux-2.6.14
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.17
Passed on ppc64 with linux-2.6.12
Passed on powerpc with linux-2.6.16
Passed on powerpc with linux-2.6.12
Passed on ppc64 with linux-2.6.15
Passed on x86_64 with linux-2.6.21.1
Passed on powerpc with linux-2.6.14
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.20
Passed on ppc64 with linux-2.6.16
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.17
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.9-34.ELsmp
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.18-1.2798.fc6

Failed:


From eli at mellanox.co.il  Sat Jun  2 23:50:43 2007
From: eli at mellanox.co.il (Eli Cohen)
Date: Sun, 03 Jun 2007 09:50:43 +0300
Subject: [ofa-general] Re: [PATCH] libibverbs/examples: free invalid pointer
In-Reply-To: <adafy5cq4kz.fsf@cisco.com>
References: <1180614624.7053.14.camel@mtls03>  <adafy5cq4kz.fsf@cisco.com>
Message-ID: <1180853473.10841.1.camel@mtls03>

On Thu, 2007-05-31 at 10:35 -0700, Roland Dreier wrote:
> Thanks, but I think I fixed this bug in all the pingpong examples (not
> just srq_pingpong) at the beginning of May.
> 
>  - R.

Thanks. I was looking at an outdated tree...


From vlad at lists.openfabrics.org  Sun Jun  3 02:42:04 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Sun,  3 Jun 2007 02:42:04 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070603-0200 daily build status
Message-ID: <20070603094204.EBB67E60870@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.16
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on ia64 with linux-2.6.15
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.16
Passed on ia64 with linux-2.6.18
Passed on x86_64 with linux-2.6.12
Passed on ia64 with linux-2.6.14
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.15
Passed on ia64 with linux-2.6.16
Passed on powerpc with linux-2.6.12
Passed on ppc64 with linux-2.6.14
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.14
Passed on powerpc with linux-2.6.19
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.19
Passed on x86_64 with linux-2.6.13
Passed on ia64 with linux-2.6.17
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.19
Passed on ppc64 with linux-2.6.12
Passed on powerpc with linux-2.6.14
Passed on x86_64 with linux-2.6.21.1
Passed on ppc64 with linux-2.6.15
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on ppc64 with linux-2.6.16
Passed on powerpc with linux-2.6.15
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.18
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.13
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on ia64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.9-34.ELsmp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on ia64 with linux-2.6.16.21-0.8-default

Failed:


From jackm at dev.mellanox.co.il  Sun Jun  3 06:39:38 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Sun, 3 Jun 2007 16:39:38 +0300
Subject: [ofa-general] [PATCH] libibverbs: initialize qp state to RESET at qp
	creation time
Message-ID: <200706031639.38921.jackm@dev.mellanox.co.il>

Roland, libmlx4 commit af7707cecdfd5ca8a38b4d855070ebfc310a339f
(Initialize send queue entry ownership bits) is broken without
the fix below.  Since qp state is uninitialized, mlx4_qp_init_sq_ownership()
frequently ends up not being invoked.

---

Must initialize qp state to RESET at qp creation time.

Signed-off-by: Jack Morgenstein <jackm at dev.mellanox.co.il>

diff --git a/src/verbs.c b/src/verbs.c
index febf32a..f5cf4d3 100644
--- a/src/verbs.c
+++ b/src/verbs.c
@@ -406,6 +406,7 @@ struct ibv_qp *__ibv_create_qp(struct ibv_pd *pd,
 		qp->recv_cq    	     = qp_init_attr->recv_cq;
 		qp->srq        	     = qp_init_attr->srq;
 		qp->qp_type          = qp_init_attr->qp_type;
+		qp->state	     = IBV_QPS_RESET;
 		qp->events_completed = 0;
 		pthread_mutex_init(&qp->mutex, NULL);
 		pthread_cond_init(&qp->cond, NULL);
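
To illustrate why the missing initialization matters, here is a toy model in C. It assumes, as the message above implies, that the provider runs its one-time send queue setup only when it sees a transition out of RESET. The names `toy_qp`, `toy_qp_create()` and `toy_qp_modify()` are hypothetical and are not libibverbs or libmlx4 code.

```c
/* Toy stand-ins for the QP states; not the real ibv_qp_state enum. */
enum toy_qp_state { TOY_QPS_RESET = 0, TOY_QPS_INIT, TOY_QPS_RTR };

struct toy_qp {
    enum toy_qp_state state;
    int sq_owned;   /* 1 once the one-time send queue setup has run */
};

/* Creation sets the state explicitly, mirroring the one-line fix:
 * without it, `state` would hold whatever the allocator left there. */
static void toy_qp_create(struct toy_qp *qp)
{
    qp->state = TOY_QPS_RESET;
    qp->sq_owned = 0;
}

/* The one-time setup is gated on the current state being RESET, so it is
 * skipped whenever `state` starts out as anything else. */
static void toy_qp_modify(struct toy_qp *qp, enum toy_qp_state next)
{
    if (qp->state == TOY_QPS_RESET && next == TOY_QPS_INIT)
        qp->sq_owned = 1;
    qp->state = next;
}
```

With the explicit assignment in `toy_qp_create()`, the first RESET to INIT transition always triggers the setup; a later INIT to RTR transition leaves it untouched.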


From jackm at dev.mellanox.co.il  Sun Jun  3 06:43:20 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Sun, 3 Jun 2007 16:43:20 +0300
Subject: [ofa-general] [PATCH] IB/mlx4: rq size computation fix
Message-ID: <200706031643.21047.jackm@dev.mellanox.co.il>

rq.max should be at least 1.

Signed-off-by: Jack Morgenstein <jackm at dev.mellanox.co.il>

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index dc137de..0d5baf5 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -196,7 +196,7 @@ static int set_rq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
 	    cap->max_recv_sge > dev->dev->caps.max_rq_sg)
 		return -EINVAL;
 
-	qp->rq.max = cap->max_recv_wr ? roundup_pow_of_two(cap->max_recv_wr) : 0;
+	qp->rq.max = cap->max_recv_wr ? roundup_pow_of_two(cap->max_recv_wr) : 1;
 
 	qp->rq.wqe_shift = ilog2(roundup_pow_of_two(cap->max_recv_sge *
 						    sizeof (struct mlx4_wqe_data_seg)));
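
The effect of the one-character change can be checked with a userspace sketch. `toy_roundup_pow_of_two()` is a hypothetical stand-in for the kernel's roundup_pow_of_two(), and `rq_max_for()` is an illustrative name for the patched expression; neither is mlx4 driver code.

```c
#include <stdint.h>

/* Userspace stand-in for roundup_pow_of_two(): smallest power of two that
 * is >= n, for 1 <= n <= 2^31. Larger inputs are clamped to 2^31 so the
 * shift can never wrap to zero. */
static uint32_t toy_roundup_pow_of_two(uint32_t n)
{
    uint32_t p = 1;

    while (p < n && p < 0x80000000u)
        p <<= 1;
    return p;
}

/* Patched computation: a request for zero receive work requests still
 * yields a queue size of 1 instead of 0. */
static uint32_t rq_max_for(uint32_t max_recv_wr)
{
    return max_recv_wr ? toy_roundup_pow_of_two(max_recv_wr) : 1;
}
```

Non-zero requests are unaffected by the patch (5 still rounds up to 8); only the zero case changes, from 0 to 1.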


From frankose at ifi.uio.no  Sun Jun  3 10:29:12 2007
From: frankose at ifi.uio.no (Frank Olaf Sem-Jacobsen)
Date: Sun, 03 Jun 2007 19:29:12 +0200
Subject: [ofa-general] Log output upon death
Message-ID: <4662FA68.2080305@ifi.uio.no>

Time for my second naive question (too bad the archives do not have any
search function).

Much as expected RunSimTest dies for an unknown reason while routing my
topology, and I am attempting to debug by adding various debug log
entries.  However, as things seem to be threaded (?) there does not seem
to be any direct relationship between where the application fails and
where the log output stops, the log usually stops abruptly in the middle
of a line.  Also, the log entries stop in various parts of the code
instead of the same place each time (I could though have many errors ;) ).

Is there a possible way to synchronise this such that the log file will
reflect the last log entry by opensm before it dies?  Are there any
other ingenious ways of debugging the route building function?
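
A log that stops in the middle of a line is consistent with buffered output being lost when the process dies. Assuming the debug entries go through a buffered stdio stream, flushing after every entry is one way to keep the file in step with execution. The helper below is a hypothetical sketch (`log_line_flushed()` is not an OpenSM function).

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: write one formatted log line, terminate it with a
 * newline, and flush immediately so the stdio buffer is handed to the OS
 * before any later crash. Returns 0 on success, -1 on a write error. */
static int log_line_flushed(FILE *fp, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vfprintf(fp, fmt, ap);
    va_end(ap);

    if (n < 0 || fputc('\n', fp) == EOF)
        return -1;
    return fflush(fp) == 0 ? 0 : -1;
}
```

Flushing on every line costs throughput, so it is better suited to a debugging session than to normal operation. With several threads writing, entries can still interleave, but each completed call leaves a whole line in the file.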

As always, any help is greatly appreciated.
-- 
Frank Olaf Sem-Jacobsen


From eitan at mellanox.co.il  Sun Jun  3 10:48:17 2007
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Sun, 3 Jun 2007 20:48:17 +0300
Subject: [ofa-general] Log output upon death
In-Reply-To: <4662FA68.2080305@ifi.uio.no>
References: <4662FA68.2080305@ifi.uio.no>
Message-ID: <6C2C79E72C305246B504CBA17B5500C90199B3FF@mtlexch01.mtl.com>

Hi Frank,

From your description it is unclear whether it is ibmssh (the shell that
interprets the RunSimTest code) or OpenSM that has crashed. The best way
to debug such issues (sudden death) is to compile the executables (both
opensm and ibmssh) with debug info (by adding -ggdb to CFLAGS or, better,
configure --enable-debug) and then allow the system to create a core file
(in bash use: ulimit -c unlimited; in tcsh: limit coredumpsize unlimited).

Then you will get a core dump file.
You should try to open it in gdb and it will tell you what executable
generated the core.

Then you start gdb with the correct executable and core file and use the
"where" command to debug.
You can switch between threads by using the thread command.
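
If changing the shell limit is inconvenient, for example when the process is launched by a test harness, the core file limit can also be raised from inside the program at startup. This is a minimal sketch using the standard getrlimit()/setrlimit() calls; `enable_core_dumps()` is a hypothetical helper name, not something opensm or ibmssh provides.

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit for this process.
 * This is the most an unprivileged process can ask for; if the hard limit
 * is 0, core files stay disabled. Returns 0 on success, -1 on failure. */
static int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling it once early in main() would have the same effect for that process as raising the limit in the shell before starting it.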

If you want me to have a look at the failure you can send me the "input"
files you use (topo file and ibnl directory).

Eitan

Eitan Zahavi
Senior Engineering Director, Software Architect
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

 
> -----Original Message-----
> From: general-bounces at lists.openfabrics.org 
> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of 
> Frank Olaf Sem-Jacobsen
> Sent: Sunday, June 03, 2007 8:29 PM
> To: general at lists.openfabrics.org
> Subject: [ofa-general] Log output upon death
> 
> Time for my second naive question (too bad the archives do 
> not have any search function).
> 
> Much as expected RunSimTest dies for an unknown reason while 
> routing my topology, and I am attempting to debug by adding 
> various debug log entries.  However, as things seem to be 
> threaded (?) there does not seem to be any direct 
> relationship between where the application fails and where 
> the log output stops, the log usually stops abruptly in the 
> middle of a line.  Also, the log entries stop in various 
> parts of the code instead of the same place each time (I 
> could though have many errors ;) ).
> 
> Is there a possible way to synchronise this such that the log 
> file will reflect the last log entry by opensm before it 
> dies?  Are there any other ingenious ways of debugging the 
> route building function?
> 
> As always, any help is greatly appreciated.
> --
> Frank Olaf Sem-Jacobsen


From halr at voltaire.com  Sun Jun  3 12:27:22 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 03 Jun 2007 15:27:22 -0400
Subject: [ofa-general] Log output upon death
In-Reply-To: <4662FA68.2080305@ifi.uio.no>
References: <4662FA68.2080305@ifi.uio.no>
Message-ID: <1180898838.7116.450041.camel@hal.voltaire.com>

On Sun, 2007-06-03 at 13:29, Frank Olaf Sem-Jacobsen wrote:
> Time for my second naive question (too bad the archives do not have any
> search function).
> 
> Much as expected RunSimTest dies for an unknown reason while routing my
> topology, and I am attempting to debug by adding various debug log
> entries.  However, as things seem to be threaded (?) there does not seem
> to be any direct relationship between where the application fails and
> where the log output stops, the log usually stops abruptly in the middle
> of a line.  Also, the log entries stop in various parts of the code
> instead of the same place each time (I could though have many errors ;) ).
> 
> Is there a possible way to synchronise this such that the log file will
> reflect the last log entry by opensm before it dies?

There's force_log_flush in opensm.opts which should help with this.
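
The reason a flush-per-entry option helps can be sketched in C. With
buffered stdio, the last entries may still sit in the process's buffer
when it dies, so the file ends at an arbitrary point. This is not the
OpenSM logging code; log_entry and its force_flush parameter are
hypothetical names used only for illustration.

```c
#include <stdio.h>

/*
 * Hypothetical logger: write one entry and, if force_flush is nonzero,
 * push it out of the stdio buffer right away so that it is not lost if
 * the process crashes before the buffer fills.
 * Returns the number of bytes written, or -1 on error.
 */
int log_entry(FILE *log, int force_flush, const char *msg)
{
	int n = fprintf(log, "%s\n", msg);

	if (n < 0)
		return -1;
	if (force_flush && fflush(log) != 0)
		return -1;
	return n;
}
```

Flushing after every entry costs some performance, which is presumably
why such an option is off unless requested.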

-- Hal

> Are there any
> other ingenious ways of debugging the route building function?
> 
> As always, any help is greatly appreciated.


From rdreier at cisco.com  Sun Jun  3 13:35:14 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Sun, 03 Jun 2007 13:35:14 -0700
Subject: [ofa-general] Re: [PATCH] libibverbs: initialize qp state to RESET
	at qp creation time
In-Reply-To: <200706031639.38921.jackm@dev.mellanox.co.il> (Jack Morgenstein's
	message of "Sun, 3 Jun 2007 16:39:38 +0300")
References: <200706031639.38921.jackm@dev.mellanox.co.il>
Message-ID: <ada7iqkn5dp.fsf@cisco.com>

thanks, I applied this to master, stable and stable-1.0 branches.


From halr at voltaire.com  Sun Jun  3 14:04:04 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 03 Jun 2007 17:04:04 -0400
Subject: [ofa-general] OpenIB management libraries release
Message-ID: <1180904641.7116.456208.camel@hal.voltaire.com>

http://www.openfabrics.org/~halr/

md5sum
212f78cf6b370a2b5d44a773cd640446  libibcommon-1.0.3.tar.gz
7ba5da1f33a2df48ab34c12479852930  libibumad-1.0.5.tar.gz
1352954756833ad6a516e9a461949768  libibmad-1.0.5.tar.gz


From rdreier at cisco.com  Sun Jun  3 16:23:48 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Sun, 03 Jun 2007 16:23:48 -0700
Subject: [ofa-general] Re: [PATCH] IB/mlx4: rq size computation fix
In-Reply-To: <200706031643.21047.jackm@dev.mellanox.co.il> (Jack Morgenstein's
	message of "Sun, 3 Jun 2007 16:43:20 +0300")
References: <200706031643.21047.jackm@dev.mellanox.co.il>
Message-ID: <adavee4lj0b.fsf@cisco.com>

 > rq.max should be at least 1.

Is this true?  It seems this would break QPs that use SRQ.

(I agree we do need to make sure rq.max and rq.max_gs are at least 1
for QPs with a receive queue, but it seems this patch will actually
break things when a QP doesn't have a receive queue, because the send
queue offset will be wrong)


From vlad at lists.openfabrics.org  Mon Jun  4 02:41:53 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Mon,  4 Jun 2007 02:41:53 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070604-0200 daily build status
Message-ID: <20070604094154.2C3D4E60825@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.12
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.16
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.12
Passed on powerpc with linux-2.6.18
Passed on powerpc with linux-2.6.16
Passed on ia64 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on ppc64 with linux-2.6.15
Passed on ia64 with linux-2.6.17
Passed on x86_64 with linux-2.6.17
Passed on powerpc with linux-2.6.13
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.15
Passed on ppc64 with linux-2.6.12
Passed on x86_64 with linux-2.6.14
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on powerpc with linux-2.6.12
Passed on x86_64 with linux-2.6.21.1
Passed on ppc64 with linux-2.6.14
Passed on powerpc with linux-2.6.14
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.19
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on ia64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:


From eli at mellanox.co.il  Mon Jun  4 07:16:35 2007
From: eli at mellanox.co.il (Eli Cohen)
Date: Mon, 04 Jun 2007 17:16:35 +0300
Subject: [ofa-general] [PATCH] libmlx4: doorbell allocator
Message-ID: <1180966625.10841.30.camel@mtls03>

Make the constant 1 the same type as the variable holding the bit
mask, to prevent the same bit from being used twice.
For example, on 64-bit machines int is 32 bits and long is
64 bits, so the shift in 1 << shift_val is done in 32 bits and
1 << 32 can give the same result as 1 << 0, whereas the correct
usage is 1L << shift_val.

Found by Dotan Barak at Mellanox
Signed-off-by: Eli Cohen <eli at mellanox.co.il>

---
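
For illustration only, here is a stand-alone sketch of the same idea
outside libmlx4. The names WORD_BITS, bitmap_clear and bitmap_set are
hypothetical. The sketch uses 1UL rather than the 1L of the patch, on
the assumption that an unsigned constant is preferable because it keeps
the shift well defined for the top bit of the word as well.

```c
#include <assert.h>
#include <limits.h>

/* Bits in one word of the free bitmap (64 where long is 64 bits). */
#define WORD_BITS (CHAR_BIT * sizeof(unsigned long))

/* Mark `bit` as used by clearing it in the bitmap word. */
void bitmap_clear(unsigned long *word, unsigned bit)
{
	assert(bit < WORD_BITS);
	/* 1UL makes the shift happen at the full width of the word. */
	*word &= ~(1UL << bit);
}

/* Mark `bit` as free again by setting it in the bitmap word. */
void bitmap_set(unsigned long *word, unsigned bit)
{
	assert(bit < WORD_BITS);
	*word |= 1UL << bit;
}
```

With a plain int constant, bit numbers of 32 and above would be shifted
in a 32-bit type, which is the problem described above.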
Index: libmlx4/src/dbrec.c
===================================================================
--- libmlx4.orig/src/dbrec.c	2007-06-04 12:53:57.000000000 +0300
+++ libmlx4/src/dbrec.c	2007-06-04 16:53:31.000000000 +0300
@@ -110,7 +110,7 @@
 		/* nothing */;
 
 	j = ffsl(page->free[i]);
-	page->free[i] &= ~(1 << (j - 1));
+	page->free[i] &= ~(1L << (j - 1));
 	db = page->buf.buf + (i * 8 * sizeof (long) + (j - 1)) * db_size[type];
 
 out:
@@ -135,7 +135,7 @@
 		goto out;
 
 	i = ((void *) db - page->buf.buf) / db_size[type];
-	page->free[i / (8 * sizeof (long))] |= 1 << (i % (8 * sizeof (long)));
+	page->free[i / (8 * sizeof (long))] |= 1L << (i % (8 * sizeof (long)));
 
 	if (!--page->use_cnt) {
 		if (page->prev)


From halr at voltaire.com  Mon Jun  4 08:46:52 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 04 Jun 2007 11:46:52 -0400
Subject: [ofa-general] [PATCH] ibnetdiscover: Add link width and speed to
	topology file output
Message-ID: <1180972011.4533.3711.camel@hal.voltaire.com>

ibnetdiscover: Add link width and speed to topology file output

Signed-off-by: Hal Rosenstock <halr at voltaire.com>

diff --git a/infiniband-diags/include/ibnetdiscover.h b/infiniband-diags/include/ibnetdiscover.h
index 4c2a6c7..7f2512e 100644
--- a/infiniband-diags/include/ibnetdiscover.h
+++ b/infiniband-diags/include/ibnetdiscover.h
@@ -72,6 +72,8 @@ struct Port {
 	int lmc;
 	int state;
 	int physstate;
+	int linkwidth;
+	int linkspeed;
 
 	Node *node;
 	Port *remoteport;		/* null if SMA */
diff --git a/infiniband-diags/man/ibnetdiscover.8 b/infiniband-diags/man/ibnetdiscover.8
index 7d9c49c..37a896c 100644
--- a/infiniband-diags/man/ibnetdiscover.8
+++ b/infiniband-diags/man/ibnetdiscover.8
@@ -1,4 +1,4 @@
-.TH IBNETDISCOVER 8 "June 2, 2007" "OpenIB" "OpenIB Diagnostics"
+.TH IBNETDISCOVER 8 "June 4, 2007" "OpenIB" "OpenIB Diagnostics"
 
 .SH NAME
 ibnetdiscover \- discover InfiniBand topology
@@ -131,45 +131,45 @@ devid=0x5a06
 sysimgguid=0x5442ba00003000
 switchguid=0x5442ba00003080
 Switch  24 "S-005442ba00003080"         # "ISR9024 Voltaire" base port 0 lid 6 lmc 0
-[22]    "H-0008f10403961354"[1]         # "MT23108 InfiniHost Mellanox Technologies" lid 4
-[10]    "S-0008f10400410015"[1]         # "SW-6IB4 Voltaire" lid 3
-[8]     "H-0008f10403960558"[2]         # "MT23108 InfiniHost Mellanox Technologies" lid 14
-[6]     "S-0008f10400410015"[3]         # "SW-6IB4 Voltaire" lid 3 
-[12]    "H-0008f10403960558"[1]         # "MT23108 InfiniHost Mellanox Technologies" lid 10
+[22]    "H-0008f10403961354"[1]         # "MT23108 InfiniHost Mellanox Technologies" lid 4 4xSDR
+[10]    "S-0008f10400410015"[1]         # "SW-6IB4 Voltaire" lid 3 4xSDR
+[8]     "H-0008f10403960558"[2]         # "MT23108 InfiniHost Mellanox Technologies" lid 14 4xSDR
+[6]     "S-0008f10400410015"[3]         # "SW-6IB4 Voltaire" lid 3 4xSDR
+[12]    "H-0008f10403960558"[1]         # "MT23108 InfiniHost Mellanox Technologies" lid 10 4xSDR
 
 vendid=0x8f1
 devid=0x5a05
 switchguid=0x8f10400410015
 Switch  8 "S-0008f10400410015"          # "SW-6IB4 Voltaire" base port 0 lid 3 lmc 0
-[6]     "H-0008f10403960984"[1]         # "MT23108 InfiniHost Mellanox Technologies" lid 16
-[4]     "H-005442b100004900"[1]         # "MT23108 InfiniHost Mellanox Technologies" lid 12
-[1]     "S-005442ba00003080"[10]                # "ISR9024 Voltaire" lid 6
-[3]     "S-005442ba00003080"[6]         # "ISR9024 Voltaire" lid 6
+[6]     "H-0008f10403960984"[1]         # "MT23108 InfiniHost Mellanox Technologies" lid 16 4xSDR
+[4]     "H-005442b100004900"[1]         # "MT23108 InfiniHost Mellanox Technologies" lid 12 4xSDR
+[1]     "S-005442ba00003080"[10]                # "ISR9024 Voltaire" lid 6 1xSDR
+[3]     "S-005442ba00003080"[6]         # "ISR9024 Voltaire" lid 6 4xSDR
 
 vendid=0x2c9
 devid=0x5a44
 caguid=0x8f10403960984
 Ca      2 "H-0008f10403960984"          # "MT23108 InfiniHost Mellanox Technologies"
-[1]     "S-0008f10400410015"[6]         # lid 16 lmc 1 "SW-6IB4 Voltaire" lid 3
+[1]     "S-0008f10400410015"[6]         # lid 16 lmc 1 "SW-6IB4 Voltaire" lid 3 4xSDR
 
 vendid=0x2c9
 devid=0x5a44
 caguid=0x5442b100004900
 Ca      2 "H-005442b100004900"          # "MT23108 InfiniHost Mellanox Technologies"
-[1]     "S-0008f10400410015"[4]         # lid 12 lmc 1 "SW-6IB4 Voltaire" lid 3
+[1]     "S-0008f10400410015"[4]         # lid 12 lmc 1 "SW-6IB4 Voltaire" lid 3 4xSDR
 
 vendid=0x2c9
 devid=0x5a44
 caguid=0x8f10403961354
 Ca      2 "H-0008f10403961354"          # "MT23108 InfiniHost Mellanox Technologies"
-[1]     "S-005442ba00003080"[22]                # lid 4 lmc 1 "ISR9024 Voltaire" lid 6
+[1]     "S-005442ba00003080"[22]                # lid 4 lmc 1 "ISR9024 Voltaire" lid 6 4xSDR
 
 vendid=0x2c9
 devid=0x5a44
 caguid=0x8f10403960558
 Ca      2 "H-0008f10403960558"          # "MT23108 InfiniHost Mellanox Technologies"
-[2]     "S-005442ba00003080"[8]         # lid 14 lmc 1 "ISR9024 Voltaire" lid 6
-[1]     "S-005442ba00003080"[12]                # lid 10 lmc 1 "ISR9024 Voltaire" lid 6
+[2]     "S-005442ba00003080"[8]         # lid 14 lmc 1 "ISR9024 Voltaire" lid 6 4xSDR
+[1]     "S-005442ba00003080"[12]                # lid 10 lmc 1 "ISR9024 Voltaire" lid 6 1xSDR
 .fi
 
 When grouping is used, IB nodes are organized into chasses which are
diff --git a/infiniband-diags/src/ibnetdiscover.c b/infiniband-diags/src/ibnetdiscover.c
index 1338913..3dc2173 100644
--- a/infiniband-diags/src/ibnetdiscover.c
+++ b/infiniband-diags/src/ibnetdiscover.c
@@ -46,7 +46,7 @@
 #include <errno.h>
 #include <inttypes.h>
 
-#define __BUILD_VERSION_TAG__ 1.2.2
+#define __BUILD_VERSION_TAG__ 1.2.3
 #include <common.h>
 #include <umad.h>
 #include <mad.h>
@@ -63,6 +63,26 @@ static char *node_type_str[] = {
 	"iwarp rnic"
 };
 
+static char *linkwidth_str[] = {
+	"??",
+	"1x",
+	"4x",
+	"??",
+	"8x",
+	"??",
+	"??",
+	"??",
+	"12x"
+};
+
+static char *linkspeed_str[] = {
+	"???",
+	"SDR",
+	"DDR",
+	"???",
+	"QDR"
+};
+
 static int timeout = 2000;		/* ms */
 static int dumplevel = 0;
 static int verbose;
@@ -80,6 +100,24 @@ int maxhops_discovered = 0;
 
 struct ChassisList *chassis = NULL;
 
+static char *
+get_linkwidth_str(int linkwidth)
+{
+	if (linkwidth > 8)
+		return linkwidth_str[0];
+	else
+		return linkwidth_str[linkwidth];
+}
+
+static char *
+get_linkspeed_str(int linkspeed)
+{
+	if (linkspeed > 4)
+		return linkspeed_str[0];
+	else
+		return linkspeed_str[linkspeed];
+}
+
 int
 get_port(Port *port, int portnum, ib_portid_t *portid)
 {
@@ -95,9 +133,11 @@ get_port(Port *port, int portnum, ib_por
 	mad_decode_field(pi, IB_PORT_LMC_F, &port->lmc);
 	mad_decode_field(pi, IB_PORT_STATE_F, &port->state);
 	mad_decode_field(pi, IB_PORT_PHYS_STATE_F, &port->physstate);
+	mad_decode_field(pi, IB_PORT_LINK_WIDTH_ACTIVE_F, &port->linkwidth);
+	mad_decode_field(pi, IB_PORT_LINK_SPEED_ACTIVE_F, &port->linkspeed);
 
-	DEBUG("portid %s portnum %d: lid %d state %d physstate %d",
-		portid2str(portid), portnum, port->lid, port->state, port->physstate);
+	DEBUG("portid %s portnum %d: lid %d state %d physstate %d %s %s",
+		portid2str(portid), portnum, port->lid, port->state, port->physstate, get_linkwidth_str(port->linkwidth), get_linkspeed_str(port->linkspeed));
 	return 1;
 }
 /*
@@ -135,6 +175,8 @@ get_node(Node *node, Port *port, ib_port
 	mad_decode_field(pi, IB_PORT_LMC_F, &port->lmc);
 	mad_decode_field(pi, IB_PORT_STATE_F, &port->state);
 	mad_decode_field(pi, IB_PORT_PHYS_STATE_F, &port->physstate);
+	mad_decode_field(pi, IB_PORT_LINK_WIDTH_ACTIVE_F, &port->linkwidth);
+	mad_decode_field(pi, IB_PORT_LINK_SPEED_ACTIVE_F, &port->linkspeed);
 
 	if (node->type != SWITCH_NODE)
 		return 0;
@@ -571,12 +613,14 @@ out_switch_port(Port *port, int group)
 		rem_nodename = clean_nodedesc(port->remoteport->node->nodedesc);
 
 	ext_port_str = out_ext_port(port->remoteport, group);
-	fprintf(f, "\t%s[%d]%s\t\t# \"%s\" lid %d\n",
+	fprintf(f, "\t%s[%d]%s\t\t# \"%s\" lid %d %s%s\n",
 		node_name(port->remoteport->node),
 		port->remoteport->portnum,
 		ext_port_str ? ext_port_str : "",
 		rem_nodename,
-		port->remoteport->node->type == SWITCH_NODE ? port->remoteport->node->smalid : port->remoteport->lid);
+		port->remoteport->node->type == SWITCH_NODE ? port->remoteport->node->smalid : port->remoteport->lid,
+		get_linkwidth_str(port->linkwidth),
+		get_linkspeed_str(port->linkspeed));
 
 	if (rem_nodename && (port->remoteport->node->type == SWITCH_NODE))
 		free(rem_nodename);
@@ -601,9 +645,11 @@ out_ca_port(Port *port, int group)
 				port->remoteport->node->nodedesc);
 	else
 		rem_nodename = clean_nodedesc(port->remoteport->node->nodedesc);
-	fprintf(f, "\t\t# lid %d lmc %d \"%s\" lid %d\n",
+	fprintf(f, "\t\t# lid %d lmc %d \"%s\" lid %d %s%s\n",
 		port->lid, port->lmc, rem_nodename,
-		port->remoteport->node->type == SWITCH_NODE ? port->remoteport->node->smalid : port->remoteport->lid);
+		port->remoteport->node->type == SWITCH_NODE ? port->remoteport->node->smalid : port->remoteport->lid,
+		get_linkwidth_str(port->linkwidth),
+		get_linkspeed_str(port->linkspeed));
 	if (rem_nodename && (port->remoteport->node->type == SWITCH_NODE))
 		free(rem_nodename);
 }


From yosefe at voltaire.com  Mon Jun  4 10:20:56 2007
From: yosefe at voltaire.com (Yosef Etigin)
Date: Mon, 04 Jun 2007 20:20:56 +0300
Subject: [ofa-general] [PATCH] rdma_cm: fix port type (fix bug 557)
Message-ID: <466449F8.2030100@voltaire.com>

This fixes bug 557 <https://bugs.openfabrics.org/show_bug.cgi?id=557>

If next_port is a signed int and is randomized to a negative value, accesses
to the idr data structure fail, which causes errors for rdma_cm users.

Signed-off-by: Yosef Etigin <yosefe at voltaire.com>

--
diff -urN ofa_kernel-1.2/drivers/infiniband/core/cma.c ofa_kernel-1.2.b/drivers/infiniband/core/cma.c
--- ofa_kernel-1.2/drivers/infiniband/core/cma.c	2007-06-04 20:12:12.000000000 +0300
+++ ofa_kernel-1.2.b/drivers/infiniband/core/cma.c	2007-06-04 20:14:27.000000000 +0300
@@ -77,7 +77,7 @@
 static DEFINE_IDR(tcp_ps);
 static DEFINE_IDR(udp_ps);
 static DEFINE_IDR(ipoib_ps);
-static int next_port;
+static unsigned next_port;
 
 struct cma_device {
 	struct list_head	list;


From sean.hefty at intel.com  Mon Jun  4 11:35:49 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Mon, 4 Jun 2007 11:35:49 -0700
Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557)
In-Reply-To: <466449F8.2030100@voltaire.com>
Message-ID: <001d01c7a6d7$2cb81730$ff0da8c0@amr.corp.intel.com>

>If next_port is signed int, and is randomized to be negative, it will fail
>accesses to the idr data structure and therefore cause errors in rdma_cm users.

next_port is initialized as follows:

	get_random_bytes(&next_port, sizeof next_port);
	next_port = (next_port % (sysctl_local_port_range[1] -
				  sysctl_local_port_range[0])) +
		    sysctl_local_port_range[0];

Even if next_port is initialized to a negative value by get_random_bytes, I
would expect next_port to be set to a positive value between local_port_range[0]
and local_port_range[1] by the next statement.  I'm not seeing the error my my
math/logic here.

- Sean


From gurhan.ozen at gmail.com  Mon Jun  4 11:47:24 2007
From: gurhan.ozen at gmail.com (G.O.)
Date: Mon, 4 Jun 2007 14:47:24 -0400
Subject: [ofa-general] does RHEL5 Xen work with OFED?
In-Reply-To: <5849f1820704120925n10871803gb729e7767a64fecf@mail.gmail.com>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA3034C5B50@xmb-sjc-216.amer.cisco.com>
	<5849f1820704052125ob1d309do323eae651ea9ed91@mail.gmail.com>
	<20070410181810.GD10218@mellanox.co.il>
	<5849f1820704120204q7f88f098qb69c1399668a4be9@mail.gmail.com>
	<20070412141417.GM24730@mellanox.co.il>
	<5849f1820704120925n10871803gb729e7767a64fecf@mail.gmail.com>
Message-ID: <5849f1820706041147x54e8d38at5fb5e66141090202@mail.gmail.com>

Hi Michael,
I am getting the warning "Device ib0 does not seem to be present, delaying
initialization."
I also tried creating a new network-bridge using the ib0 interface on Dom-0
as the net device, but that didn't work either.

Thanks,
gurhan


On 4/12/07, G. O. <gurhan.ozen at gmail.com> wrote:
> On 4/12/07, Michael S. Tsirkin <mst at dev.mellanox.co.il> wrote:
> > > Quoting G.O. <gurhan.ozen at gmail.com>:
> > > Subject: Re: [ofa-general] does RHEL5 Xen work with OFED?
> > >
> > > On 4/10/07, Michael S. Tsirkin <mst at dev.mellanox.co.il> wrote:
> > > >> Quoting G.O. <gurhan.ozen at gmail.com>:
> > > >> Subject: Re: [ofa-general] does RHEL5 Xen work with OFED?
> > > >>
> > > >> On 4/5/07, Scott Weitzenkamp (sweitzen) <sweitzen at cisco.com> wrote:
> > > >> >Can I access OFED IPoIB and SRP/iSER devices from within a Xen virtual
> > > >> >machine?
> > > >> >
> > > >>
> > > >>    I haven't tested SRP/iSER , but IPoIB works only on dom0 kernel.
> > > >> You can't use any  infiniband stuff on the guest OSes .
> > > >>
> > > >>   Gurhan
> > > >
> > > >What doesn't work? I would expect both IPoIB and SRP
> > > >behave in more or less the same way as any network/storage
> > > >devices, and get virtualized by Xen.
> > > >
> > >
> > >    Nothing works. Guest kernel didn't even create
> > > /sys/class/infiniband/* files.  'Far as the guest kernel is concerned,
> > > HCA doesn't even seem to exist.
> > >
> > >   Just as a FYI, I have only tried on paravirtualized guests, didn't
> > > try it with fully-virtualized guests.
> >
> > Why would you want to see /sys/class/infiniband/?
> > There things are only there for direct HW access, guests do not get that.
> >
> > You should be able to use SRP and IPoIB - you set it up in host (dom0)
> > and guests use it as any other network/storage device through the
> > virtualization layer.
> >
>
>    Hi Michael,
>   IIRC, i had got the "can't find device, initialization delayed"
> errors. I'll play around with it again with the GA release when I get
> a chance and will let you know. Might happen as early as next week.
>
>    Thanks,
>    Gurhan
> >
> > --
> > MST
> >
>


From sweitzen at cisco.com  Mon Jun  4 11:53:51 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Mon, 4 Jun 2007 11:53:51 -0700
Subject: [ofa-general] does RHEL5 Xen work with OFED?
In-Reply-To: <5849f1820706041147x54e8d38at5fb5e66141090202@mail.gmail.com>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA3034C5B50@xmb-sjc-216.amer.cisco.com>
	<5849f1820704052125ob1d309do323eae651ea9ed91@mail.gmail.com>
	<20070410181810.GD10218@mellanox.co.il>
	<5849f1820704120204q7f88f098qb69c1399668a4be9@mail.gmail.com>
	<20070412141417.GM24730@mellanox.co.il>
	<5849f1820704120925n10871803gb729e7767a64fecf@mail.gmail.com>
	<5849f1820706041147x54e8d38at5fb5e66141090202@mail.gmail.com>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA303A447A8@xmb-sjc-216.amer.cisco.com>

Yep, my understanding is that IPoIB cannot be used with Xen network bridging
at this time, because the bridging can't handle IPoIB ARP addresses.

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 


From or.gerlitz at gmail.com  Mon Jun  4 12:12:20 2007
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Mon, 4 Jun 2007 22:12:20 +0300
Subject: [ofa-general] Re: multiple independent IB fabrics connected to
	the same node
Message-ID: <15ddcffd0706041212u2c2ddfc7pdfa1a390c4fdb576@mail.gmail.com>

On 5/29/07, Bob Kossey <bob.kossey at hp.com> wrote:
>
> Another related question.  Does OFED 1.2 now support multiple
> independent IB fabrics
> (multiple SMs, etc) connected to multiple HCAs on the same node?  Are
> there any
> qualifications about which dimensions are supported with this, such as
> ipoib HA, SRP HA,
> other types of failover, etc.?
>

Hi Bob,

Generally speaking, as far as I am aware, the openib stack --does--
support such a configuration by design, but it must be validated. I am
cc'ing the maintainers here in case they see something they think is
broken under such a config.

However, note that such a config is somewhat problematic (broken) for High
Availability. Looking specifically at IPoIB HA: say you have two nodes, n1
and n2, connected to two subnets, S1 and S2. Now the n1/S1 link breaks
and bonding fails over to the IPoIB interface on S2, so that n1/S2 is
the active link. At this point, for n2 to communicate with n1 it --must--
also fail over to S2, which it would not have to do if S1 and S2 were the
same fabric.

This is only a simple (non) use case to illustrate the problem. My take
is that you had better avoid relying on HA between subnets using tools like
bonding. If you use higher-level HA tools it --might-- make sense to plan
for two independent subnets.

Or.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070604/c1778fb8/attachment.html>

From mshefty at ichips.intel.com  Mon Jun  4 12:37:39 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 04 Jun 2007 12:37:39 -0700
Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557)
In-Reply-To: <001d01c7a6d7$2cb81730$ff0da8c0@amr.corp.intel.com>
References: <001d01c7a6d7$2cb81730$ff0da8c0@amr.corp.intel.com>
Message-ID: <46646A03.2040508@ichips.intel.com>

> Even if next_port is initialized to a negative value by get_random_bytes, I
> would expect next_port to be set to a positive value between local_port_range[0]
> and local_port_range[1] by the next statement.  I'm not seeing the error my my
> math/logic here.

My my English needs help, but here's the definitions for '%' in C89 and 
C99 according to Wikipedia:

C89 - sign of result is not defined
C99 - result has same sign as dividend

Could the compiler be causing the difference on this?

- Sean


From rdreier at cisco.com  Mon Jun  4 12:49:28 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 04 Jun 2007 12:49:28 -0700
Subject: [ofa-general] Re: [PATCH] libmlx4: doorbell allocator
In-Reply-To: <1180966625.10841.30.camel@mtls03> (Eli Cohen's message of "Mon,
	04 Jun 2007 17:16:35 +0300")
References: <1180966625.10841.30.camel@mtls03>
Message-ID: <adawsyjjy9j.fsf@cisco.com>

applied, thanks


From yosefe at voltaire.com  Mon Jun  4 12:55:09 2007
From: yosefe at voltaire.com (Yosef Eitgin)
Date: Mon, 4 Jun 2007 22:55:09 +0300
Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557)
References: <001d01c7a6d7$2cb81730$ff0da8c0@amr.corp.intel.com>
	<46646A03.2040508@ichips.intel.com>
Message-ID: <39C75744D164D948A170E9792AF8E7CA0A819B@exil.voltaire.com>

>> Even if next_port is initialized to a negative value by get_random_bytes, I
>> would expect next_port to be set to a positive value between local_port_range[0]
>> and local_port_range[1] by the next statement.  I'm not seeing the error my my
>> math/logic here.
>
>My my English needs help, but here's the definitions for '%' in C89 and 
>C99 according to Wikipedia:
>
>C89 - sign of result is not defined
>C99 - result has same sign as dividend
>
>Could the compiler be causing the difference on this?
>
>- Sean
>
Possible. I was using the OFED build environment in sles10sp1, and without the 
patch next_port sometimes gets a negative value. This might be the reason it was
difficult to reproduce this. Anyway, in order to cover all possibilities (such 
as C99), I think that next_port should be unsigned.

--Yossi


From hanafim.ctr at asc.hpc.mil  Mon Jun  4 12:53:43 2007
From: hanafim.ctr at asc.hpc.mil (MAHMOUD HANAFI)
Date: Mon, 04 Jun 2007 15:53:43 -0400
Subject: [ofa-general] IB_GOLD  ib_srp question
Message-ID: <46646DC7.4030800@asc.hpc.mil>

I am not sure if this is the best place to ask this or not....

Does anyone know how to change "max_hw_sectors_kb" using ib_gold 1.8.3? I know you can set it using 
max_sect on OFED1.2.

The default for IB_GOLD is 128KB, which is too small.

Thanks,
-- 
Mahmoud Hanafi
Senior System Administrator
ASC/MSRC
www.asc.hpc.mil
2435 5th Street
WPAFB, OHIO 45433
(937) 255-1536


From jgunthorpe at obsidianresearch.com  Mon Jun  4 13:24:12 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Mon, 4 Jun 2007 14:24:12 -0600
Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557)
In-Reply-To: <46646A03.2040508@ichips.intel.com>
References: <001d01c7a6d7$2cb81730$ff0da8c0@amr.corp.intel.com>
	<46646A03.2040508@ichips.intel.com>
Message-ID: <20070604202412.GH32050@obsidianresearch.com>

On Mon, Jun 04, 2007 at 12:37:39PM -0700, Sean Hefty wrote:
> >Even if next_port is initialized to a negative value by get_random_bytes, I
> >would expect next_port to be set to a positive value between 
> >local_port_range[0]
> >and local_port_range[1] by the next statement.  I'm not seeing the error 
> >my my
> >math/logic here.
> 
> My my English needs help, but here's the definitions for '%' in C89 and 
> C99 according to Wikipedia:

The C99 '%' operator is actually a remainder operator, not a modulo
operator. These two things are identical until you consider the
effect of negative numbers:

-1 modulo 4 = 3
-1 modulo -4 = -1
-1 remainder 4 = -1    # C99 definition of %
-1 remainder -4 = -1

Languages that have both a remainder and a modulo operator operate as
above. Other languages often like to call remainder modulo, so it is
all very confusing.

For C, it is best if you never use signed numbers with % since prior
to C99 it was implementation-defined whether it is remainder or modulo.
Also, in general, most people don't want remainder when they think of % in C.
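
To make the table above concrete, here is a minimal sketch in C. The
helper names c_remainder() and floor_mod() are made up for illustration
and do not come from any library; floor_mod() is just one common way to
get a result with the sign of the divisor on top of the C99 '%'.

```c
#include <assert.h>

/* C99 remainder: the result has the sign of the dividend a. */
static int c_remainder(int a, int b)
{
	assert(b != 0);
	return a % b;
}

/* Floored modulo: the result has the sign of the divisor b. */
static int floor_mod(int a, int b)
{
	int r;

	assert(b != 0);
	r = a % b;
	/* Shift the remainder by b when its sign disagrees with b. */
	if (r != 0 && ((r < 0) != (b < 0)))
		r += b;
	return r;
}
```

With a C99 compiler, c_remainder(-1, 4) gives -1 while floor_mod(-1, 4)
gives 3, which matches the "remainder" and "modulo" rows above.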

Jason


From mshefty at ichips.intel.com  Mon Jun  4 14:53:01 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 04 Jun 2007 14:53:01 -0700
Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557)
In-Reply-To: <39C75744D164D948A170E9792AF8E7CA0A819B@exil.voltaire.com>
References: <001d01c7a6d7$2cb81730$ff0da8c0@amr.corp.intel.com>
	<46646A03.2040508@ichips.intel.com>
	<39C75744D164D948A170E9792AF8E7CA0A819B@exil.voltaire.com>
Message-ID: <466489BD.50608@ichips.intel.com>

> Possible. I was using the OFED build environment in sles10sp1, and without the 
> patch next_port sometimes gets a negative value. This might be the reason it was
> difficult to reproduce this. Anyway, in order to cover all possibilities (such 
> as C99), I think that next_port should be unsigned.

The problem makes sense to me now, and it explains why it wasn't easily 
reproducible on other platforms.  I'm not sure if we should convert 
next_port to an unsigned value, or just ensure that it's not negative. 
It's defined as an int since idr_get_new_above() expects an int.  Do we 
need an explicit cast when calling idr_get_new_above(), or how about 
just casting next_port to unsigned when initializing it?

- Sean


From troy at scl.ameslab.gov  Mon Jun  4 16:52:46 2007
From: troy at scl.ameslab.gov (Troy Benjegerdes)
Date: Mon, 04 Jun 2007 18:52:46 -0500
Subject: [ofa-general] Perfquery XmtWords, not XmtBytes...
Message-ID: <4664A5CE.4080505@scl.ameslab.gov>

It appears that Perfquery (and the performance counter api's we are 
using for fountain/goanna) are reporting data in 32 bit (4-byte) *words* 
and not bytes.

Can someone please clear up my confusion on this, and maybe correct the 
documentation as well?


From halr at voltaire.com  Mon Jun  4 17:17:58 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 04 Jun 2007 20:17:58 -0400
Subject: [ofa-general] Perfquery XmtWords, not XmtBytes...
In-Reply-To: <4664A5CE.4080505@scl.ameslab.gov>
References: <4664A5CE.4080505@scl.ameslab.gov>
Message-ID: <1181002677.12997.17099.camel@hal.voltaire.com>

On Mon, 2007-06-04 at 19:52, Troy Benjegerdes wrote:
> It appears that Perfquery (and the performance counter api's we are 
> using for fountain/goanna) are reporting data in 32 bit (4-byte) *words* 
> and not bytes.
> 
> Can someone please clear up my confusion on this, and maybe correct the 
> documentation as well?

It's consistent with what the IB spec says (IBA 1.2 vol 1 p.948) as to
how these quantities are counted. They are defined to be octets divided
by 4 so the choice is to display them the same as the actual quantity
(which is why they are named Data rather than Octets) or to multiply by
4 for Octets. The former choice was made.
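
For anyone who wants octets from these counters, the conversion is
just a multiply by 4. A minimal sketch, assuming the counter value has
already been read into a 64-bit integer; data_counter_to_octets() is a
hypothetical helper name, not part of perfquery or any library:

```c
#include <assert.h>
#include <stdint.h>

/*
 * The Data counters hold octets divided by 4, so one counter unit
 * is one 32-bit word. Multiply by 4 to get octets.
 */
static uint64_t data_counter_to_octets(uint64_t data_words)
{
	/* Guard against overflow of the 64-bit result. */
	assert(data_words <= UINT64_MAX / 4);
	return data_words * 4;
}
```

So a displayed value of 256 would correspond to 1024 octets.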

-- Hal 



From sean.hefty at intel.com  Mon Jun  4 17:19:00 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Mon, 4 Jun 2007 17:19:00 -0700
Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557)
In-Reply-To: <466489BD.50608@ichips.intel.com>
Message-ID: <008201c7a707$1d7a9150$ff0da8c0@amr.corp.intel.com>

Can you see if this patch also fixes the problem?  I'd like to keep
next_port defined as an int to match the idr_get_new_above() prototype
and sysctl_local_port_range definition.

If this fixes the problem, we should add it to OFED and queue it for
2.6.23.
---

next_port should be between sysctl_local_port_range[0] and [1].  However,
it is initially set to a random value.  If the value is negative, next_port
can fall outside of this range because of the % operator returning a
negative value.

Signed-off-by: Sean Hefty <sean.hefty at intel.com>
---

 drivers/infiniband/core/cma.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index eb15119..b0831cb 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -2772,8 +2772,8 @@ static int cma_init(void)
 	int ret;
 
 	get_random_bytes(&next_port, sizeof next_port);
-	next_port = (next_port % (sysctl_local_port_range[1] -
-				  sysctl_local_port_range[0])) +
+	next_port = ((unsigned int) next_port % 
+		    (sysctl_local_port_range[1] - sysctl_local_port_range[0])) +
 		    sysctl_local_port_range[0];
 	cma_wq = create_singlethread_workqueue("rdma_cm");
 	if (!cma_wq)
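
The effect of the cast can be checked outside the kernel. The sketch
below is a user-space illustration only: pick_port_signed() and
pick_port_unsigned() are hypothetical names standing in for the old and
new expressions in cma_init(), and the low/high arguments stand in for
sysctl_local_port_range[0] and [1].

```c
#include <assert.h>

/* Old expression: a negative seed gives a negative remainder (C99). */
static int pick_port_signed(int seed, int low, int high)
{
	assert(high > low);
	return (seed % (high - low)) + low;
}

/* Patched expression: the cast makes the remainder non-negative. */
static int pick_port_unsigned(int seed, int low, int high)
{
	assert(high > low);
	return ((unsigned int) seed % (high - low)) + low;
}
```

Taking 32768 and 61000 as example range values and a seed of -5, the
signed version yields 32763, which is below the range, while the
unsigned version stays at or above low and below high.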


From troy at scl.ameslab.gov  Mon Jun  4 17:41:42 2007
From: troy at scl.ameslab.gov (Troy Benjegerdes)
Date: Mon, 04 Jun 2007 19:41:42 -0500
Subject: [ofa-general] Perfquery XmtWords, not XmtBytes...
In-Reply-To: <1181002677.12997.17099.camel@hal.voltaire.com>
References: <4664A5CE.4080505@scl.ameslab.gov>
	<1181002677.12997.17099.camel@hal.voltaire.com>
Message-ID: <4664B146.9090205@scl.ameslab.gov>

Okay. I see the latest version of perfquery uses 'XmtData' instead of 
XmtBytes.

Thanks.

Hal Rosenstock wrote:
> On Mon, 2007-06-04 at 19:52, Troy Benjegerdes wrote:
>   
>> It appears that Perfquery (and the performance counter api's we are 
>> using for fountain/goanna) are reporting data in 32 bit (4-byte) *words* 
>> and not bytes.
>>
>> Can someone please clear up my confusion on this, and maybe correct the 
>> documentation as well?
>>     
>
> It's consistent with what the IB spec says (IBA 1.2 vol 1 p.948) as to
> how these quantities are counted. They are defined to be octets divided
> by 4 so the choice is to display them the same as the actual quantity
> (which is why they are named Data rather than Octets) or to multiply by
> 4 for Octets. The former choice was made.
>
> -- Hal 


From vlad at lists.openfabrics.org  Tue Jun  5 02:40:25 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Tue,  5 Jun 2007 02:40:25 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070605-0200 daily build status
Message-ID: <20070605094025.5E494E60834@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.13
Passed on x86_64 with linux-2.6.19
Passed on ia64 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.12
Passed on powerpc with linux-2.6.19
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.13
Passed on x86_64 with linux-2.6.16
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.14
Passed on x86_64 with linux-2.6.18
Passed on powerpc with linux-2.6.18
Passed on ppc64 with linux-2.6.18
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.13
Passed on ppc64 with linux-2.6.19
Passed on x86_64 with linux-2.6.17
Passed on ppc64 with linux-2.6.14
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on x86_64 with linux-2.6.15
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.12
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.15
Passed on ia64 with linux-2.6.17
Passed on ppc64 with linux-2.6.13
Passed on powerpc with linux-2.6.16
Passed on powerpc with linux-2.6.14
Passed on powerpc with linux-2.6.15
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on ia64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.9-34.ELsmp
Passed on x86_64 with linux-2.6.18-1.2798.fc6

Failed:


From tziporet at dev.mellanox.co.il  Tue Jun  5 03:35:28 2007
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Tue, 05 Jun 2007 13:35:28 +0300
Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557)
In-Reply-To: <008201c7a707$1d7a9150$ff0da8c0@amr.corp.intel.com>
References: <008201c7a707$1d7a9150$ff0da8c0@amr.corp.intel.com>
Message-ID: <46653C70.7070403@mellanox.co.il>

Sean Hefty wrote:
> Can you see if this patch also fixes the problem?  I'd like to keep
> next_port defined as an int to match the idr_get_new_above() prototype
> and sysctl_local_port_range definition.
>
> If this fixes the problem, we should add it to OFED and queue it for
> 2.6.23.
> ---
>
>
>   
Sean/Yossi,
Can you prepare a patch for OFED 1.2 for us?

Thanks,
Tziporet


From vlad at mellanox.co.il  Tue Jun  5 05:14:47 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Tue, 05 Jun 2007 15:14:47 +0300
Subject: [ofa-general] rdma_cm kernel Oops
Message-ID: <1181045687.1114.16.camel@vladsk-laptop>

Hi Sean,
I got the following kernel oops while testing RDS HA (kernel 2.6.20):

rdma_destroy_id+0x124/0x193 corresponds to the line 778 in drivers/infiniband/core/cma.c

    771 static void cma_release_port(struct rdma_id_private *id_priv)
    772 {
    773         struct rdma_bind_list *bind_list = id_priv->bind_list;
    774
    775         if (!bind_list)
    776                 return;
    777
    778         mutex_lock(&lock);
    779         hlist_del(&id_priv->node);
    780         if (hlist_empty(&bind_list->owners)) {
    781                 idr_remove(bind_list->ps, bind_list->port);
    782                 kfree(bind_list);
    783         }
    784         mutex_unlock(&lock);
    785 }


Oops:
Jun  5 09:11:48 sw123 kernel: [  645.816913] rds_shutdown_worker: was_conn 0 was_conning -1
[  645.944058] Pid: 7354, comm: rdma_cm_wq Not tainted 2.6.20 #2
[  645.944061] RIP: 0010:[<ffffffff8819aa7c>]  [<ffffffff8819aa7c>] :rdma_cm:rdma_destroy_id+0x124/0x193
[  645.944072] RSP: 0018:ffff81011f223e30  EFLAGS: 00010206
[  645.944076] RAX: 0000000000100100 RBX: ffff81011d86d340 RCX: ffff8101224d0350
[  645.944080] RDX: 0000000000200200 RSI: 0000000000000056 RDI: ffffffff881a2140
[  645.944084] RBP: ffff8101224d0270 R08: 0000000000000000 R09: 0000000000000000
[  645.944087] R10: ffff81011f223d50 R11: 0000000000000048 R12: 0000000000000001
[  645.944091] R13: 0000000000000287 R14: ffffffff8819b445 R15: 0000000000000000
[  645.944095] FS:  0000000000000000(0000) GS:ffffffff8058e000(0000) knlGS:0000000000000000
[  645.944099] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[  645.944103] CR2: 0000000000200200 CR3: 000000011e21d000 CR4: 00000000000006e0
[  645.944107] Process rdma_cm_wq (pid: 7354, threadinfo ffff81011f222000, task ffff810117df8830)
[  645.944110] Stack:  ffff8101224d0270 ffff8101224d0270 ffff81011a850a20 ffffffff8819b4a7
[  645.944119]  ffff81011a850a28 ffff81011d89ea48 ffff81011a850a20 ffffffff80239c4e
[  645.944126]  ffff81011d89ea48 ffffffff80239ced ffff8101201c7d98 00000000fffffffc
[  645.944132] Call Trace:
[  645.944143]  [<ffffffff8819b4a7>] :rdma_cm:cma_work_handler+0x62/0x6e
[  645.944153]  [<ffffffff80239c4e>] run_workqueue+0xa5/0x144
[  645.944159]  [<ffffffff80239ced>] worker_thread+0x0/0x165
[  645.944164]  [<ffffffff8023cc58>] keventd_create_kthread+0x0/0x6a
[  645.944169]  [<ffffffff80239e1c>] worker_thread+0x12f/0x165
[  645.944177]  [<ffffffff80225003>] default_wake_function+0x0/0xe
[  645.944184]  [<ffffffff80225003>] default_wake_function+0x0/0xe
[  645.944190]  [<ffffffff8023cc2f>] kthread+0xc8/0xf1
[  645.944198]  [<ffffffff8020a2b8>] child_rip+0xa/0x12
[  645.944203]  [<ffffffff8023cc58>] keventd_create_kthread+0x0/0x6a
[  645.944213]  [<ffffffff8023cb67>] kthread+0x0/0xf1
[  645.944217]  [<ffffffff8020a2ae>] child_rip+0x0/0x12
[  645.944221] 
[  645.944223] 
[  645.944224] Code: 48 89 02 74 04 48 89 50 08 48 c7 85 e0 00 00 00 00 01 10 00 
[  645.944236] RIP  [<ffffffff8819aa7c>] :rdma_cm:rdma_destroy_id+0x124/0x193
[  645.944246]  RSP <ffff81011f223e30>
[  645.944249] CR2: 0000000000200200
[  645.944251]  <4>created cm id ffff8101224d0270 for conn ffff81011c857d48
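
One possible reading of the register dump, offered as an assumption
rather than a diagnosis: RAX (0x100100) and RDX/CR2 (0x200200) match
the LIST_POISON1 and LIST_POISON2 values that hlist_del() writes into a
node after unlinking it, which would be consistent with hlist_del()
running a second time on the same id_priv->node. The user-space sketch
below mimics that behaviour; struct hnode and hnode_del_checked() are
made-up names for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Poison values as defined in the kernel's list headers of this era. */
#define LIST_POISON1 ((void *) 0x00100100)
#define LIST_POISON2 ((void *) 0x00200200)

struct hnode {
	struct hnode *next;
	struct hnode **pprev;
};

/* Returns 0 on success, -1 if the node was already deleted. */
static int hnode_del_checked(struct hnode *n)
{
	/* The kernel has no such check; it would write to *pprev and fault. */
	if (n->pprev == (struct hnode **) LIST_POISON2)
		return -1;

	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = LIST_POISON1;
	n->pprev = LIST_POISON2;
	return 0;
}
```

Deleting the same node twice returns -1 on the second call here; in the
kernel the equivalent second call would store through the 0x200200
pointer, which is the address reported in CR2.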

-- 
Vladimir Sokolovsky <vlad at mellanox.co.il>
Mellanox Technologies Ltd.


From rdreier at cisco.com  Tue Jun  5 06:34:41 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 05 Jun 2007 06:34:41 -0700
Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557)
In-Reply-To: <008201c7a707$1d7a9150$ff0da8c0@amr.corp.intel.com> (Sean Hefty's
	message of "Mon, 4 Jun 2007 17:19:00 -0700")
References: <008201c7a707$1d7a9150$ff0da8c0@amr.corp.intel.com>
Message-ID: <ada1wgqjzim.fsf@cisco.com>

 > If this fixes the problem, we should add it to OFED and queue it for
 > 2.6.23.

I haven't followed this closely, but what's the impact of this bug?
It seems it would result in a port out of the configured range being
used.  Which seems serious enough to fix for 2.6.22 to me.


From sashak at voltaire.com  Tue Jun  5 08:28:03 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Tue, 5 Jun 2007 18:28:03 +0300
Subject: [ofa-general] [PATCH] opensm: protect sminfo response
Message-ID: <20070605152803.GA10519@sashak.voltaire.com>


This port_guid check protects SMInfo responses processing against port
moving issue.

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---
 opensm/opensm/osm_port_info_rcv.c |    1 +
 opensm/opensm/osm_sminfo_rcv.c    |   19 +++++++++++++++++++
 2 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/opensm/opensm/osm_port_info_rcv.c b/opensm/opensm/osm_port_info_rcv.c
index 849427e..1fd4915 100644
--- a/opensm/opensm/osm_port_info_rcv.c
+++ b/opensm/opensm/osm_port_info_rcv.c
@@ -199,6 +199,7 @@ __osm_pi_rcv_process_endport(
         */
         memset( &context, 0, sizeof(context) );
         context.smi_context.set_method = FALSE;
+        context.smi_context.port_guid = port_guid;
         status = osm_req_get( p_rcv->p_req,
                               osm_physp_get_dr_path_ptr( p_physp ),
                               IB_MAD_ATTR_SM_INFO,
diff --git a/opensm/opensm/osm_sminfo_rcv.c b/opensm/opensm/osm_sminfo_rcv.c
index b26b6bf..18fd072 100644
--- a/opensm/opensm/osm_sminfo_rcv.c
+++ b/opensm/opensm/osm_sminfo_rcv.c
@@ -749,8 +749,26 @@ osm_sminfo_rcv_process(
   */
   if( ib_smp_is_response( p_smp ) )
   {
+    const ib_sm_info_t *p_smi = ib_smp_get_payload_ptr( p_smp );
+
     /* Get the context - to see if this is a response to a Get or Set method */
     p_smi_context = osm_madw_get_smi_context_ptr( p_madw );
+
+    /*
+      verify that response is from expected port and there is no port
+      moving issue */
+    if ( p_smi_context->port_guid != p_smi->guid )
+    {
+      osm_log( p_rcv->p_log, OSM_LOG_ERROR,
+               "osm_sminfo_rcv_process: ERR 2F19: "
+               "unexpected SM port GUID in response"
+               "\n\t\t\t\tExpected 0x%016" PRIx64
+               ", Received 0x%016" PRIx64 "\n",
+               cl_ntoh64( p_smi_context->port_guid ),
+               cl_ntoh64( p_smi->guid ) );
+      goto Exit;
+    }
+
     if ( p_smi_context->set_method == FALSE )
     {
       /* this is a response to a Get method */
@@ -777,5 +795,6 @@ osm_sminfo_rcv_process(
     }
   }
 
+ Exit:
   OSM_LOG_EXIT( p_rcv->p_log );
 }
-- 
1.5.2.1.137.g426c


From vuhuong at mellanox.com  Tue Jun  5 08:56:31 2007
From: vuhuong at mellanox.com (Vu Pham)
Date: Tue, 05 Jun 2007 08:56:31 -0700
Subject: [ofa-general] IB_GOLD  ib_srp question
In-Reply-To: <46646DC7.4030800@asc.hpc.mil>
References: <46646DC7.4030800@asc.hpc.mil>
Message-ID: <466587AF.50906@mellanox.com>

MAHMOUD,
   For ib_gold 1.8.3, the parameter is 
max_xfer_sectors_per_io. You can change it when loading the 
srp module

-vu

> I am not sure if this is the best place to ask this or not....
> 
> Does anyone know how to change "max_hw_sectors_kb" using ib_gold 1.8.3? 
> I know you can set it using max_sect on OFED1.2.
> 
> The default for IB_GOLD is 128KB, which is too small.
> 
> Thanks,


From sean.hefty at intel.com  Tue Jun  5 09:43:18 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 5 Jun 2007 09:43:18 -0700
Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557)
In-Reply-To: <ada1wgqjzim.fsf@cisco.com>
Message-ID: <000001c7a790$9f44e490$3c98070a@amr.corp.intel.com>

>I haven't followed this closely, but what's the impact of this bug?
>It seems it would result in a port out of the configured range being
>used.  Which seems serious enough to fix for 2.6.22 to me.

It can result in a port outside of the configured range, and its occurrence
depends on the compiler used.  I've pushed my patch to:

	git://git.openfabrics.org/~shefty/rdma-dev.git for-roland

which is based on 2.6.22-rc4.  Yosef, can you confirm that this patch works for
you?

- Sean


From halr at voltaire.com  Tue Jun  5 10:30:34 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 05 Jun 2007 13:30:34 -0400
Subject: [ofa-general] Re: [PATCH] opensm: protect sminfo response
In-Reply-To: <20070605152803.GA10519@sashak.voltaire.com>
References: <20070605152803.GA10519@sashak.voltaire.com>
Message-ID: <1181064634.12997.83723.camel@hal.voltaire.com>

On Tue, 2007-06-05 at 11:28, Sasha Khapyorsky wrote:
> This port_guid check protects SMInfo responses processing against port
> moving issue.
> 
> Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>

Thanks. Applied.

-- Hal


From yosefe at voltaire.com  Tue Jun  5 10:31:54 2007
From: yosefe at voltaire.com (Yosef Eitgin)
Date: Tue, 5 Jun 2007 20:31:54 +0300
Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557)
References: <000001c7a790$9f44e490$3c98070a@amr.corp.intel.com>
Message-ID: <39C75744D164D948A170E9792AF8E7CA0A819D@exil.voltaire.com>

>>I haven't followed this closely, but what's the impact of this bug?
>>It seems it would result in a port out of the configured range being
>>used.  Which seems serious enough to fix for 2.6.22 to me.
>
>It can result in a port outside of the configured range, and its occurrence
>depends on the compiler used.  I've pushed my patch to:
>
>	git://git.openfabrics.org/~shefty/rdma-dev.git for-roland
>
>which is based on 2.6.22-rc4.  Yosef, can you confirm that this patch works for
>you?
>
>- Sean

I'm out of the office right now, but from a quick external test it looks like this does
the job.

--Yossi


From hanafim.ctr at asc.hpc.mil  Tue Jun  5 10:48:42 2007
From: hanafim.ctr at asc.hpc.mil (MAHMOUD HANAFI)
Date: Tue, 05 Jun 2007 13:48:42 -0400
Subject: [ofa-general] OFED vs IB_GOLD IB_SRP Performance results
Message-ID: <4665A1FA.1000506@asc.hpc.mil>

All,

I have been evaluating SRP performance using DDN IB-attached storage. I have looked at both OFED 
and IBGOLD and found some interesting results. Although the write performance is consistent 
between OFED and IBGOLD, the read performance is not. I looked at various tuning settings but have 
been unable to improve the read performance of OFED. I am interested in getting feedback on 
these results.

As you can see in this chart, IBGOLD outperforms OFED at the larger record lengths.
READ CHART: http://www.clusteringsolutions.com/openib/Read.png

WRITE CHART: http://www.clusteringsolutions.com/openib/Write.png


Test Setup:
Test software: xdd using direct IO
Server: Dell 2950 4 core 8GB memory
Storage: DDN S2A 9500
LUN: 4 - 1 per tier (8+1) Blocksize = 4096
IB: SDR cisco
Fiber Channel = 4 Gb/sec Qlogic using qla2400 driver
kernel tested:  2.6.9-42.0.10.ELsmp and 2.6.9-42.0.10.EL_lustre.1.4.10smp
IB Stack: OFED1.1, OFED1.2, and IB_GOLD1.8.3

IBGOLD Setup:
/sys/module/ib_srp/dlid_conf = 0
/sys/module/ib_srp/fmr_cache = 0
/sys/module/ib_srp/ib_ports_mask = -1
/sys/module/ib_srp/max_cmds_per_lun = 1
/sys/module/ib_srp/max_luns = 256
/sys/module/ib_srp/max_srp_targets = 16
/sys/module/ib_srp/max_xfer_sectors_per_io = 8192
/sys/module/ib_srp/refcnt = 16
/sys/module/ib_srp/service_str = <NULL>
/sys/module/ib_srp/srp_discovery_timeout = 60
/sys/module/ib_srp/srp_tracelevel = 2
/sys/module/ib_srp/target_bindings = <NULL>

/sys/block/sdc/queue/max_hw_sectors_kb = 4096
/sys/block/sdc/queue/max_sectors_kb = 4096
/sys/block/sdc/queue/nr_requests = 8192
/sys/block/sdc/queue/read_ahead_kb = 128

OFED Setup:
/sys/module/ib_srp/mellanox_workarounds = 1
/sys/module/ib_srp/refcnt = 11
/sys/module/ib_srp/srp_sg_tablesize = 256
/sys/module/ib_srp/topspin_workarounds = 1

/sys/block/sdd/queue/max_sectors_kb = 4096
/sys/block/sdd/queue/nr_requests = 8192
/sys/block/sdd/queue/read_ahead_kb = 128

Thanks,
-- 
Mahmoud Hanafi
Senior System Administrator
ASC/MSRC
www.asc.hpc.mil
2435 5th Street
WPAFB, OHIO 45433
(937) 255-1536


From halr at voltaire.com  Tue Jun  5 11:46:35 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 05 Jun 2007 14:46:35 -0400
Subject: [ofa-general] [PATCH 2/2] infiniband-diags/ibidsverify.pl: Support
	port GUID validation
Message-ID: <1181069190.12997.88623.camel@hal.voltaire.com>

infiniband-diags/ibidsverify.pl: Support port GUID validation

Note that original topology file format without port GUIDs is also
supported in which case this validation is omitted.

Signed-off-by: Hal Rosenstock <halr at voltaire.com>

diff --git a/infiniband-diags/scripts/ibidsverify.pl b/infiniband-diags/scripts/ibidsverify.pl
index c9730c1..5d97eab 100755
--- a/infiniband-diags/scripts/ibidsverify.pl
+++ b/infiniband-diags/scripts/ibidsverify.pl
@@ -83,6 +83,7 @@ sub validate_non_zero_guid
 
 $insert_lid::lids = undef;
 $insert_nodeguid::nodeguids = undef;
+$insert_portguid::portguids = undef;
 
 sub insert_lid
 {
@@ -130,6 +131,29 @@ sub insert_nodeguid
     }
 }
 
+sub insert_portguid
+{
+    my ($lid) = shift (@_);
+    my ($portguid) = shift (@_);
+    my ($nodetype) = shift (@_);
+    my $rec = undef;
+    my $status = "";
+
+    $status = validate_non_zero_guid($lid, $portguid, $nodetype);
+    if ($status eq 0)
+    {
+       if (defined($insert_portguid::portguids{$portguid}))
+       {
+          print "PortGUID $portguid already defined for LID $insert_portguid::portguids{$portguid}->{lid}\n";
+       }
+       else
+       {
+          $rec = { lid => $lid, portguid => $portguid };
+          $insert_portguid::portguids{$portguid} = $rec;
+       }
+    }
+}
+
 sub main
 {
    if ($regenerate_map || !(-f "$IBswcountlimits::cache_dir/ibnetdiscover.topology")) { generate_ibnetdiscover_topology; }
@@ -146,19 +170,34 @@ sub main
    while ($line = <IBNET_TOPO>)
    {
 
-      if ($line =~ /^switchguid=(.*)/ || $line =~ /^caguid=(.*)/ || $line =~ /^rtguid=(.*)/)
+      if ($line =~ /^caguid=(.*)/ || $line =~ /^rtguid=(.*)/)
       {
          $nodeguid = $1;
          $nodetype = "";
       }
 
+      if ($line =~ /^switchguid=(.*)/)
+      {
+         $nodeguid = $1;
+         $portguid = "";
+         $nodetype = "";
+      }
+      if ($line =~ /^switchguid=(.*)\((.*)\)/)
+      {
+         $nodeguid = $1;
+         $portguid = $2;
+      }
+
       if ($line =~ /^Switch.*\"S-(.*)\"\s+# (.*) port.* lid (\d+) .*/)
       {
          $nodetype = "switch";
-         $portguid = $1;
          $lid = $3;
          insert_lid($lid, $nodeguid, $nodetype);
          insert_nodeguid($lid, $nodeguid, $nodetype);
+         if ($portguid ne "")
+         {
+            insert_portguid($lid, $portguid, $nodetype);
+         }
       }
       if ($line =~ /^Ca.*/)
       {
@@ -203,6 +242,11 @@ sub main
              $firstport = "no";
            }
         }
+        if ($line =~ /^\[(\d+)\]\((.*)\)/)
+        {
+           $portguid = $2;
+           insert_portguid($lid, $portguid, $nodetype);
+        }
       }
 
    }


From halr at voltaire.com  Tue Jun  5 11:46:17 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 05 Jun 2007 14:46:17 -0400
Subject: [ofa-general] [PATCH 1/2] infiniband-diags/ibnetdiscover: Add port
	GUIDs to topology file
Message-ID: <1181069175.12997.88621.camel@hal.voltaire.com>

infiniband-diags/ibnetdiscover: Add port GUIDs to topology file

Signed-off-by: Hal Rosenstock <halr at voltaire.com>

diff --git a/infiniband-diags/man/ibnetdiscover.8 b/infiniband-diags/man/ibnetdiscover.8
index 84f7a20..48291d5 100644
--- a/infiniband-diags/man/ibnetdiscover.8
+++ b/infiniband-diags/man/ibnetdiscover.8
@@ -101,9 +101,9 @@ attempted to be fulfilled, and will fail
 The topology file format is human readable and largely intuitive.
 Most identifiers are given textual names like vendor ID (vendid), device ID
 (device ID), GUIDs of various types (sysimgguid, caguid, switchguid, etc.).  
-The IB node is identified followed by the number of ports and a quoted string
-which contains the nodetype (S, H, R) followed by a - then followed by the
-node GUID. On the right of this line is a comment (#) followed by the
+PortGUIDs are shown in parentheses ().  For switches, this is shown on the
+switchguid line.  For CA and router ports, it is shown on the connectivity lines.  The IB node is identified followed by the number of ports and a quoted string
+which contains the nodetype (S, H, R) followed by a - then followed by the node GUID.  On the right of this line is a comment (#) followed by the
 NodeDescription in quotes.  If the node is a switch, this line also contains
 whether switch port 0 is base or enhanced, and the LID and LMC of port 0.
 Subsequent lines pertaining to this node show the connectivity.   On the 
@@ -121,7 +121,7 @@ output line.
 An example of this is:
 .nf
 #
-# Topology file: generated on Fri Jun  1 11:16:02 2007
+# Topology file: generated on Tue Jun  5 14:15:10 2007
 #
 # Max of 3 hops discovered
 # Initiated from node 0008f10403960558 port 0008f10403960559
@@ -131,20 +131,20 @@ Non-Chassis Nodes
 vendid=0x8f1
 devid=0x5a06
 sysimgguid=0x5442ba00003000
-switchguid=0x5442ba00003080
+switchguid=0x5442ba00003080(5442ba00003080)
 Switch  24 "S-005442ba00003080"         # "ISR9024 Voltaire" base port 0 lid 6 lmc 0
-[22]    "H-0008f10403961354"[1]         # "MT23108 InfiniHost Mellanox Technologies" lid 4 4xSDR
+[22]    "H-0008f10403961354"[1](8f10403961355)         # "MT23108 InfiniHost Mellanox Technologies" lid 4 4xSDR
 [10]    "S-0008f10400410015"[1]         # "SW-6IB4 Voltaire" lid 3 4xSDR
-[8]     "H-0008f10403960558"[2]         # "MT23108 InfiniHost Mellanox Technologies" lid 14 4xSDR
+[8]     "H-0008f10403960558"[2](8f1040396055a)         # "MT23108 InfiniHost Mellanox Technologies" lid 14 4xSDR
 [6]     "S-0008f10400410015"[3]         # "SW-6IB4 Voltaire" lid 3 4xSDR
-[12]    "H-0008f10403960558"[1]         # "MT23108 InfiniHost Mellanox Technologies" lid 10 4xSDR
+[12]    "H-0008f10403960558"[1](8f10403960559)         # "MT23108 InfiniHost Mellanox Technologies" lid 10 4xSDR
 
 vendid=0x8f1
 devid=0x5a05
-switchguid=0x8f10400410015
+switchguid=0x8f10400410015(8f10400410015)
 Switch  8 "S-0008f10400410015"          # "SW-6IB4 Voltaire" base port 0 lid 3 lmc 0
-[6]     "H-0008f10403960984"[1]         # "MT23108 InfiniHost Mellanox Technologies" lid 16 4xSDR
-[4]     "H-005442b100004900"[1]         # "MT23108 InfiniHost Mellanox Technologies" lid 12 4xSDR
+[6]     "H-0008f10403960984"[1](8f10403960985)         # "MT23108 InfiniHost Mellanox Technologies" lid 16 4xSDR
+[4]     "H-005442b100004900"[1](5442b100004901)        # "MT23108 InfiniHost Mellanox Technologies" lid 12 4xSDR
 [1]     "S-005442ba00003080"[10]                # "ISR9024 Voltaire" lid 6 1xSDR
 [3]     "S-005442ba00003080"[6]         # "ISR9024 Voltaire" lid 6 4xSDR
 
@@ -152,26 +152,26 @@ vendid=0x2c9
 devid=0x5a44
 caguid=0x8f10403960984
 Ca      2 "H-0008f10403960984"          # "MT23108 InfiniHost Mellanox Technologies"
-[1]     "S-0008f10400410015"[6]         # lid 16 lmc 1 "SW-6IB4 Voltaire" lid 3 4xSDR
+[1](8f10403960985)     "S-0008f10400410015"[6]         # lid 16 lmc 1 "SW-6IB4 Voltaire" lid 3 4xSDR
 
 vendid=0x2c9
 devid=0x5a44
 caguid=0x5442b100004900
 Ca      2 "H-005442b100004900"          # "MT23108 InfiniHost Mellanox Technologies"
-[1]     "S-0008f10400410015"[4]         # lid 12 lmc 1 "SW-6IB4 Voltaire" lid 3 4xSDR
+[1](5442b100004901)     "S-0008f10400410015"[4]         # lid 12 lmc 1 "SW-6IB4 Voltaire" lid 3 4xSDR
 
 vendid=0x2c9
 devid=0x5a44
 caguid=0x8f10403961354
 Ca      2 "H-0008f10403961354"          # "MT23108 InfiniHost Mellanox Technologies"
-[1]     "S-005442ba00003080"[22]                # lid 4 lmc 1 "ISR9024 Voltaire" lid 6 4xSDR
+[1](8f10403961355)     "S-005442ba00003080"[22]                # lid 4 lmc 1 "ISR9024 Voltaire" lid 6 4xSDR
 
 vendid=0x2c9
 devid=0x5a44
 caguid=0x8f10403960558
 Ca      2 "H-0008f10403960558"          # "MT23108 InfiniHost Mellanox Technologies"
-[2]     "S-005442ba00003080"[8]         # lid 14 lmc 1 "ISR9024 Voltaire" lid 6 4xSDR
-[1]     "S-005442ba00003080"[12]                # lid 10 lmc 1 "ISR9024 Voltaire" lid 6 1xSDR
+[2](8f1040396055a)     "S-005442ba00003080"[8]         # lid 14 lmc 1 "ISR9024 Voltaire" lid 6 4xSDR
+[1](8f10403960559)     "S-005442ba00003080"[12]                # lid 10 lmc 1 "ISR9024 Voltaire" lid 6 1xSDR
 .fi
 
 When grouping is used, IB nodes are organized into chasses which are
diff --git a/infiniband-diags/src/ibnetdiscover.c b/infiniband-diags/src/ibnetdiscover.c
index c08aa61..c321d59 100644
--- a/infiniband-diags/src/ibnetdiscover.c
+++ b/infiniband-diags/src/ibnetdiscover.c
@@ -46,7 +46,7 @@
 #include <errno.h>
 #include <inttypes.h>
 
-#define __BUILD_VERSION_TAG__ 1.2.3
+#define __BUILD_VERSION_TAG__ 1.2.4
 #include <common.h>
 #include <umad.h>
 #include <mad.h>
@@ -518,6 +518,7 @@ out_switch(Node *node, int group)
 
 	out_ids(node);
 	fprintf(f, "switchguid=0x%" PRIx64, node->nodeguid);
+	fprintf(f, "(%" PRIx64 ")", node->portguid);
 	if (group) {
 		if (node->chrecord) {
 			if (node->chrecord->chassisnum) {
@@ -617,6 +618,8 @@ out_switch_port(Port *port, int group)
 		node_name(port->remoteport->node),
 		port->remoteport->portnum,
 		ext_port_str ? ext_port_str : "");
+	if (port->remoteport->node->type != SWITCH_NODE)
+		fprintf(f, "(%" PRIx64 ") ", port->remoteport->portguid);
 	fprintf(f, "\t\t# \"%s\" lid %d %s%s\n",
 		rem_nodename,
 		port->remoteport->node->type == SWITCH_NODE ? port->remoteport->node->smalid : port->remoteport->lid,
@@ -634,12 +637,16 @@ out_ca_port(Port *port, int group)
 	char *rem_nodename = NULL;
 
 	fprintf(f, "[%d]", port->portnum);
+	if (port->node->type != SWITCH_NODE)
+		fprintf(f, "(%" PRIx64 ") ", port->portguid);
 	fprintf(f, "\t%s[%d]",
 		node_name(port->remoteport->node),
 		port->remoteport->portnum);
 	str = out_ext_port(port->remoteport, group);
 	if (str)
 		fprintf(f, "%s", str);
+	if (port->remoteport->node->type != SWITCH_NODE)
+		fprintf(f, " (%" PRIx64 ") ", port->remoteport->portguid);
 
 	if (port->remoteport->node->type == SWITCH_NODE)
 		rem_nodename = lookup_switch_name(switch_map_fp,
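
For tools that read the topology file from C rather than Perl, the new CA/router connectivity line can be picked apart with sscanf.  This is only a minimal sketch, assuming the layout shown in the man page example above ([port](portguid) "name"[remoteport]); the struct and function names are hypothetical and are not part of infiniband-diags.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one CA/router connectivity line. */
struct ca_port_line {
	int portnum;            /* local port number, e.g. 1 */
	uint64_t portguid;      /* local port GUID taken from the parentheses */
	char remote_name[64];   /* quoted remote node name, e.g. S-0008f10400410015 */
	int remote_portnum;     /* port number on the remote node */
};

/*
 * Parse a line such as
 *   [1](8f10403960985)     "S-0008f10400410015"[6]   # lid 16 ...
 * Returns 0 on success, -1 if the line is not in the new format
 * (for example an old-style line without the parenthesized GUID).
 */
static int parse_ca_port_line(const char *line, struct ca_port_line *out)
{
	int n;

	memset(out, 0, sizeof *out);
	n = sscanf(line, "[%d](%" SCNx64 ") \"%63[^\"]\"[%d]",
		   &out->portnum, &out->portguid,
		   out->remote_name, &out->remote_portnum);
	return n == 4 ? 0 : -1;
}
```

Everything after the remote port (the # comment) is ignored here; a real consumer would probably want the lid and link width as well.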


From mshefty at ichips.intel.com  Tue Jun  5 12:39:39 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 05 Jun 2007 12:39:39 -0700
Subject: [ofa-general] Re: [Query] ib add path record cache
In-Reply-To: <1180633333.7116.172147.camel@hal.voltaire.com>
References: <000101c7a3a9$876d8290$ff0da8c0@amr.corp.intel.com>
	<1180633333.7116.172147.camel@hal.voltaire.com>
Message-ID: <4665BBFB.4070007@ichips.intel.com>

> You'd need to use a vendor class 2 if you wanted to use RMPP as the SA
> does. However, there is some rearranging you would need to do if you
> compare the relevant MAD formats.

Reading into the spec more, it seems our current choice is limited to 
using a vendor class.  Application classes are controlled by the IBTA. 
Of the two vendor classes, class 2 clearly defines that RMPP is used, 
but also adds the OUI field to the MAD.  This throws off using the SA 
MAD class format.  I see a few possibilities:

Use vendor class 1:
There's no restriction on the MAD data.  This would allow us to match 
the SA MAD class format exactly.  The drawback is that we need to modify 
the MAD layer to identify the class as using RMPP.

Use vendor class 2:
Reading the spec, it looks like a reserved field in the MAD header is 
reserved even if using a vendor defined class.  If this is the proper 
interpretation, then we either need to shift the SA data down 4-8 bytes, 
or we drop the first 4 bytes of the SM_Key.

If we ever want to do more than simple path record caching, I think 
we'll want the full SM_Key.  Between the remaining choices, my 
preference would be to adapt a class 1 for our purpose.  Anyone else 
have thoughts on this?

- Sean
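
To make the offset problem concrete, here is a rough sketch of the two layouts.  The field sizes are modeled on the kernel's ib_mad.h definitions (24-byte MAD header, 12-byte RMPP header, 216 bytes of vendor class 2 data, 200 bytes of SA data) and should be treated as an assumption to check against the spec; the struct and helper names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte arrays only, so the compiler inserts no padding and the
 * offsets below match the on-the-wire layout. */
struct mad_hdr  { uint8_t bytes[24]; };   /* common MAD header */
struct rmpp_hdr { uint8_t bytes[12]; };   /* RMPP header */

/* SA class MAD: the SA header follows the RMPP header directly. */
struct sa_mad {
	struct mad_hdr  mad_hdr;
	struct rmpp_hdr rmpp_hdr;
	uint8_t sm_key[8];
	uint8_t attr_offset[2];
	uint8_t reserved[2];
	uint8_t comp_mask[8];
	uint8_t data[200];
};

/* Vendor class 2 MAD: a reserved byte and the 3-byte OUI sit where
 * the SA class puts the start of the SM_Key. */
struct vendor2_mad {
	struct mad_hdr  mad_hdr;
	struct rmpp_hdr rmpp_hdr;
	uint8_t reserved;
	uint8_t oui[3];
	uint8_t data[216];
};

/* Offset of the SM_Key in an SA MAD. */
static size_t sa_sm_key_offset(void)
{
	return offsetof(struct sa_mad, sm_key);
}

/* First byte a vendor class 2 user may fill with its own data. */
static size_t vendor2_data_offset(void)
{
	return offsetof(struct vendor2_mad, data);
}

/* Minimum number of bytes the SA payload has to move down. */
static size_t vendor2_shift_for_sa_payload(void)
{
	return vendor2_data_offset() - sa_sm_key_offset();
}
```

With these sizes the SM_Key starts at byte 36 in an SA MAD, while vendor class 2 data cannot start before byte 40, so the SA payload moves by at least 4 bytes (8 if the SM_Key is to stay 64-bit aligned, which is one reading of the 4-8 byte range above).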


From mshefty at ichips.intel.com  Tue Jun  5 14:23:13 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 05 Jun 2007 14:23:13 -0700
Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557)
In-Reply-To: <008201c7a707$1d7a9150$ff0da8c0@amr.corp.intel.com>
References: <008201c7a707$1d7a9150$ff0da8c0@amr.corp.intel.com>
Message-ID: <4665D441.3060404@ichips.intel.com>

Vlad, can you please pull this change into OFED?

> next_port should be between sysctl_local_port_range[0] and [1].  However,
> it is initially set to a random value.  If the value is negative, next_port
> can fall outside of this range because of the % operator returning a
> negative value.
> 
> Signed-off-by: Sean Hefty <sean.hefty at intel.com>
> ---
> 
>  drivers/infiniband/core/cma.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index eb15119..b0831cb 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -2772,8 +2772,8 @@ static int cma_init(void)
>  	int ret;
>  
>  	get_random_bytes(&next_port, sizeof next_port);
> -	next_port = (next_port % (sysctl_local_port_range[1] -
> -				  sysctl_local_port_range[0])) +
> +	next_port = ((unsigned int) next_port %
> +		    (sysctl_local_port_range[1] - sysctl_local_port_range[0])) +
>  		    sysctl_local_port_range[0];
>  	cma_wq = create_singlethread_workqueue("rdma_cm");
>  	if (!cma_wq)
> 
> 
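
The effect described in the commit message is easy to reproduce in user space.  The sketch below is not the kernel code: the two helper names are hypothetical, and the 32768-61000 range used in the note is only an illustrative value for sysctl_local_port_range.

```c
/* Pre-patch arithmetic: with a negative 'rnd', C99 '%' truncates
 * toward zero, so the remainder is <= 0 and the result can land
 * below 'low'.  (Before C99 the sign was implementation-defined,
 * hence the compiler dependence.) */
static int pick_port_signed(int rnd, int low, int high)
{
	return rnd % (high - low) + low;
}

/* Patched arithmetic: casting to unsigned first keeps the remainder
 * in [0, high - low), so the result stays in [low, high). */
static int pick_port_unsigned(int rnd, int low, int high)
{
	return (unsigned int) rnd % (high - low) + low;
}
```

For example, with rnd = -5 and a range of 32768 to 61000, the signed version yields 32763, which is below the lower bound, while the unsigned version always stays inside the range.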


From vuhuong at mellanox.com  Tue Jun  5 14:36:42 2007
From: vuhuong at mellanox.com (Vu Pham)
Date: Tue, 05 Jun 2007 14:36:42 -0700
Subject: [ofa-general] OFED vs IB_GOLD IB_SRP Performance results
In-Reply-To: <4665A1FA.1000506@asc.hpc.mil>
References: <4665A1FA.1000506@asc.hpc.mil>
Message-ID: <4665D76A.8080705@mellanox.com>

Hi MAHMOUD,


> All,
> 
> I have been evaluating srp performance using a DDN IB attached storage. 
> I have looked at both OFED and IBGOLD. I have discovered some 
> interesting results. Although the write performance is consistent 
> between OFED and IBGOLD, the read performance is not. I looked at 
> various tuning settings but have been unable to improve read performance 
> of OFED. I am interested in getting feedback regarding these results.
> 
> As you can see in this chart, IBGD outperforms OFED at the 
> larger record lengths.
> READ CHART: http://www.clusteringsolutions.com/openib/Read.png
> 
> WRITE CHART: http://www.clusteringsolutions.com/openib/Write.png
> 
> 
> Test Setup:
> Test software: xdd using direct IO
> Server: Dell 2950 4 core 8GB memory
> Storage: DDN S2A 9500
> LUN: 4 - 1 per tier (8+1) Blocksize = 4096
> IB: SDR cisco
> Fiber Channel = 4 Gb/sec Qlogic using qla2400 driver
> kernel tested:  2.6.9-42.0.10.ELsmp and 2.6.9-42.0.10.EL_lustre.1.4.10smp
> IB Stack: OFED1.1, OFED1.2, and IB_GOLD1.8.3
> 
> IBGOLD Setup:
> /sys/module/ib_srp/dlid_conf = 0
> /sys/module/ib_srp/fmr_cache = 0
> /sys/module/ib_srp/ib_ports_mask = -1
> /sys/module/ib_srp/max_cmds_per_lun = 1
> /sys/module/ib_srp/max_luns = 256
> /sys/module/ib_srp/max_srp_targets = 16
> /sys/module/ib_srp/max_xfer_sectors_per_io = 8192
> /sys/module/ib_srp/refcnt = 16
> /sys/module/ib_srp/service_str = <NULL>
> /sys/module/ib_srp/srp_discovery_timeout = 60
> /sys/module/ib_srp/srp_tracelevel = 2
> /sys/module/ib_srp/target_bindings = <NULL>
> 
> /sys/block/sdc/queue/max_hw_sectors_kb = 4096
> /sys/block/sdc/queue/max_sectors_kb = 4096
> /sys/block/sdc/queue/nr_requests = 8192
> /sys/block/sdc/queue/read_ahead_kb = 128
> 
> OFED Setup:
> /sys/module/ib_srp/mellanox_workarounds = 1
> /sys/module/ib_srp/refcnt = 11
> /sys/module/ib_srp/srp_sg_tablesize = 256
> /sys/module/ib_srp/topspin_workarounds = 1
> 
> /sys/block/sdd/queue/max_sectors_kb = 4096


For OFED drivers:
What is max_cmd_per_lun set to? (The default is SRP_SQ_SIZE = 63.)
You can set max_cmd_per_lun when adding the target; please try 
1, 2, 4, 8...

You can check it with *cat /sys/class/scsi_host/hostXXX/cmd_per_lun*

Another tuning option requires editing and recompiling the srp driver:
+ vi ib_srp.h and change SRP_RQ_SHIFT to 7 --> this will 
increase .can_queue and send_wq/recv_wq to 128 --> this 
translates into an increased queue_depth
+ recompile the srp driver

-vu
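
The arithmetic behind the SRP_RQ_SHIFT change can be sketched as follows.  This assumes the queue sizes are derived as SRP_RQ_SIZE = 1 << SRP_RQ_SHIFT and SRP_SQ_SIZE = SRP_RQ_SIZE - 1, which is consistent with the default of 63 quoted above but should be checked against the ib_srp.h in your tree; the struct and function names are hypothetical.

```c
/* Hypothetical helper: queue sizes implied by a given SRP_RQ_SHIFT. */
struct srp_queue_sizes {
	int rq_size;   /* receive queue entries, 1 << shift */
	int sq_size;   /* send queue entries, one less than rq_size */
};

static struct srp_queue_sizes srp_sizes_for_shift(int rq_shift)
{
	struct srp_queue_sizes s;

	s.rq_size = 1 << rq_shift;
	s.sq_size = s.rq_size - 1;
	return s;
}
```

Under that assumption a shift of 6 gives 64/63 entries and a shift of 7 gives 128/127, i.e. roughly double the number of commands that can be outstanding.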


From halr at voltaire.com  Tue Jun  5 14:37:51 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 05 Jun 2007 17:37:51 -0400
Subject: [ofa-general] Re: [Query] ib add path record cache
In-Reply-To: <4665BBFB.4070007@ichips.intel.com>
References: <000101c7a3a9$876d8290$ff0da8c0@amr.corp.intel.com>
	<1180633333.7116.172147.camel@hal.voltaire.com>
	<4665BBFB.4070007@ichips.intel.com>
Message-ID: <1181079457.12997.99729.camel@hal.voltaire.com>

On Tue, 2007-06-05 at 15:39, Sean Hefty wrote:
> > You'd need to use a vendor class 2 if you wanted to use RMPP as the SA
> > does. However, there is some rearranging you would need to do if you
> > compare the relevant MAD formats.
> 
> Reading into the spec more, it seems our current choice is limited to 
> using a vendor class.  Application classes are controlled by the IBTA. 

One could ask the IBTA for this if it is the right thing to do.

> Of the two vendor classes, class 2 clearly defines that RMPP is used, 
> but also adds the OUI field to the MAD.  This throws off using the SA 
> MAD class format.  I see a few possibilities:
> 
> Use vendor class 1:
> There's no restriction on the MAD data.  This would allow us to match 
> the SA MAD class format exactly.  The drawback is that we need to modify 
> the MAD layer to identify the class as using RMPP.

Are you saying to make the RMPP header as the first part of Data ?

Vendor class 1 MADs are not RMPP MADs, so I think this is nonconformant.
That's one reason vendor class 2 was added. In addition, there is no way
to detect one "vendor" from another "vendor" (which is why OUI was
added) if the same class is used so these need to be unique across all
vendors.

> Use vendor class 2:
> Reading the spec, it looks like a reserved field in the MAD header is 
> reserved even if using a vendor defined class.  If this is the proper 
> interpretation,

It is.

> then we either need to shift the SA data down 4-8 bytes, 
> or we drop the first 4 bytes of the SM_Key.

I don't think weakening the SM_Key is acceptable.

> If we ever want to do more than simple path record caching, I think 
> we'll want the full SM_Key.  Between the remaining choices, my 
> preference would be to adapt a class 1 for our purpose.  Anyone else 
> have thoughts on this?

The only choice seems to me to be reformatting using vendor class 2 and
dealing with the data copying.

-- Hal

> - Sean


From pourreza at cs.umanitoba.ca  Tue Jun  5 15:04:28 2007
From: pourreza at cs.umanitoba.ca (Hossein Pourreza)
Date: Tue, 5 Jun 2007 17:04:28 -0500
Subject: [ofa-general] Installing openIB on Linux FC5
Message-ID: <20070605220428.GA15154@helium-01.cs.umanitoba.ca>

Hi all,

I am new to infiniband stuff and am trying to configure an infiniband-based
cluster using Linux FC 5. I downloaded the OFED-1.0 and tried to install it on
cluster nodes. Now I can load the kernel modules without any error but I cannot
run a simple test like ibv_ud_pingpong to check the connectivity of nodes in
user-level.

I loaded the following modules:

ib_umad                25713  0 
ib_ucm                 26569  0 
ib_cm                  42521  1 ib_ucm
ib_uverbs              47889  1 ib_ucm
ib_mthca              133445  0 
ib_ipoib               61361  0 
ib_sa                  25341  2 ib_cm,ib_ipoib
ib_mad                 46969  4 ib_umad,ib_cm,ib_mthca,ib_sa
ib_core                63809  8 ib_umad,ib_ucm,ib_cm,ib_uverbs,ib_mthca,ib_ipoib,ib_sa,ib_mad

Also I have the following devices in /dev/infiniband

crw-rw---- 1 root root 231,  64 Jun  4 14:54 issm0
crw-rw---- 1 root root 231,  65 Jun  4 14:54 issm1
crw-rw---- 1 root root 231, 224 Jun  4 14:34 ucm0
crw-rw---- 1 root root 231,   0 Jun  4 14:54 umad0
crw-rw---- 1 root root 231,   1 Jun  4 14:54 umad1
crw-rw-rw- 1 root root 231, 192 Jun  4 14:34 uverbs0

ibroute shows all nodes and the switch and everything looks fine.
When I run ibv_ud_pingpong on the two nodes I am getting the following messages:

node 1 (server):
local address:  LID 0x0002, QPN 0x150406, PSN 0xb3a00d
remote address: LID 0x0003, QPN 0x0c0406, PSN 0x8f0f99


node 2 (client):
local address:  LID 0x0003, QPN 0x0c0406, PSN 0x8f0f99
remote address: LID 0x0002, QPN 0x150406, PSN 0xb3a00d

There is no message after these two lines. I am wondering if they are sending
any packets or not. I should say that although I have given ip addresses to
infiniband cards (ib0) they cannot ping each other using the normal Linux ping
tool.

Here is the result of ifconfig on these nodes:

node 1 (server)

ib0       Link encap:InfiniBand  HWaddr 00:00:04:04:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00  
          inet addr:172.16.28.61  Bcast:172.16.255.255  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:2044  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

node 2 (client):
ib0       Link encap:InfiniBand  HWaddr 00:00:04:04:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00  
          inet addr:172.16.28.62  Bcast:172.16.255.255  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:2044  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

Any help will be greatly appreciated.

Hossein

-- 
Hossein Pourreza		 			mail:<pourreza at cs.umanitoba.ca>    
Department of Computer Science		URL: http://www.cs.umanitoba.ca/~pourreza
University of Manitoba  			Phone: 204-488-5611            
Winnipeg, Manitoba, Canada R3T 2N2


From sashak at voltaire.com  Tue Jun  5 17:04:41 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 6 Jun 2007 03:04:41 +0300
Subject: [ofa-general] [PATCH] libibmad: add notice DataDetails fields
Message-ID: <20070606000441.GH10519@sashak.voltaire.com>


This adds notice DataDetails fields - generic one (as big array)
and Trap144 specific fields.

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---
 libibmad/include/infiniband/mad.h |    3 +++
 libibmad/src/fields.c             |    3 +++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/libibmad/include/infiniband/mad.h b/libibmad/include/infiniband/mad.h
index f01880b..ed286a9 100644
--- a/libibmad/include/infiniband/mad.h
+++ b/libibmad/include/infiniband/mad.h
@@ -382,7 +382,10 @@ enum MAD_FIELDS {
 	IB_NOTICE_ISSUER_LID_F,
 	IB_NOTICE_TOGGLE_F,
 	IB_NOTICE_COUNT_F,
+	IB_NOTICE_DATA_DETAILS_F,
 	IB_NOTICE_DATA_LID_F,
+	IB_NOTICE_DATA_144_LID_F,
+	IB_NOTICE_DATA_144_CAPMASK_F,
 
 	/*
 	 * GS Performance
diff --git a/libibmad/src/fields.c b/libibmad/src/fields.c
index c453e06..18dc05b 100644
--- a/libibmad/src/fields.c
+++ b/libibmad/src/fields.c
@@ -216,7 +216,10 @@ ib_field_t ib_mad_f [] = {
 	[IB_NOTICE_ISSUER_LID_F]  	{BITSOFFS(48, 16), "NoticeIssuerLID", mad_dump_uint},
 	[IB_NOTICE_TOGGLE_F]      	{BITSOFFS(64, 1), "NoticeToggle", mad_dump_uint},
 	[IB_NOTICE_COUNT_F]       	{BITSOFFS(65, 15), "NoticeCount", mad_dump_uint},
+	[IB_NOTICE_DATA_DETAILS_F]    	{80, 432, "NoticeDataDetails", mad_dump_array},
 	[IB_NOTICE_DATA_LID_F]    	{BITSOFFS(80, 16), "NoticeDataLID", mad_dump_uint},
+	[IB_NOTICE_DATA_144_LID_F]    	{BITSOFFS(96, 16), "NoticeDataTrap144LID", mad_dump_uint},
+	[IB_NOTICE_DATA_144_CAPMASK_F]  {BITSOFFS(128, 32), "NoticeDataTrap144CapMask", mad_dump_uint},
 
 	/*
 	 * NodeDescription fields:
-- 
1.5.2.136.g322bc


From halr at voltaire.com  Tue Jun  5 17:27:48 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 05 Jun 2007 20:27:48 -0400
Subject: [ofa-general] Re: [PATCH] libibmad: add notice DataDetails fields
In-Reply-To: <20070606000441.GH10519@sashak.voltaire.com>
References: <20070606000441.GH10519@sashak.voltaire.com>
Message-ID: <1181089666.12997.110779.camel@hal.voltaire.com>

On Tue, 2007-06-05 at 20:04, Sasha Khapyorsky wrote:
> This adds notice DataDetails fields - generic one (as big array)
> and Trap144 specific fields.
> 
> Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>

Thanks. Applied.

-- Hal


From sean.hefty at intel.com  Tue Jun  5 21:06:30 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 5 Jun 2007 21:06:30 -0700
Subject: [ofa-general] Re: [Query] ib add path record cache
In-Reply-To: <1181079457.12997.99729.camel@hal.voltaire.com>
Message-ID: <000f01c7a7f0$1067dba0$11c8180a@amr.corp.intel.com>

>One could ask the IBTA for this if it is the right thing to do.

Checking with the IBTA makes sense.  Longer term, adding a distributed SA
application class, or expanding the existing SA class may be useful, if the IBTA
wants to define SA implementation at this level of detail.  However, I was
trying to focus on what could be done now.  If the IBTA would like to
standardize the communication, that'd be great.

One issue that isn't clear to me is what exactly is meant by the statement:
"Vendor-specific classes will never be used to define management operations that
are encompassed by the Infiniband Architecture."  For example, suppose that
there were a small number of SA caches available in the subnet.  Is it compliant
for a node to issue a PR query to one of the caches using a vendor-defined PR
query?  Or must this be done using an SA PR query with possible redirection?

>Are you saying to make the RMPP header as the first part of Data ?

Yes.

>Vendor class 1 MADs are not RMPP MADs, so I think this is nonconformant.

I didn't see any restriction on the vendor class 1 data - at least in section
16.5.  If I'm mistaken on this, then I agree that vendor class 2 seems to be our
only current option.

>That's one reason vendor class 2 was added. In addition, there is no way
>to detect one "vendor" from another "vendor" (which is why OUI was
>added) if the same class is used so these need to be unique across all
>vendors.

Yes - all vendor class 1 MADs suffer from this issue.  In practice, it seems
that there can only be a single vendor for a given class on a subnet.

>The only choice seems to me to be reformatting using vendor class 2 and
>dealing with the data copying.

>From an implementation viewpoint, this just seems less desirable.  Adding the
offset means that single-segment SA MAD may become our multi-segment vendor MAD,
and dealing with two MAD formats will be troublesome.  If we're only caching
PRs, this may not be a big deal, but if we ever want to create a truly
distributed SA, I think it will be.

- Sean


From tziporet at dev.mellanox.co.il  Tue Jun  5 22:55:11 2007
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Wed, 06 Jun 2007 08:55:11 +0300
Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557)
In-Reply-To: <4665D441.3060404@ichips.intel.com>
References: <008201c7a707$1d7a9150$ff0da8c0@amr.corp.intel.com>
	<4665D441.3060404@ichips.intel.com>
Message-ID: <46664C3F.9050208@mellanox.co.il>

Sean Hefty wrote:
> Vlad, can you please pull this change into OFED?
>
Approved

Tziporet

>> next_port should be between sysctl_local_port_range[0] and [1].  
>> However,
>> it is initially set to a random value.  If the value is negative, 
>> next_port
>> can fall outside of this range because of the % operator returning a
>> negative value.
>>
>> Signed-off-by: Sean Hefty <sean.hefty at intel.com>
>> ---
>>
>>  drivers/infiniband/core/cma.c |    4 ++--
>>  1 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/infiniband/core/cma.c 
>> b/drivers/infiniband/core/cma.c
>> index eb15119..b0831cb 100644
>> --- a/drivers/infiniband/core/cma.c
>> +++ b/drivers/infiniband/core/cma.c
>> @@ -2772,8 +2772,8 @@ static int cma_init(void)
>>      int ret;
>>  
>>      get_random_bytes(&next_port, sizeof next_port);
>> -    next_port = (next_port % (sysctl_local_port_range[1] -
>> -                  sysctl_local_port_range[0])) +
>> +    next_port = ((unsigned int) next_port %
>> +            (sysctl_local_port_range[1] - 
>> sysctl_local_port_range[0])) +
>>              sysctl_local_port_range[0];
>>      cma_wq = create_singlethread_workqueue("rdma_cm");
>>      if (!cma_wq)
>>
>>


From tziporet at dev.mellanox.co.il  Tue Jun  5 23:09:53 2007
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Wed, 06 Jun 2007 09:09:53 +0300
Subject: [ofa-general] Installing openIB on Linux FC5
In-Reply-To: <20070605220428.GA15154@helium-01.cs.umanitoba.ca>
References: <20070605220428.GA15154@helium-01.cs.umanitoba.ca>
Message-ID: <46664FB1.6070402@mellanox.co.il>

Hossein Pourreza wrote:
> Hi all,
>
> I am new to infiniband stuff and am trying to configure an infiniband-based
> cluster using Linux FC 5. I downloaded the OFED-1.0 and tried to install it on
> cluster nodes. Now I can load the kernel modules without any error but I cannot
> run a simple test like ibv_ud_pingpong to check the connectivity of nodes in
> user-level.
>
>
>   
Have you run opensm?
You can run ibstat on each node to see whether the ports are active.

Tziporet


From yosefe at voltaire.com  Tue Jun  5 23:11:48 2007
From: yosefe at voltaire.com (Yosef Etigin)
Date: Wed, 06 Jun 2007 09:11:48 +0300
Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557)
In-Reply-To: <000001c7a790$9f44e490$3c98070a@amr.corp.intel.com>
References: <000001c7a790$9f44e490$3c98070a@amr.corp.intel.com>
Message-ID: <46665024.8050502@voltaire.com>

Sean Hefty wrote:
>>I haven't followed this closely, but what's the impact of this bug?
>>It seems it would result in a port out of the configured range being
>>used.  Which seems serious enough to fix for 2.6.22 to me.
> 
> 
> It can result in a port outside of the configured range, and its occurrence
> depends on the compiler used.  I've pushed my patch to:
> 
> 	git://git.openfabrics.org/~shefty/rdma-dev.git for-roland
> 
> which is based on 2.6.22-rc4.  Yosef, can you confirm that this patch works for
> you?
> 
> - Sean

Yes, it works.
Maybe we should use another variable ("unsigned coins;") to receive the random
bytes, so next_port will not be used for two different purposes?

--Yossi
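
A user-space sketch of the separate-variable idea, for illustration only: fill_random() is a stand-in for the kernel's get_random_bytes(), and initial_port() is a hypothetical helper, not a proposed change to cma.c.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for get_random_bytes(); rand() is only used so the
 * sketch builds outside the kernel. */
static void fill_random(void *buf, size_t len)
{
	unsigned char *p = buf;
	size_t i;

	for (i = 0; i < len; i++)
		p[i] = (unsigned char) rand();
}

/* Draw the random bits into a dedicated unsigned variable, then
 * derive the starting port from it.  The port variable itself never
 * holds raw random data, and no cast is needed at the '%'. */
static int initial_port(int low, int high)
{
	unsigned int coins;

	fill_random(&coins, sizeof coins);
	return (int) (coins % (unsigned int) (high - low)) + low;
}
```

Because coins is unsigned, the remainder is always in [0, high - low), so the returned port stays in [low, high) regardless of the random bits.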


From vlad at mellanox.co.il  Tue Jun  5 23:24:01 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Wed, 06 Jun 2007 09:24:01 +0300
Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557)
In-Reply-To: <46664C3F.9050208@mellanox.co.il>
References: <008201c7a707$1d7a9150$ff0da8c0@amr.corp.intel.com>
	<4665D441.3060404@ichips.intel.com> <46664C3F.9050208@mellanox.co.il>
Message-ID: <1181111041.1114.23.camel@vladsk-laptop>

Done,
Added kernel_patches/fixes/sean_cma_next_port_fix.patch

Regards,
Vladimir

On Wed, 2007-06-06 at 08:55 +0300, Tziporet Koren wrote:
> Sean Hefty wrote:
> > Vlad, can you please pull this change into OFED?
> >
> Approved
> 
> Tziporet
> 
> >> next_port should be between sysctl_local_port_range[0] and [1].  
> >> However,
> >> it is initially set to a random value.  If the value is negative, 
> >> next_port
> >> can fall outside of this range because of the % operator returning a
> >> negative value.
> >>
> >> Signed-off-by: Sean Hefty <sean.hefty at intel.com>
> >> ---
> >>
> >>  drivers/infiniband/core/cma.c |    4 ++--
> >>  1 files changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/infiniband/core/cma.c 
> >> b/drivers/infiniband/core/cma.c
> >> index eb15119..b0831cb 100644
> >> --- a/drivers/infiniband/core/cma.c
> >> +++ b/drivers/infiniband/core/cma.c
> >> @@ -2772,8 +2772,8 @@ static int cma_init(void)
> >>      int ret;
> >>  
> >>      get_random_bytes(&next_port, sizeof next_port);
> >> -    next_port = (next_port % (sysctl_local_port_range[1] -
> >> -                  sysctl_local_port_range[0])) +
> >> +    next_port = ((unsigned int) next_port %
> >> +            (sysctl_local_port_range[1] - 
> >> sysctl_local_port_range[0])) +
> >>              sysctl_local_port_range[0];
> >>      cma_wq = create_singlethread_workqueue("rdma_cm");
> >>      if (!cma_wq)
> >>
> >>


From vlad at lists.openfabrics.org  Wed Jun  6 02:43:14 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Wed,  6 Jun 2007 02:43:14 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070606-0200 daily build status
Message-ID: <20070606094314.180BFE60824@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.12
Passed on x86_64 with linux-2.6.20
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.12
Passed on x86_64 with linux-2.6.13
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.17
Passed on ppc64 with linux-2.6.18
Passed on ia64 with linux-2.6.18
Passed on ppc64 with linux-2.6.12
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on powerpc with linux-2.6.17
Passed on ia64 with linux-2.6.16
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.17
Passed on ia64 with linux-2.6.15
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.15
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.15
Passed on x86_64 with linux-2.6.18
Passed on ppc64 with linux-2.6.16
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.15
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.14
Passed on ppc64 with linux-2.6.13
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.17
Passed on powerpc with linux-2.6.12
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on ia64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:


From halr at voltaire.com  Wed Jun  6 02:45:12 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 06 Jun 2007 05:45:12 -0400
Subject: [ofa-general] Re: [Query] ib add path record cache
In-Reply-To: <000f01c7a7f0$1067dba0$11c8180a@amr.corp.intel.com>
References: <000f01c7a7f0$1067dba0$11c8180a@amr.corp.intel.com>
Message-ID: <1181123111.12997.147451.camel@hal.voltaire.com>

On Wed, 2007-06-06 at 00:06, Sean Hefty wrote:
> >One could ask the IBTA for this if it is the right thing to do.
> 
> Checking with the IBTA makes sense.  Longer term, adding a distributed SA
> application class, or expanding the existing SA class may be useful, if the IBTA
> wants to define SA implementation at this level of detail.  However, I was
> trying to focus on what could be done now.  If the IBTA would like to
> standardize the communication, that'd be great.

> One issue that isn't clear to me is what exactly is meant by the statement:
> "Vendor-specific classes will never be used to define management operations that
> are encompassed by the Infiniband Architecture."

I'm not sure of the intent of this but that is informative rather than
normative (compliance) text.

> For example, suppose that
> there were a small number of SA caches available in the subnet.  Is it compliant
> for a node to issue a PR query to one of the caches using a vendor-defined PR
> query?  Or must this be done using an SA PR query with possible redirection?

I think this example would fall "on the line" and seems somewhat
debatable as to whether there is a management operation for this or not.
It does go back to the intent of the original statement you cited.

> >Are you saying to make the RMPP header as the first part of Data ?
> 
> Yes.
> 
> >Vendor class 1 are not RMPP MADs so I think this is nonconformant.
> 
> I didn't see any restriction on the vendor class 1 data - at least in section
> 16.5.

True but I'm not sure that was the intent which again was why vendor
class 2 was created. Also, there is the problem of knowing that this
vendor class 1 is using RMPP. That sounds proprietary to me (and affects
the kernel in the OpenIB implementation).

>   If I'm mistaken on this, then I agree that vendor class 2 seems to be our
> only current option.
> 
> >That's one reason vendor class 2 was added. In addition, there is no way
> >to detect one "vendor" from another "vendor" (which is why OUI was
> >added) if the same class is used so these need to be unique across all
> >vendors.
> 
> Yes - all vendor class 1 MADs suffer from this issue.  In practice, it seems
> that there can only be a single vendor for a given class on a subnet.

That's one way of putting it but limits the use; in fact, if this were
done, all subnets would use at least two different vendors. Another way
is that all vendors who want to use this class range need to coordinate
such use (e.g. class allocation).

> >The only choice seems to me to be reformatting using vendor class 2 and
> >dealing with the data copying.
> 
> From an implementation viewpoint, this just seems less desirable.  Adding the
> offset means that single-segment SA MAD may become our multi-segment vendor MAD,
> and dealing with two MAD formats will be troublesome.  If we're only caching
> PRs, this may not be a big deal, but if we ever want to create a truly
> distributed SA, I think it will be.

Are you referring to the performance hit ?

-- Hal

> - Sean


From kliteyn at dev.mellanox.co.il  Wed Jun  6 05:44:58 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Wed, 06 Jun 2007 15:44:58 +0300
Subject: [ofa-general] [PATCH] osm: fixing broken compilation when osm_vendor
	is simulator
Message-ID: <4666AC4A.6020103@dev.mellanox.co.il>

Hi Hal,

The compilation of OpenSM with vendor=sim has been broken by the 
recent PerfManager patch. 
Adding missing include in the vendor's header.

-- Yevgeny

Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
---
 opensm/include/vendor/osm_vendor_mlx.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/opensm/include/vendor/osm_vendor_mlx.h b/opensm/include/vendor/osm_vendor_mlx.h
index f220cc3..b3794cd 100644
--- a/opensm/include/vendor/osm_vendor_mlx.h
+++ b/opensm/include/vendor/osm_vendor_mlx.h
@@ -36,6 +36,7 @@
 #ifndef _OSMV_H_
 #define _OSMV_H_
 
+#include <sys/types.h>
 #include <opensm/osm_log.h>
 #include <complib/cl_qlist.h>
 
-- 
1.5.1.4


From eli at mellanox.co.il  Wed Jun  6 05:40:19 2007
From: eli at mellanox.co.il (Eli Cohen)
Date: Wed, 06 Jun 2007 15:40:19 +0300
Subject: [ofa-general] [PATCH 1/2] libmlx4: fix SRQ buffer allocation
Message-ID: <1181133649.10841.64.camel@mtls03>

Roland,
this patch and the complementary subsequent patch were not actually
checked since the version I was working against is different from your
"for-2.6.22" branch. But I did check this against our build and it
seems to work.

Fix receive buffer allocation for SRQ QPs.

Signed-off-by: Eli Cohen <eli at mellanox.co.il>

---

diff --git a/src/verbs.c b/src/verbs.c
index 1feae9d..b800eb2 100644
--- a/src/verbs.c
+++ b/src/verbs.c
@@ -373,6 +373,13 @@ struct ibv_qp *mlx4_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *attr)
 		return NULL;
 
 	qp->sq.max = align_queue_size(pd->context, attr->cap.max_send_wr, 0);
+
+	if (attr->srq)
+		attr->cap.max_recv_wr = 0;
+	else
+		attr->cap.max_recv_wr = attr->cap.max_recv_wr ?
+			attr->cap.max_recv_wr : 1;
+
 	qp->rq.max = align_queue_size(pd->context, attr->cap.max_recv_wr, 0);
 
 	if (mlx4_alloc_qp_buf(pd, &attr->cap, attr->qp_type, qp))
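
The rule this hunk adds can be read in isolation as a small sketch.
The helper name normalize_max_recv_wr() and its plain integer
arguments are hypothetical and are not part of libmlx4; the sketch only
restates the branch above: a QP attached to an SRQ gets no receive
queue of its own, and any other QP gets at least one receive WQE.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not libmlx4 code: mirrors the max_recv_wr
 * adjustment the patch makes before align_queue_size() is called.
 */
static uint32_t normalize_max_recv_wr(int has_srq, uint32_t requested)
{
	/* Receives are posted to the SRQ, so the QP needs no RQ entries. */
	if (has_srq)
		return 0;

	/* Without an SRQ, never size the receive queue below one entry. */
	return requested ? requested : 1;
}
```

With this rule a request of 0 entries on a non-SRQ QP becomes 1, while
any request on an SRQ QP becomes 0.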


From eli at mellanox.co.il  Wed Jun  6 05:40:21 2007
From: eli at mellanox.co.il (Eli Cohen)
Date: Wed, 06 Jun 2007 15:40:21 +0300
Subject: [ofa-general] [PATCH 2/2] IB/mlx4_ib: fix SRQ buffer allocation
Message-ID: <1181133679.10841.66.camel@mtls03>

Fix receive buffer allocation for SRQ QPs.
Add checks to validate HW requirements when configuring
QPs.

Signed-off-by: Eli Cohen <eli at mellanox.co.il>

---

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index dc137de..0117cf9 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -188,14 +188,27 @@ static int send_wqe_overhead(enum ib_qp_type type)
 	}
 }
 
-static int set_rq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
-		       struct mlx4_ib_qp *qp)
+static int set_rq_size(struct mlx4_ib_dev *dev, struct ib_qp_init_attr *init_attr,
+		       struct mlx4_ib_qp *qp, int kernel)
 {
+	struct ib_qp_cap *cap = &init_attr->cap;
+
 	/* Sanity check RQ size before proceeding */
 	if (cap->max_recv_wr  > dev->dev->caps.max_wqes  ||
 	    cap->max_recv_sge > dev->dev->caps.max_rq_sg)
 		return -EINVAL;
 
+	if (init_attr->srq) {
+		if (cap->max_recv_wr)
+			return -EINVAL;
+	}
+	else if (!cap->max_recv_wr) {
+		if (kernel)
+			cap->max_recv_wr = 1;
+		else
+			return -EINVAL;
+	}
+
 	qp->rq.max = cap->max_recv_wr ? roundup_pow_of_two(cap->max_recv_wr) : 0;
 
 	qp->rq.wqe_shift = ilog2(roundup_pow_of_two(cap->max_recv_sge *
@@ -257,6 +270,10 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
 static int set_user_sq_size(struct mlx4_ib_qp *qp,
 			    struct mlx4_ib_create_qp *ucmd)
 {
+	/* Sanity check for SQ size */
+	if (ucmd->log_sq_bb_count > 15 || ucmd->log_sq_stride > 11)
+		return -EINVAL;
+
 	qp->sq.max       = 1 << ucmd->log_sq_bb_count;
 	qp->sq.wqe_shift = ucmd->log_sq_stride;
 
@@ -285,7 +302,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
 	qp->sq.head	    = 0;
 	qp->sq.tail	    = 0;
 
-	err = set_rq_size(dev, &init_attr->cap, qp);
+	err = set_rq_size(dev, init_attr, qp, pd->uobject ? 0 : 1);
 	if (err)
 		goto err;
 
@@ -733,9 +750,10 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 		context->mtu_msgmax = (attr->path_mtu << 5) | 31;
 	}
 
-	if (qp->rq.max)
+	if (qp->rq.max) {
 		context->rq_size_stride = ilog2(qp->rq.max) << 3;
-	context->rq_size_stride |= qp->rq.wqe_shift - 4;
+		context->rq_size_stride |= qp->rq.wqe_shift - 4;
+	}
 
 	if (qp->sq.max)
 		context->sq_size_stride = ilog2(qp->sq.max) << 3;
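
The receive-queue checks added to set_rq_size() above can be sketched
outside the kernel as follows. The function check_recv_wr() and its
arguments are hypothetical stand-ins for init_attr->srq, the new
"kernel" flag and cap->max_recv_wr; only the decision logic is taken
from the patch.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the new branch in set_rq_size():
 *  - SRQ-attached QP: a non-zero max_recv_wr is rejected.
 *  - No SRQ and max_recv_wr == 0: kernel consumers are bumped to 1,
 *    userspace consumers are rejected.
 * Returns 0 on success or -EINVAL, as the kernel function does.
 */
static int check_recv_wr(int has_srq, int is_kernel_qp, uint32_t *max_recv_wr)
{
	if (has_srq)
		return *max_recv_wr ? -EINVAL : 0;

	if (*max_recv_wr == 0) {
		if (!is_kernel_qp)
			return -EINVAL;
		*max_recv_wr = 1;
	}

	return 0;
}
```

Under this sketch, userspace is expected to have already applied the
adjustment from patch 1/2, so a zero request without an SRQ reaching
the kernel from userspace is treated as an error rather than fixed up.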


From halr at voltaire.com  Wed Jun  6 05:59:14 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 06 Jun 2007 08:59:14 -0400
Subject: [ofa-general] Re: [PATCH] osm: fixing broken compilation when
	osm_vendor is simulator
In-Reply-To: <4666AC4A.6020103@dev.mellanox.co.il>
References: <4666AC4A.6020103@dev.mellanox.co.il>
Message-ID: <1181134754.12997.159939.camel@hal.voltaire.com>

Hi Yevgeny,

On Wed, 2007-06-06 at 08:44, Yevgeny Kliteynik wrote:
> Hi Hal,
> 
> The compilation of OpenSM with vendor=sim has been broken by the 
> recent PerfManager patch. 
> Adding missing include in the vendor's header.
> 
> -- Yevgeny
> 
> Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

Thanks. Applied.

-- Hal


From pourreza at cs.umanitoba.ca  Wed Jun  6 07:08:05 2007
From: pourreza at cs.umanitoba.ca (Hossein Pourreza)
Date: Wed, 6 Jun 2007 09:08:05 -0500
Subject: [ofa-general] Installing openIB on Linux FC5
In-Reply-To: <46664FB1.6070402@mellanox.co.il>
References: <20070605220428.GA15154@helium-01.cs.umanitoba.ca>
	<46664FB1.6070402@mellanox.co.il>
Message-ID: <20070606140805.GA10814@finch.cs.umanitoba.ca>

Hi,

Many thanks for your reply. I really appreciate that.

Our cluster uses Mellanox Technologies MT23108 InfiniHost (rev a1) and Sun 9P switch. 
Our switch has its own SubnetManager and whenever I try to run opensm, I get an error 
saying that there is another sm running with a mismatch key.

The result of running ibstat is like this:

        CA type: MT23108
        Number of ports: 2
        Firmware version: 3.3.2
        Hardware version: a1
        Node GUID: 0x0003ba0001001788
        System image GUID: 0x0003ba000100178b
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 10
                Base lid: 2
                LMC: 0
                SM lid: 1
                Capability mask: 0x00510a68
                Port GUID: 0x0003ba0001001789
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 2
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x00510a68
                Port GUID: 0x0003ba000100178a

Is there anything wrong with this output?


Many thanks for your kind help
Hossein
On Wed, Jun 06, 2007 at 09:09:53AM +0300, Tziporet Koren wrote:
> Hossein Pourreza wrote:
> >Hi all,
> >
> >I am new to infiniband stuff and am trying to configure an infiniband-based
> >cluster using Linux FC 5. I downloaded the OFED-1.0 and tried to install 
> >it on
> >cluster nodes. Now I can load the kernel modules without any error but I 
> >cannot
> >run a simple test like ibv_ud_pingpong to check the connectivity of nodes 
> >in
> >user-level.
> >
> >
> >  
> Have you run opensm?
> You can run ibstat on each node to see ports are active
> 
> Tziporet

-- 
Hossein Pourreza		 			mail:<pourreza at cs.umanitoba.ca>    
Department of Computer Science		URL: http://www.cs.umanitoba.ca/~pourreza
University of Manitoba  			Phone: 204-488-5611            
Winnipeg, Manitoba, Canada R3T 2N2


From halr at voltaire.com  Wed Jun  6 07:21:31 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 06 Jun 2007 10:21:31 -0400
Subject: [ofa-general] Installing openIB on Linux FC5
In-Reply-To: <20070606140805.GA10814@finch.cs.umanitoba.ca>
References: <20070605220428.GA15154@helium-01.cs.umanitoba.ca>
	<46664FB1.6070402@mellanox.co.il>
	<20070606140805.GA10814@finch.cs.umanitoba.ca>
Message-ID: <1181139682.12997.165263.camel@hal.voltaire.com>

On Wed, 2007-06-06 at 10:08, Hossein Pourreza wrote:
> Hi,
> 
> Many thanks for your reply. I really appreciate that.
> 
> Our cluster uses Mellanox Technologies MT23108 InfiniHost (rev a1) and Sun 9P switch. 
> Our switch has its own SubnetManager and whenever I try to run opensm, I get an error 
> saying that there is another sm running with a mismatch key.
> 
> The result of running ibstat is like this:
> 
> 		CA type: MT23108
>         Number of ports: 2
>         Firmware version: 3.3.2
>         Hardware version: a1
>         Node GUID: 0x0003ba0001001788
>         System image GUID: 0x0003ba000100178b
>         Port 1:
>                 State: Active
>                 Physical state: LinkUp
>                 Rate: 10
>                 Base lid: 2
>                 LMC: 0
>                 SM lid: 1
>                 Capability mask: 0x00510a68
>                 Port GUID: 0x0003ba0001001789
> 		Port 2:
>                 State: Down
>                 Physical state: Polling
>                 Rate: 2
>                 Base lid: 0
>                 LMC: 0
>                 SM lid: 0
>                 Capability mask: 0x00510a68
>                 Port GUID: 0x0003ba000100178a
> 
> Is there anything wrong with this output?

Nothing wrong with the output :-) but is your port connected ? It
appears there is some connectivity problem as Physical state is not
LinkUp (and hence State is  Down) so SM cannot configure it.

-- Hal

> Many thanks for your kind help
> Hossein
> On Wed, Jun 06, 2007 at 09:09:53AM +0300, Tziporet Koren wrote:
> > Hossein Pourreza wrote:
> > >Hi all,
> > >
> > >I am new to infiniband stuff and am trying to configure an infiniband-based
> > >cluster using Linux FC 5. I downloaded the OFED-1.0 and tried to install 
> > >it on
> > >cluster nodes. Now I can load the kernel modules without any error but I 
> > >cannot
> > >run a simple test like ibv_ud_pingpong to check the connectivity of nodes 
> > >in
> > >user-level.
> > >
> > >
> > >  
> > Have you run opensm?
> > You can run ibstat on each node to see ports are active
> > 
> > Tziporet


From chas at cmf.nrl.navy.mil  Wed Jun  6 07:29:51 2007
From: chas at cmf.nrl.navy.mil (chas williams - CONTRACTOR)
Date: Wed, 06 Jun 2007 10:29:51 -0400
Subject: [ofa-general] OFED vs IB_GOLD IB_SRP Performance results 
In-Reply-To: <4665A1FA.1000506@asc.hpc.mil> 
Message-ID: <200706061429.l56ETp4n012017@cmf.nrl.navy.mil>

In message <4665A1FA.1000506 at asc.hpc.mil>,MAHMOUD HANAFI writes:
>OFED Setup:
>/sys/module/ib_srp/mellanox_workarounds = 1
>/sys/module/ib_srp/refcnt = 11
>/sys/module/ib_srp/srp_sg_tablesize = 256
>/sys/module/ib_srp/topspin_workarounds = 1
>
>/sys/block/sdd/queue/max_sectors_kb = 4096
>/sys/block/sdd/queue/nr_requests = 8192
>/sys/block/sdd/queue/read_ahead_kb = 128

what is the max_hw_sectors_kb for the ofed target?  unless you specified
max_sect= during login, i suspect you are getting the system defaults.
typically this is 512 sectors i think, which is where your performance
seems to start to diverge.


From pourreza at cs.umanitoba.ca  Wed Jun  6 07:45:57 2007
From: pourreza at cs.umanitoba.ca (Hossein Pourreza)
Date: Wed, 6 Jun 2007 09:45:57 -0500
Subject: [ofa-general] Installing openIB on Linux FC5
In-Reply-To: <1181139682.12997.165263.camel@hal.voltaire.com>
References: <20070605220428.GA15154@helium-01.cs.umanitoba.ca>
	<46664FB1.6070402@mellanox.co.il>
	<20070606140805.GA10814@finch.cs.umanitoba.ca>
	<1181139682.12997.165263.camel@hal.voltaire.com>
Message-ID: <20070606144557.GA11324@finch.cs.umanitoba.ca>

On Wed, Jun 06, 2007 at 10:21:31AM -0400, Hal Rosenstock wrote:
> On Wed, 2007-06-06 at 10:08, Hossein Pourreza wrote:
> > Hi,
> > 
> > Many thanks for your reply. I really appreciate that.
> > 
> > Our cluster uses Mellanox Technologies MT23108 InfiniHost (rev a1) and Sun 9P switch. 
> > Our switch has its own SubnetManager and whenever I try to run opensm, I get an error 
> > saying that there is another sm running with a mismatch key.
> > 
> > The result of running ibstat is like this:
> > 
> > 		CA type: MT23108
> >         Number of ports: 2
> >         Firmware version: 3.3.2
> >         Hardware version: a1
> >         Node GUID: 0x0003ba0001001788
> >         System image GUID: 0x0003ba000100178b
> >         Port 1:
> >                 State: Active
> >                 Physical state: LinkUp
> >                 Rate: 10
> >                 Base lid: 2
> >                 LMC: 0
> >                 SM lid: 1
> >                 Capability mask: 0x00510a68
> >                 Port GUID: 0x0003ba0001001789
> > 		Port 2:
> >                 State: Down
> >                 Physical state: Polling
> >                 Rate: 2
> >                 Base lid: 0
> >                 LMC: 0
> >                 SM lid: 0
> >                 Capability mask: 0x00510a68
> >                 Port GUID: 0x0003ba000100178a
> > 
> > Is there anything wrong with this output?
> 
> Nothing wrong with the output :-) but is your port connected ? It
> appears there is some connectivity problem as Physical state is not
> LinkUp (and hence State is  Down) so SM cannot configure it.

I only use port 1 of each HCA and I just connected those to the switch. Should I
connect both ports? There are only 9 ports available on our switch and we have 5
nodes (10 ports in total).

Thanks again for all your help
Hossein


> 
> -- Hal
> 
> > Many thanks for your kind help
> > Hossein
> > On Wed, Jun 06, 2007 at 09:09:53AM +0300, Tziporet Koren wrote:
> > > Hossein Pourreza wrote:
> > > >Hi all,
> > > >
> > > >I am new to infiniband stuff and am trying to configure an infiniband-based
> > > >cluster using Linux FC 5. I downloaded the OFED-1.0 and tried to install 
> > > >it on
> > > >cluster nodes. Now I can load the kernel modules without any error but I 
> > > >cannot
> > > >run a simple test like ibv_ud_pingpong to check the connectivity of nodes 
> > > >in
> > > >user-level.
> > > >
> > > >
> > > >  
> > > Have you run opensm?
> > > You can run ibstat on each node to see ports are active
> > > 
> > > Tziporet

-- 
Hossein Pourreza		 			mail:<pourreza at cs.umanitoba.ca>    
Department of Computer Science		URL: http://www.cs.umanitoba.ca/~pourreza
University of Manitoba  			Phone: 204-488-5611            
Winnipeg, Manitoba, Canada R3T 2N2


From minich at ornl.gov  Wed Jun  6 07:53:14 2007
From: minich at ornl.gov (Makia Minich)
Date: Wed, 06 Jun 2007 10:53:14 -0400
Subject: [ofa-general] Installing openIB on Linux FC5
In-Reply-To: <20070606144557.GA11324@finch.cs.umanitoba.ca>
References: <20070605220428.GA15154@helium-01.cs.umanitoba.ca>
	<1181139682.12997.165263.camel@hal.voltaire.com>
	<20070606144557.GA11324@finch.cs.umanitoba.ca>
Message-ID: <200706061053.15014.minich@ornl.gov>

I think that Hal missed that Port 1 is in active/link up state.

More importantly, are you looking to replace your internal SubnetManager and 
just use OpenSM?  If so, you'll need to go into the switch and disable it, 
then bring up opensm.

On Wednesday 06 June 2007 10:45:57 am Hossein Pourreza wrote:
> On Wed, Jun 06, 2007 at 10:21:31AM -0400, Hal Rosenstock wrote:
> > On Wed, 2007-06-06 at 10:08, Hossein Pourreza wrote:
> > > Hi,
> > >
> > > Many thanks for your reply. I really appreciate that.
> > >
> > > Our cluster uses Mellanox Technologies MT23108 InfiniHost (rev a1) and
> > > > Sun 9P switch. Our switch has its own SubnetManager and whenever I try
> > > to run opensm, I get an error saying that there is another sm running
> > > with a mismatch key.
> > >
> > > The result of running ibstat is like this:
> > >
> > > 		CA type: MT23108
> > >         Number of ports: 2
> > >         Firmware version: 3.3.2
> > >         Hardware version: a1
> > >         Node GUID: 0x0003ba0001001788
> > >         System image GUID: 0x0003ba000100178b
> > >         Port 1:
> > >                 State: Active
> > >                 Physical state: LinkUp
> > >                 Rate: 10
> > >                 Base lid: 2
> > >                 LMC: 0
> > >                 SM lid: 1
> > >                 Capability mask: 0x00510a68
> > >                 Port GUID: 0x0003ba0001001789
> > > 		Port 2:
> > >                 State: Down
> > >                 Physical state: Polling
> > >                 Rate: 2
> > >                 Base lid: 0
> > >                 LMC: 0
> > >                 SM lid: 0
> > >                 Capability mask: 0x00510a68
> > >                 Port GUID: 0x0003ba000100178a
> > >
> > > Is there anything wrong with this output?
> >
> > Nothing wrong with the output :-) but is your port connected ? It
> > appears there is some connectivity problem as Physical state is not
> > LinkUp (and hence State is  Down) so SM cannot configure it.
>
> I only use port 1 of each HCA and I just connected those to the switch.
> Should I connect both ports? There are only 9 ports available on our switch
> and we have 5 nodes (10 ports in total).
>
> Thanks again for all your help
> Hossein
>
> > -- Hal
> >
> > > Many thanks for your kind help
> > > Hossein
> > >
> > > On Wed, Jun 06, 2007 at 09:09:53AM +0300, Tziporet Koren wrote:
> > > > Hossein Pourreza wrote:
> > > > >Hi all,
> > > > >
> > > > >I am new to infiniband stuff and am trying to configure an
> > > > > infiniband-based cluster using Linux FC 5. I downloaded the
> > > > > OFED-1.0 and tried to install it on
> > > > >cluster nodes. Now I can load the kernel modules without any error
> > > > > but I cannot
> > > > >run a simple test like ibv_ud_pingpong to check the connectivity of
> > > > > nodes in
> > > > >user-level.
> > > >
> > > > Have you run opensm?
> > > > You can run ibstat on each node to see ports are active
> > > >
> > > > Tziporet

-- 
Makia Minich <minich at ornl.gov>
National Center for Computation Science
Oak Ridge National Laboratory
--*--
Imagine no possessions
I wonder if you can
- John Lennon


From halr at voltaire.com  Wed Jun  6 07:55:44 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 06 Jun 2007 10:55:44 -0400
Subject: [ofa-general] Installing openIB on Linux FC5
In-Reply-To: <20070606144557.GA11324@finch.cs.umanitoba.ca>
References: <20070605220428.GA15154@helium-01.cs.umanitoba.ca>
	<46664FB1.6070402@mellanox.co.il>
	<20070606140805.GA10814@finch.cs.umanitoba.ca>
	<1181139682.12997.165263.camel@hal.voltaire.com>
	<20070606144557.GA11324@finch.cs.umanitoba.ca>
Message-ID: <1181141742.12997.167453.camel@hal.voltaire.com>

On Wed, 2007-06-06 at 10:45, Hossein Pourreza wrote:
> On Wed, Jun 06, 2007 at 10:21:31AM -0400, Hal Rosenstock wrote:
> > On Wed, 2007-06-06 at 10:08, Hossein Pourreza wrote:
> > > Hi,
> > > 
> > > Many thanks for your reply. I really appreciate that.
> > > 
> > > Our cluster uses Mellanox Technologies MT23108 InfiniHost (rev a1) and Sun 9P switch. 
> > > Our switch has its own SubnetManager and whenever I try to run opensm, I get an error 
> > > saying that there is another sm running with a mismatch key.
> > > 
> > > The result of running ibstat is like this:
> > > 
> > > 		CA type: MT23108
> > >         Number of ports: 2
> > >         Firmware version: 3.3.2
> > >         Hardware version: a1
> > >         Node GUID: 0x0003ba0001001788
> > >         System image GUID: 0x0003ba000100178b
> > >         Port 1:
> > >                 State: Active
> > >                 Physical state: LinkUp
> > >                 Rate: 10
> > >                 Base lid: 2
> > >                 LMC: 0
> > >                 SM lid: 1
> > >                 Capability mask: 0x00510a68
> > >                 Port GUID: 0x0003ba0001001789
> > > 		Port 2:
> > >                 State: Down
> > >                 Physical state: Polling
> > >                 Rate: 2
> > >                 Base lid: 0
> > >                 LMC: 0
> > >                 SM lid: 0
> > >                 Capability mask: 0x00510a68
> > >                 Port GUID: 0x0003ba000100178a
> > > 
> > > Is there anything wrong with this output?
> > 
> > Nothing wrong with the output :-) but is your port connected ? It
> > appears there is some connectivity problem as Physical state is not
> > LinkUp (and hence State is  Down) so SM cannot configure it.
> 
> I only use port 1 of each HCA and I just connected those to the switch. Should I
> connect both ports? There are only 9 ports available on our switch and we have 5
> nodes (10 ports in total).

My bad :-( I just looked at port 2. Port 1 looks fine (active and has
base and SM LIDs).

-- Hal

> Thanks again for all your help
> Hossein
> 
> 
> > 
> > -- Hal
> > 
> > > Many thanks for your kind help
> > > Hossein
> > > On Wed, Jun 06, 2007 at 09:09:53AM +0300, Tziporet Koren wrote:
> > > > Hossein Pourreza wrote:
> > > > >Hi all,
> > > > >
> > > > >I am new to infiniband stuff and am trying to configure an infiniband-based
> > > > >cluster using Linux FC 5. I downloaded the OFED-1.0 and tried to install 
> > > > >it on
> > > > >cluster nodes. Now I can load the kernel modules without any error but I 
> > > > >cannot
> > > > >run a simple test like ibv_ud_pingpong to check the connectivity of nodes 
> > > > >in
> > > > >user-level.
> > > > >
> > > > >
> > > > >  
> > > > Have you run opensm?
> > > > You can run ibstat on each node to see ports are active
> > > > 
> > > > Tziporet


From jackm at dev.mellanox.co.il  Wed Jun  6 09:35:04 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Wed, 6 Jun 2007 19:35:04 +0300
Subject: [ofa-general] [PATCH] mlx4: fix overwriting of rnr_retry value
	during ib_modify_qp
Message-ID: <200706061935.04671.jackm@dev.mellanox.co.il>

Fixes zeroing out of RNR_RETRY parameter passed to modify_qp.

Found by Mellanox firmware group
Signed-off-by: Jack Morgenstein <jackm at dev.mellanox.co.il>

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index dc137de..cd22975 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -762,11 +762,6 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 		optpar |= MLX4_QP_OPTPAR_PKEY_INDEX;
 	}
 
-	if (attr_mask & IB_QP_RNR_RETRY) {
-		context->params1 |= cpu_to_be32(attr->rnr_retry << 13);
-		optpar |= MLX4_QP_OPTPAR_RNR_RETRY;
-	}
-
 	if (attr_mask & IB_QP_AV) {
 		if (mlx4_set_path(dev, &attr->ah_attr, &context->pri_path,
 				  attr_mask & IB_QP_PORT ? attr->port_num : qp->port)) {
@@ -802,6 +797,12 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 
 	context->pd	    = cpu_to_be32(to_mpd(ibqp->pd)->pdn);
 	context->params1    = cpu_to_be32(MLX4_IB_ACK_REQ_FREQ << 28);
+
+	if (attr_mask & IB_QP_RNR_RETRY) {
+		context->params1 |= cpu_to_be32(attr->rnr_retry << 13);
+		optpar |= MLX4_QP_OPTPAR_RNR_RETRY;
+	}
+
 	if (attr_mask & IB_QP_RETRY_CNT) {
 		context->params1 |= cpu_to_be32(attr->retry_cnt << 16);
 		optpar |= MLX4_QP_OPTPAR_RETRY_COUNT;
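
Why moving the block matters can be shown with a minimal sketch. It
assumes the problem is purely one of ordering: the RNR bits were OR-ed
into params1 before the later plain assignment that initialises the
field. The function names are hypothetical, ACK_REQ_FREQ is a
placeholder value rather than the driver's constant, and the
cpu_to_be32() byte swapping is left out for brevity.

```c
#include <stdint.h>

#define ACK_REQ_FREQ 8u /* placeholder value, assumed for illustration */

/* Old order: the RNR bits are set first, then lost to the assignment. */
static uint32_t params1_old_order(uint32_t rnr_retry)
{
	uint32_t params1 = 0;

	params1 |= rnr_retry << 13;   /* set RNR retry bits */
	params1 = ACK_REQ_FREQ << 28; /* plain '=' discards them */
	return params1;
}

/* Patched order: initialise the field first, then OR in the RNR bits. */
static uint32_t params1_new_order(uint32_t rnr_retry)
{
	uint32_t params1 = ACK_REQ_FREQ << 28;

	params1 |= rnr_retry << 13;
	return params1;
}
```

Extracting bits 13..15 from each result shows the difference: the old
order always yields 0 there, while the patched order keeps the value
that was passed in.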


From rdreier at cisco.com  Wed Jun  6 10:12:07 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 06 Jun 2007 10:12:07 -0700
Subject: [ofa-general] Re: [PATCH] mlx4: fix overwriting of rnr_retry value
	during ib_modify_qp
In-Reply-To: <200706061935.04671.jackm@dev.mellanox.co.il> (Jack Morgenstein's
	message of "Wed, 6 Jun 2007 19:35:04 +0300")
References: <200706061935.04671.jackm@dev.mellanox.co.il>
Message-ID: <adaira1f1nc.fsf@cisco.com>

thanks, applied.


From jwong at datallegro.com  Wed Jun  6 10:17:33 2007
From: jwong at datallegro.com (Jeffrey Wong)
Date: Wed, 6 Jun 2007 13:17:33 -0400
Subject: [ofa-general] ibv_ud_pingpong error 
Message-ID: <A382D4292574EB47A85B8159A6AED1A1015EF1EA@FPNYEXCBE02.opus-i.corp>

Hello,

I have installed OFED1.2-rc4 on my development machine.  When I try
to do a ping pong test I get the following error message:

 
[ingres at centos5:master ~/jwong/tcp/ib_common_tcp]$ ibv_ud_pingpong

libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.

    This will severely limit memory registrations.

Couldn't create QP

 
Thanks in advance.

 
Jeff


From gurhan.ozen at gmail.com  Wed Jun  6 10:39:26 2007
From: gurhan.ozen at gmail.com (G.O.)
Date: Wed, 6 Jun 2007 13:39:26 -0400
Subject: [ofa-general] ibv_ud_pingpong error
In-Reply-To: <A382D4292574EB47A85B8159A6AED1A1015EF1EA@FPNYEXCBE02.opus-i.corp>
References: <A382D4292574EB47A85B8159A6AED1A1015EF1EA@FPNYEXCBE02.opus-i.corp>
Message-ID: <5849f1820706061039g575056cem839c505ed227ab1b@mail.gmail.com>

On 6/6/07, Jeffrey Wong <jwong at datallegro.com> wrote:
> Hello,
>
> I have installed OFED1.2-rc4 on the my development machine.  When I try to
> do a ping pong test I get the following error message:
>
>
>
> [ingres at centos5:master ~/jwong/tcp/ib_common_tcp]$ ibv_ud_pingpong
>
> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>
>     This will severely limit memory registrations.
>
> Couldn't create QP
>

    Hi,
   If you are using the bash shell, do something like:

    ulimit -l unlimited

  to get rid of that limit and try again.  Alternatively, you can set
it to a large number.  Note that ulimit only changes the limit
in the current shell; you'll have to edit the system-wide configuration
file to make it permanent.
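
  For checking the limit from inside a program rather than from the
shell, a minimal sketch using getrlimit() might look like the following.
The helper names get_memlock_limits() and print_limit() are made up for
this illustration; they are not part of libibverbs.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Fetch the locked-memory limits of the calling process.
 * Returns 0 on success, -1 on failure (errno is set by getrlimit). */
int get_memlock_limits(rlim_t *soft, rlim_t *hard)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	*soft = rl.rlim_cur;
	*hard = rl.rlim_max;
	return 0;
}

/* Print one limit, spelling out RLIM_INFINITY as "unlimited". */
void print_limit(const char *name, rlim_t value)
{
	if (value == RLIM_INFINITY)
		printf("%s: unlimited\n", name);
	else
		printf("%s: %llu bytes\n", name, (unsigned long long)value);
}
```

  Calling get_memlock_limits() and passing the soft value to
print_limit() shows the limit in bytes, the same unit as the libibverbs
warning; bash's "ulimit -l" reports it in units of 1024 bytes.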

  Hope this helps.
  Gurhan


From rdreier at cisco.com  Wed Jun  6 10:35:53 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 06 Jun 2007 10:35:53 -0700
Subject: [ofa-general] RE: [PATCH] rdma_cm: fix port type (fix bug 557)
In-Reply-To: <000001c7a790$9f44e490$3c98070a@amr.corp.intel.com> (Sean Hefty's
	message of "Tue, 5 Jun 2007 09:43:18 -0700")
References: <000001c7a790$9f44e490$3c98070a@amr.corp.intel.com>
Message-ID: <adaabvdf0jq.fsf@cisco.com>

 thanks, I merged this.


From artginer at unizar.es  Wed Jun  6 10:43:00 2007
From: artginer at unizar.es (Arturo Giner Gracia)
Date: Wed, 06 Jun 2007 19:43:00 +0200
Subject: [ofa-general] libmthca error
Message-ID: <4666F224.9050008@unizar.es>

Dear Sir or Madam,

I'm trying to compile libmthca from the git repository
(https://wiki.openfabrics.org/tiki-index.php?page=Installation+Cheat+Sheet)
and everything was OK until I compiled this library.

The error is "checking size of long... configure: error: cannot compute
sizeof (long)".

Can you help me with this?

Another question: which is the best repository to download InfiniBand
sources from, to compile with kernel 2.6.18.8-0.1-default?
We have InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex
(Tavor compatibility mode).

Thanks in advance

Arturo


From jwong at datallegro.com  Wed Jun  6 10:52:54 2007
From: jwong at datallegro.com (Jeffrey Wong)
Date: Wed, 6 Jun 2007 13:52:54 -0400
Subject: [ofa-general] ibv_ud_pingpong error
In-Reply-To: <5849f1820706061039g575056cem839c505ed227ab1b@mail.gmail.com>
Message-ID: <A382D4292574EB47A85B8159A6AED1A1015EF249@FPNYEXCBE02.opus-i.corp>


It seems that if I'm logged in as root the command works fine without
any settings changes, but if I'm logged in as another user I get
the same error.  Are there permissions that I need to set on the
binaries in order to run the pingpong as a regular user instead of root?


Thanks,
Jeff



From gshipman at lanl.gov  Wed Jun  6 10:58:45 2007
From: gshipman at lanl.gov (Galen Shipman)
Date: Wed, 6 Jun 2007 11:58:45 -0600
Subject: [ofa-general] ibv_ud_pingpong error
In-Reply-To: <A382D4292574EB47A85B8159A6AED1A1015EF249@FPNYEXCBE02.opus-i.corp>
References: <A382D4292574EB47A85B8159A6AED1A1015EF249@FPNYEXCBE02.opus-i.corp>
Message-ID: <E02E37CF-D91C-4988-ABFA-AA73DB616C55@lanl.gov>

Hey Jeff,

I think you need to raise your locked-memory limits.
We have an FAQ entry on this here:
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

Ignore the Open MPI specific parts; the rest, I think, applies.

- Galen

On Jun 6, 2007, at 11:52 AM, Jeffrey Wong wrote:

>
>
> Seems that if I'm logged in as root the command works fine without
> making any settings changes, but if I'm logged in as another user I am
> getting the same error.  Are there permissions that I need to set on
> binaries in order to run the pingpong as a regular user instead of  
> root.
>
>
> Thanks,
> Jeff


From jwong at datallegro.com  Wed Jun  6 11:17:49 2007
From: jwong at datallegro.com (Jeffrey Wong)
Date: Wed, 6 Jun 2007 14:17:49 -0400
Subject: [ofa-general] ibv_ud_pingpong error
In-Reply-To: <E02E37CF-D91C-4988-ABFA-AA73DB616C55@lanl.gov>
Message-ID: <A382D4292574EB47A85B8159A6AED1A1015EF28B@FPNYEXCBE02.opus-i.corp>

Thanks very much.  After setting memlock to unlimited for the user in
the limits.conf file, I can now run ibv_ud_pingpong.
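
As a related sketch: limits.conf entries take effect at login, and if
only the hard limit was raised there, a process can still lift its own
soft limit up to it.  The helper name raise_memlock_to_hard_limit() is
made up for this illustration, and this is an assumption about one
possible setup, not something ibv_ud_pingpong is known to do itself.

```c
#include <sys/resource.h>

/* Raise the soft RLIMIT_MEMLOCK of the calling process to its hard
 * limit.  An unprivileged process may always do this.
 * Returns 0 on success, -1 on failure. */
int raise_memlock_to_hard_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	rl.rlim_cur = rl.rlim_max;	/* soft limit := hard limit */
	return setrlimit(RLIMIT_MEMLOCK, &rl) == 0 ? 0 : -1;
}
```

Raising the hard limit itself still needs root or a limits.conf entry,
which is why the per-user setting was required here.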

Thanks again.

Jeff




From jwong at datallegro.com  Wed Jun  6 11:29:39 2007
From: jwong at datallegro.com (Jeffrey Wong)
Date: Wed, 6 Jun 2007 14:29:39 -0400
Subject: [ofa-general] Having trouble pingpong between two nodes.
Message-ID: <A382D4292574EB47A85B8159A6AED1A1015EF2BC@FPNYEXCBE02.opus-i.corp>

Hello,

I am trying to run an ibv_ud_pingpong between two nodes but I can't seem
to get them to communicate.  I have used the ping command between the ib
interfaces and that works fine, but when I try to use ibv_ud_pingpong
it says the following:

________________________________________________________________________

[root at centos5:node1 ~]# ibv_ud_pingpong 193.168.10.254
  local address:  LID 0x0002, QPN 0x0f0406, PSN 0xb067dc
Couldn't connect to 193.168.10.254:18515

________________________________________________________________________

I have the subnet manager running on node2.

When I run the ibchecknet I get the following errors:

#warn: counter SymbolErrors = 65535     (threshold 10)
#warn: counter LinkDowned = 78  (threshold 10)
#warn: counter RcvSwRelayErrors = 261   (threshold 100)
#warn: counter XmtDiscards = 173        (threshold 100)
Error check on lid 5 (MT47396 Infiniscale-III Mellanox Technologies) port all:  FAILED
#warn: counter SymbolErrors = 65535     (threshold 10)
Error check on lid 5 (MT47396 Infiniscale-III Mellanox Technologies) port 18:  FAILED
# Checked Switch: nodeguid 0x0002c9010d26dc90 with failure
#warn: counter SymbolErrors = 65535     (threshold 10)
#warn: counter LinkDowned = 13  (threshold 10)
Error check on lid 5 (MT47396 Infiniscale-III Mellanox Technologies) port 16:  FAILED
#warn: counter SymbolErrors = 65535     (threshold 10)
#warn: counter LinkDowned = 13  (threshold 10)
Error check on lid 5 (MT47396 Infiniscale-III Mellanox Technologies) port 15:  FAILED
#warn: counter SymbolErrors = 65535     (threshold 10)
Error check on lid 5 (MT47396 Infiniscale-III Mellanox Technologies) port 14:  FAILED
#warn: counter SymbolErrors = 65535     (threshold 10)
Error check on lid 5 (MT47396 Infiniscale-III Mellanox Technologies) port 13:  FAILED
#warn: counter SymbolErrors = 65535     (threshold 10)
#warn: counter XmtDiscards = 173        (threshold 100)
Error check on lid 5 (MT47396 Infiniscale-III Mellanox Technologies) port 17:  FAILED

# Checking Ca: nodeguid 0x0002c9020020080c

# Checking Ca: nodeguid 0x0002c902002015c0

# Checking Ca: nodeguid 0x0002c9020020590c

## Summary: 4 nodes checked, 0 bad nodes found
##          12 ports checked, 0 bad ports found
##          6 ports have errors beyond threshold

________________________________________________________________________

I am trying to ping from node 1 to node 2

 
1st node configuration:

ib0       Link encap:InfiniBand  HWaddr 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:193.168.10.1  Bcast:193.168.10.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c902:20:80d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:150 errors:0 dropped:0 overruns:0 frame:0
          TX packets:37 errors:0 dropped:9 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:35356 (34.5 KiB)  TX bytes:7624 (7.4 KiB)

ib1       Link encap:InfiniBand  HWaddr 80:00:04:05:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:194.168.10.1  Bcast:194.168.10.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c902:20:80e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:148 errors:0 dropped:0 overruns:0 frame:0
          TX packets:34 errors:0 dropped:9 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:35156 (34.3 KiB)  TX bytes:7496 (7.3 KiB)

________________________________________________________________________

[root at centos5:node1 ~]# ibstat
CA 'mthca0'
        CA type: MT25208
        Number of ports: 2
        Firmware version: 5.0.1
        Hardware version: a0
        Node GUID: 0x0002c9020020080c
        System image GUID: 0x0002c9020020080f
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 10
                Base lid: 2
                LMC: 0
                SM lid: 1
                Capability mask: 0x00510a68
                Port GUID: 0x0002c9020020080d
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 10
                Base lid: 3
                LMC: 0
                SM lid: 1
                Capability mask: 0x00510a68
                Port GUID: 0x0002c9020020080e

________________________________________________________________________

Node 2

 
ib0       Link encap:InfiniBand  HWaddr 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:193.168.10.254  Bcast:193.168.10.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c902:20:590d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:102 errors:0 dropped:0 overruns:0 frame:0
          TX packets:42 errors:0 dropped:9 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:23750 (23.1 KiB)  TX bytes:8048 (7.8 KiB)

ib1       Link encap:InfiniBand  HWaddr 80:00:04:05:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:194.168.10.254  Bcast:194.168.10.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c902:20:590e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:94 errors:0 dropped:0 overruns:0 frame:0
          TX packets:31 errors:0 dropped:9 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:23286 (22.7 KiB)  TX bytes:7260 (7.0 KiB)

________________________________________________________________________

[root at centos5:master /opt/CA]# ibstat
CA 'mthca0'
        CA type: MT25208
        Number of ports: 2
        Firmware version: 5.1.0
        Hardware version: a0
        Node GUID: 0x0002c9020020590c
        System image GUID: 0x0002c9020020590f
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 10
                Base lid: 1
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510a6a
                Port GUID: 0x0002c9020020590d
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 10
                Base lid: 4
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510a68
                Port GUID: 0x0002c9020020590e

Thanks in advance,

Jeff


From bob.kossey at hp.com  Wed Jun  6 11:53:28 2007
From: bob.kossey at hp.com (Bob Kossey)
Date: Wed, 06 Jun 2007 14:53:28 -0400
Subject: [ofa-general] Re: ipoib / bonding and OFED
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA303951259@xmb-sjc-216.amer.cisco.com>
References: <3857BB049D83424D9DB82753D37CEA55459C41@taurus.voltaire.com><4657373E.2030903@hp.com>
	<465BDC90.5080305@voltaire.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA303951259@xmb-sjc-216.amer.cisco.com>
Message-ID: <466702A8.5080302@hp.com>

Just to follow up on this, using RHEL5 and OFED 1.2 rc4, I was able
to do enough rudimentary testing to convince myself that IB
bonding was working.  I was able to use ib-bond, as well as the
openib.conf file to enable bonding on startup, including both
(separately) IPOIBBOND_ENABLE and IPOIBHA_ENABLE.

One thing I was not able to do, however, was to start IB bonding
using the standard bonding modifications to /etc/modprobe.conf
and /etc/sysconfig/network-scripts/ifcfg* files.  Should this be possible,
and are there perhaps some required settings I am missing?  I'll
include my file modifications and some output below.

modprobe.conf:
alias bond0 bonding
options bond0 mode=active-backup miimon=100

ifcfg-bond0:
DEVICE=bond0
IPADDR="172.22.0.23"
NETMASK="255.255.0.0"
NETWORK="172.22.0.0"
BROADCAST="172.22.255.255"
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
BONDING_SLAVE0=ib0
BONDING_SLAVE0=ib1

ifcfg-ib0:
DEVICE=ib0
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none

ifcfg-ib1:
DEVICE=ib1
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none

[root at njxc6-rhel5 ~]# ifconfig
bond0     Link encap:InfiniBand  HWaddr 80:00:04:05:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
         inet addr:172.22.0.23  Bcast:172.22.255.255  Mask:255.255.0.0
         UP BROADCAST MASTER MULTICAST  MTU:1500  Metric:1
         RX packets:0 errors:0 dropped:0 overruns:0 frame:0
         TX packets:18 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:0
         RX bytes:0 (0.0 b)  TX bytes:1352 (1.3 KiB)

dmesg:
...
Ethernet Channel Bonding Driver: v3.1.1 (September 26, 2006)
bonding: MII link monitoring set to 100 ms
ADDRCONF(NETDEV_UP): bond0: link is not ready
bonding: bond0: Adding slave ib0.
bonding: bond0: Warning: enslaved VLAN challenged slave ib0. Adding VLANs will be blocked as long as ib0 is part of bond bond0
bonding: bond0: Warning: The first slave device you specified does not support setting the MAC address. This bond MAC address would be that of the active slave.
ADDRCONF(NETDEV_UP): ib0: link is not ready
bonding: bond0: Warning: failed to get speed and duplex from ib0, assumed to be 100Mb/sec and Full.
bonding: bond0: making interface ib0 the new active one.
bondingbond_send_grat_arp: bond bond0 slave ib0
bonding: bond0: first active interface up!
bonding: bond0: enslaving ib0 as an active interface with an up link.
bonding: bond0: Adding slave ib1.
bonding: bond0: Warning: enslaved VLAN challenged slave ib1. Adding VLANs will be blocked as long as ib1 is part of bond bond0
ADDRCONF(NETDEV_UP): ib1: link is not ready
bonding: bond0: Warning: failed to get speed and duplex from ib1, assumed to be 100Mb/sec and Full.
bonding: bond0: enslaving ib1 as a backup interface with an up link.
ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
...
bonding: bond0: Interface ib0 is already enslaved!
ib0: enabling connected mode will cause multicast packet drops
ib0: mtu > 2044 will cause multicast packet drops.
bonding: bond0: link status definitely down for interface ib0, disabling it
bonding: bond0: making interface ib1 the new active one.
bondingbond_send_grat_arp: bond bond0 slave ib1
bonding: bond0: Interface ib1 is already enslaved!
ib1: enabling connected mode will cause multicast packet drops
ib1: mtu > 2044 will cause multicast packet drops.
bonding: bond0: link status definitely down for interface ib1, disabling it
bondingbond_send_grat_arp: bond bond0 slave NULL
bonding: bond0: now running without any active interface !

Thanks,
Bob

Scott Weitzenkamp (sweitzen) wrote:
> Bob, it is now possible to configure IPoIB bonding in
> /etc/infiniband/openib.conf, this configuration file includes the
> following boilerplate.
>
> # Enable the bonding driver on startup
> IPOIBBOND_ENABLE=no
> # Set bond interface names
> #IPOIB_BONDS=bond0,bond1
> # Set specific bond params; address and slaves
> #bond0_IP=10.10.10.1
> #bond0_SLAVES=ib0,ib1
> #bond1_IP=20.10.10.1
> #bond1_SLAVES=ib2,ib3,ib4
>
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>
>> -----Original Message-----
>> From: general-bounces at lists.openfabrics.org 
>> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Or Gerlitz
>> Sent: Tuesday, May 29, 2007 12:56 AM
>> To: Bob Kossey
>> Cc: OpenFabrics General
>> Subject: [ofa-general] Re: ipoib / bonding and OFED
>>
>> Bob Kossey wrote:
>>> I copied OR since I think this is related to his OFED HA work, and
>>> he might have some insights.  A few more questions for Or:
>>> I was trying to use ipoib bonding with OFED 1.2 rc2 and a 2.6.9 kernel,
>>> but was not able to get it to work so far.  I saw your Sonoma bonding
>>> slides, and you mention kernel bonding driver changes were needed.
>>> 2. Is there a minimum kernel version, with the kernel bonding driver
>>> changes, that is required to use bonding with OFED ipoib?
>>
>> Just to have a base line here: to get bonding to work with IPoIB, you
>> should use the bonding driver provided with OFED 1.2. This driver is the
>> upstream one (of 2.6.20), patched to support IPoIB and backported
>> to RH5, SLES10 and RH4 U3/4/5; other kernels are not supported.
>>
>> If you were using the ofed bonding on a system that matches the support
>> matrix it should work. If you do have problems under this config, please
>> either open a bug at the ofed bugzilla
>> @ bugs.openfabrics.org assigned to monis at voltaire.com (Moni Shoua) or
>> send first report/question to Moni and CC ewg at lists.openfabrics.org
>>
>> Please note that between RC2 and RC4 (to be released today etc) some
>> bugs were fixed, you can search in the bugzilla to see what.
>>
>>> 3. The bonding driver uses the HWADDR from the underlying ipoib
>>> devices, how does it obtain the HWADDR?  Does it use the full 20 bytes,
>>> or some subset?
>>
>> When enslaving IPoIB devices, the bonding driver uses the full hw
>> address of the active slave; it simply looks at the dev_addr field of
>> the slave struct netdevice (see include/linux/netdevice.h).
>>
>>> 4. What use_carrier options for link status detection does OFED ipoib
>>> support,
>>> MII, ETHTOOL or netif_carrier_ok?
>>
>> The mii/ethtool etc local link detection methods of the bonding driver
>> are somewhat deprecated, since nowadays almost any network device
>> supports the netif_carrier_ok call. The --default-- of the upstream
>> bonding driver (eg the one we use in OFED and the 2.6.21 listed below)
>> is to set the use_carrier mod param to 1, that is, mii is not used anymore.
>>
>>> author:         Thomas Davis, tadavis at lbl.gov and many others
>>> description:    Ethernet Channel Bonding Driver, v3.1.2
>>> version:        3.1.2
>>> parm:           use_carrier:Use netif_carrier_ok (vs MII ioctls) in miimon; 0 for off, 1 for on (default) (int)
>>> parm:           miimon:Link check interval in milliseconds (int)
>>>
>>> If you have any good examples of bonding configuration settings that work
>>> with OFED, I'd appreciate that also.
>>
>> The bonding RPM provided with OFED is made of a driver, script and some
>> help text containing usage examples, please take a look there and let me
>> know if you have further questions.
>>
>>> $ rpm -ql ib-bonding-0.9.0-2.6.9_42.ELsmp
>>> /lib/modules/2.6.9-42.ELsmp/updates/kernel/drivers/net/bonding/bonding.ko
>>> /usr/bin/ib-bond
>>> /usr/share/doc/ib-bonding-0.9.0/ib-bonding.txt
>>
>> The ofed service (/etc/init.d/openibd) was enhanced to allow for
>> --persistent-- bonding configuration, please see the bonding section at
>> docs/ipoib_release_notes.txt to see how to do it.
>>
>> Or.
>>


From sweitzen at cisco.com  Wed Jun  6 11:55:28 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Wed, 6 Jun 2007 11:55:28 -0700
Subject: [ofa-general] Re: ipoib / bonding and OFED
In-Reply-To: <466702A8.5080302@hp.com>
References: <3857BB049D83424D9DB82753D37CEA55459C41@taurus.voltaire.com><4657373E.2030903@hp.com>
	<465BDC90.5080305@voltaire.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA303951259@xmb-sjc-216.amer.cisco.com>
	<466702A8.5080302@hp.com>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA303A4510C@xmb-sjc-216.amer.cisco.com>

You should use openib.conf, not ifcfg-*, for configuring bonding at
boot time.

Scott 

> -----Original Message-----
> From: Bob Kossey [mailto:bob.kossey at hp.com] 
> Sent: Wednesday, June 06, 2007 11:53 AM
> To: Scott Weitzenkamp (sweitzen)
> Cc: Or Gerlitz; OpenFabrics General
> Subject: Re: [ofa-general] Re: ipoib / bonding and OFED
> 
> Just to follow up on this, using RHEL5 and OFED 1.2 rc4, I was able
> to do enough rudimentary testing to convince myself that IB
> bonding was working.  I was able to use ib-bond, as well
> as the use of the openib.conf file to enable bonding on startup,
> including both(separately) IPOIBBOND_ENABLE and IPOIBHA_ENABLE.
> 
> One thing I was not able to do however, was to start IB bonding
> using the standard bonding modifications to /etc/modprobe.conf
> and /etc/sysconfig/network-scripts/ifcfg* files.  Should this 
> be possible,
> and are there perhaps some required settings I am missing?  I'll
> include my file modifications and some output below.
> 
> modprobe.conf:
> alias bond0 bonding
> options bond0 mode=active-backup miimon=100
> 
> ifcfg-bond0:
> DEVICE=bond0
> IPADDR="172.22.0.23"
> NETMASK="255.255.0.0"
> NETWORK="172.22.0.0"
> BROADCAST="172.22.255.255"
> ONBOOT=yes
> BOOTPROTO=none
> USERCTL=no
> BONDING_SLAVE0=ib0
> BONDING_SLAVE0=ib1
> 
> ifcfg-ib0:
> DEVICE=ib0
> USERCTL=no
> ONBOOT=yes
> MASTER=bond0
> SLAVE=yes
> BOOTPROTO=none
> 
> ifcfg-ib1:
> DEVICE=ib1
> USERCTL=no
> ONBOOT=yes
> MASTER=bond0
> SLAVE=yes
> BOOTPROTO=none
> 
> [root at njxc6-rhel5 ~]# ifconfig
> bond0     Link encap:InfiniBand  HWaddr 
> 80:00:04:05:FE:80:00:00:00:00:00:00:00:0
> 0:00:00:00:00:00:00
>          inet addr:172.22.0.23  Bcast:172.22.255.255  Mask:255.255.0.0
>          UP BROADCAST MASTER MULTICAST  MTU:1500  Metric:1
>          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:18 errors:0 dropped:0 overruns:0 carrier:0
>          collisions:0 txqueuelen:0
>          RX bytes:0 (0.0 b)  TX bytes:1352 (1.3 KiB)
> 
> dmesg:
> ...
> Ethernet Channel Bonding Driver: v3.1.1 (September 26, 2006)
> bonding: MII link monitoring set to 100 ms
> ADDRCONF(NETDEV_UP): bond0: link is not ready
> bonding: bond0: Adding slave ib0.
> bonding: bond0: Warning: enslaved VLAN challenged slave ib0. Adding 
> VLANs will b
> e blocked as long as ib0 is part of bond bond0
> bonding: bond0: Warning: The first slave device you specified 
> does not 
> support s
> etting the MAC address. This bond MAC address would be that of the 
> active slave.
> ADDRCONF(NETDEV_UP): ib0: link is not ready
> bonding: bond0: Warning: failed to get speed and duplex from ib0, 
> assumed to be
> 100Mb/sec and Full.
> bonding: bond0: making interface ib0 the new active one.
> bondingbond_send_grat_arp: bond bond0 slave ib0
> bonding: bond0: first active interface up!
> bonding: bond0: enslaving ib0 as an active interface with an up link.
> bonding: bond0: Adding slave ib1.
> bonding: bond0: Warning: enslaved VLAN challenged slave ib1. Adding 
> VLANs will b
> e blocked as long as ib1 is part of bond bond0
> ADDRCONF(NETDEV_UP): ib1: link is not ready
> bonding: bond0: Warning: failed to get speed and duplex from ib1, 
> assumed to be
> 100Mb/sec and Full.
> bonding: bond0: enslaving ib1 as a backup interface with an up link.
> ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
> ...
> bonding: bond0: Interface ib0 is already enslaved!
> ib0: enabling connected mode will cause multicast packet drops
> ib0: mtu > 2044 will cause multicast packet drops.
> bonding: bond0: link status definitely down for interface 
> ib0, disabling it
> bonding: bond0: making interface ib1 the new active one.
> bondingbond_send_grat_arp: bond bond0 slave ib1
> bonding: bond0: Interface ib1 is already enslaved!
> ib1: enabling connected mode will cause multicast packet drops
> ib1: mtu > 2044 will cause multicast packet drops.
> bonding: bond0: link status definitely down for interface 
> ib1, disabling it
> bondingbond_send_grat_arp: bond bond0 slave NULL
> bonding: bond0: now running without any active interface !
> 
> Thanks,
> Bob
> 
> Scott Weitzenkamp (sweitzen) wrote:
> > Bob, it is now possible to configure IPoIB bonding in
> > /etc/infiniband/openib.conf, this configuration file includes the
> > following boilerplate.
> >
> > # Enable the bonding driver on startup
> > IPOIBBOND_ENABLE=no
> > # Set bond interface names
> > #IPOIB_BONDS=bond0,bond1
> > # Set specific bond params; address and slaves
> > #bond0_IP=10.10.10.1
> > #bond0_SLAVES=ib0,ib1
> > #bond1_IP=20.10.10.1
> > #bond1_SLAVES=ib2,ib3,ib4
> >
> > Scott Weitzenkamp
> > SQA and Release Manager
> > Server Virtualization Business Unit
> > Cisco Systems
> >  
> >
> >   
> >> -----Original Message-----
> >> From: general-bounces at lists.openfabrics.org 
> >> [mailto:general-bounces at lists.openfabrics.org] On Behalf 
> Of Or Gerlitz
> >> Sent: Tuesday, May 29, 2007 12:56 AM
> >> To: Bob Kossey
> >> Cc: OpenFabrics General
> >> Subject: [ofa-general] Re: ipoib / bonding and OFED
> >>
> >> Bob Kossey wrote:
> >>> I copied OR since I think this is related to his OFED HA work, and
> >>> he might have some insights.  A few more questions for Or:
> >>> I was trying to use ipoib bonding with OFED 1.2 rc2 and a 2.6.9 kernel,
> >>> but was not able to get it to work so far.  I saw your Sonoma bonding
> >>> slides, and you mention kernel bonding driver changes were needed.
> >>> 2. Is there a minimum kernel version, with the kernel bonding driver
> >>> changes, that is required to use bonding with OFED ipoib?
> >>
> >> Just to have a baseline here: to get bonding to work with IPoIB, you
> >> should use the bonding driver provided with OFED 1.2. This driver is
> >> the upstream one (from 2.6.20), patched to support IPoIB and backported
> >> to RH5, SLES10 and RH4 U3/4/5; other kernels are not supported.
> >>
> >> If you were using the OFED bonding on a system that matches the support
> >> matrix, it should work. If you do have problems under this config, please
> >> either open a bug at the OFED bugzilla (bugs.openfabrics.org) assigned to
> >> monis at voltaire.com (Moni Shoua), or send a first report/question to
> >> Moni and CC ewg at lists.openfabrics.org.
> >>
> >> Please note that between RC2 and RC4 (to be released today etc) some
> >> bugs were fixed; you can search the bugzilla to see which.
> >>
> >>> 3. The bonding driver uses the HWADDR from the underlying ipoib
> >>> devices, how does it obtain the HWADDR?  Does it use the full 20 bytes,
> >>> or some subset?
> >>
> >> When enslaving IPoIB devices, the bonding driver uses the full hw
> >> address of the active slave; it simply looks at the dev_addr field of
> >> the slave's struct net_device (see include/linux/netdevice.h).
> >>
> >>> 4. What use_carrier options for link status detection does OFED ipoib
> >>> support,
> >>> MII, ETHTOOL or netif_carrier_ok?
> >>
> >> The mii/ethtool etc local link detection methods of the bonding driver
> >> are somewhat deprecated, since nowadays almost every network device
> >> supports the netif_carrier_ok call. The --default-- of the upstream
> >> bonding driver (e.g. the one we use in OFED and the 2.6.21 listed below)
> >> is to set the use_carrier mod param to 1, that is, mii is not used anymore.
> >>
> >>> author:         Thomas Davis, tadavis at lbl.gov and many others
> >>> description:    Ethernet Channel Bonding Driver, v3.1.2
> >>> version:        3.1.2
> >>> parm:           use_carrier:Use netif_carrier_ok (vs MII ioctls) in miimon; 0 for off, 1 for on (default) (int)
> >>> parm:           miimon:Link check interval in milliseconds (int)
> >>>
> >>> If you have any good examples of bonding configuration settings that work
> >>> with OFED, I'd appreciate that also.
> >>
> >> The bonding RPM provided with OFED is made of a driver, script and some
> >> help text containing usage examples, please take a look there and let me
> >> know if you have further questions.
> >>
> >>> $ rpm -ql ib-bonding-0.9.0-2.6.9_42.ELsmp
> >>> /lib/modules/2.6.9-42.ELsmp/updates/kernel/drivers/net/bonding/bonding.ko
> >>> /usr/bin/ib-bond
> >>> /usr/share/doc/ib-bonding-0.9.0/ib-bonding.txt
> >>
> >> The ofed service (/etc/init.d/openibd) was enhanced to allow for
> >> --persistent-- bonding configuration, please see the bonding section at
> >> docs/ipoib_release_notes.txt to see how to do it.
> >>
> >> Or.
> >>
> 


From ardavis at ichips.intel.com  Wed Jun  6 13:07:31 2007
From: ardavis at ichips.intel.com (Arlin Davis)
Date: Wed, 06 Jun 2007 13:07:31 -0700
Subject: [ofa-general] OpenFabrics DAT/DAPL 1.2.1 library release
Message-ID: <46671403.9050100@ichips.intel.com>

http://www.openfabrics.org/~ardavis/

md5sum
ae8cbfc26c7d60d8b51356805fc8a8c5  dapl-1.2-1.tgz


From ardavis at ichips.intel.com  Wed Jun  6 13:27:23 2007
From: ardavis at ichips.intel.com (Arlin Davis)
Date: Wed, 06 Jun 2007 13:27:23 -0700
Subject: [ofa-general] [GIT PULL] OFED 1.2 uDAPL release notes
Message-ID: <466718AB.5050507@ichips.intel.com>

Vlad,  please pull the latest OFED 1.2 release notes from uDAPL project 
(ofed_1_2 branch)

    dapl/doc/uDAPL_release_notes.txt

Signed-off by: Arlin Davis ardavis at ichips.intel.com


From friedman at ucla.edu  Wed Jun  6 16:45:38 2007
From: friedman at ucla.edu (Scott A. Friedman)
Date: Wed, 06 Jun 2007 16:45:38 -0700
Subject: [ofa-general] IB and iWarp HCA in same node
Message-ID: <46674722.6090302@ucla.edu>

I have a working IB cluster where I have added a Chelsio iWarp card to 
one node. Another node is connected to that with only an identical iWarp 
card. I cannot seem to get the iWarp cards to come up. They work through 
regular ethernet just fine, and the IB stuff still works as well. But 
when I modprobe iw_cxgb3 and iw_cm, utilities like ibstat show the 
following, which explains why nothing is working.

Question is, why? Am I missing or forgetting something? I just want to 
test the two iWarp cards back to back. Not trying to get some kind of 
auto bridging or routing working.

# ibstat
iWARP RNIC 'cxgb3_0'
         iWARP RNIC type: cxgb3
         Number of ports: 1
         Firmware version: T 4.0.0
         Hardware version: 1
         Node GUID: 0x0007430506ea0000
         System image GUID: 0x0007430506ea0000
         Port 1:
                 State: Active
                 Physical state: No state change
                 Rate: 20
                 Base lid: 0
                 LMC: 0
                 SM lid: 0
                 Capability mask: 0x009f0000
                 Port GUID: 0x0000000000000000
CA 'mthca0'
         CA type: MT25204
         Number of ports: 1
         Firmware version: 1.1.0
         Hardware version: a0
         Node GUID: 0x0002c9020023b990
         System image GUID: 0x0002c9020023b993
         Port 1:
                 State: Active
                 Physical state: LinkUp
                 Rate: 10
                 Base lid: 1
                 LMC: 0
                 SM lid: 28
                 Capability mask: 0x02510a68
                 Port GUID: 0x0002c9020023b991

Any help is appreciated!

Thanks
Scott


From ogerlitz at voltaire.com  Thu Jun  7 00:38:37 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 07 Jun 2007 10:38:37 +0300
Subject: [ofa-general] Re: ipoib / bonding and OFED
In-Reply-To: <466702A8.5080302@hp.com>
References: <3857BB049D83424D9DB82753D37CEA55459C41@taurus.voltaire.com><4657373E.2030903@hp.com>
	<465BDC90.5080305@voltaire.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA303951259@xmb-sjc-216.amer.cisco.com>
	<466702A8.5080302@hp.com>
Message-ID: <4667B5FD.4070600@voltaire.com>

Bob Kossey wrote:
> Just to follow up on this, using RHEL5 and OFED 1.2 rc4, I was able
> to do enough rudimentary testing to convince myself that IB
> bonding was working.  I was able to use ib-bond, as well
> as the use of the openib.conf file to enable bonding on startup,
> including both(separately) IPOIBBOND_ENABLE and IPOIBHA_ENABLE.

Thanks for the feedback. OFED 1.2 supports both options; however, I
don't think that two HA solutions should be deployed in commercial
distributions. What is your take (bonding vs ha daemon) on the correct
way to move forward?

> One thing I was not able to do however, was to start IB bonding
> using the standard bonding modifications to /etc/modprobe.conf
> and /etc/sysconfig/network-scripts/ifcfg* files.  Should this be possible,
> and are there perhaps some required settings I am missing?  I'll
> include my file modifications and some output below.

On some distributions (e.g. RH4 and SLES10) /sbin/ifenslave is used to
configure bonding through the distro /sbin/ifup scheme. The ifenslave
program is somewhat obsolete and is not supported by the bonding
modifications made to work with IPoIB devices.

Moving forward, the way to go is using the bonding sysfs api, see the 
files under /sys/class/net/$BOND/bonding/ and the bonding documentation.
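
A minimal sketch of driving that sysfs interface from C, assuming a bond
named bond0 already exists and that its "slaves" attribute accepts
"+ifname" and "-ifname" writes as described in the kernel bonding
documentation. The helper names (bond_attr_path, bond_slave_cmd,
bond_enslave) are hypothetical and are not part of OFED or ib-bond:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "/sys/class/net/<bond>/bonding/<attr>" into buf.
 * Returns 0 on success, -1 if the path does not fit. */
int bond_attr_path(char *buf, size_t len, const char *bond, const char *attr)
{
    int n;

    assert(buf && bond && attr);
    n = snprintf(buf, len, "/sys/class/net/%s/bonding/%s", bond, attr);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Build the string written to the "slaves" attribute:
 * "+ib0" to enslave, "-ib0" to release.
 * Returns 0 on success, -1 on an empty name or a too-small buffer. */
int bond_slave_cmd(char *buf, size_t len, int add, const char *slave)
{
    int n;

    assert(buf && slave);
    if (strlen(slave) == 0)
        return -1;
    n = snprintf(buf, len, "%c%s", add ? '+' : '-', slave);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Enslave one interface by writing to the bond's sysfs file.
 * Needs root and an existing bond; returns 0 on success, -1 on failure. */
int bond_enslave(const char *bond, const char *slave)
{
    char path[128], cmd[32];
    FILE *f;
    int ok;

    if (bond_attr_path(path, sizeof path, bond, "slaves") ||
        bond_slave_cmd(cmd, sizeof cmd, 1, slave))
        return -1;
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = fputs(cmd, f) >= 0;
    if (fclose(f) != 0)     /* sysfs write errors can surface at close */
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling bond_enslave("bond0", "ib0") is meant to be equivalent to echoing
"+ib0" into /sys/class/net/bond0/bonding/slaves from a root shell.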

This is how the ib-bond script works, and also /sbin/ifup-eth on RH5.
However, for OFED 1.2 we did not manage to examine the RH5 scripts fully,
so I cannot say whether you can just work with the OS bonding
configuration scheme, nor can I debug for you now why it's not working.

It's definitely in our plan, but it's P2 relative to pushing the bonding
changes upstream; let me know if you think differently.

Or.

> modprobe.conf:
> alias bond0 bonding
> options bond0 mode=active-backup miimon=100
> 
> ifcfg-bond0:
> DEVICE=bond0
> IPADDR="172.22.0.23"
> NETMASK="255.255.0.0"
> NETWORK="172.22.0.0"
> BROADCAST="172.22.255.255"
> ONBOOT=yes
> BOOTPROTO=none
> USERCTL=no
> BONDING_SLAVE0=ib0
> BONDING_SLAVE0=ib1
> 
> ifcfg-ib0:
> DEVICE=ib0
> USERCTL=no
> ONBOOT=yes
> MASTER=bond0
> SLAVE=yes
> BOOTPROTO=none
> 
> ifcfg-ib1:
> DEVICE=ib1
> USERCTL=no
> ONBOOT=yes
> MASTER=bond0
> SLAVE=yes
> BOOTPROTO=none


From vlad at lists.openfabrics.org  Thu Jun  7 02:43:36 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Thu,  7 Jun 2007 02:43:36 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070607-0200 daily build status
Message-ID: <20070607094336.79CDAE6083A@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.12
Passed on powerpc with linux-2.6.19
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.13
Passed on ppc64 with linux-2.6.18
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.14
Passed on powerpc with linux-2.6.18
Passed on powerpc with linux-2.6.17
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.19
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.17
Passed on x86_64 with linux-2.6.16
Passed on ia64 with linux-2.6.16
Passed on powerpc with linux-2.6.12
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.17
Passed on ppc64 with linux-2.6.19
Passed on ppc64 with linux-2.6.17
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.12
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.14
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.13
Passed on powerpc with linux-2.6.15
Passed on ia64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:


From devesh28 at gmail.com  Thu Jun  7 02:55:43 2007
From: devesh28 at gmail.com (Devesh Sharma)
Date: Thu, 7 Jun 2007 15:25:43 +0530
Subject: [ofa-general] Re: [Query] ib add path record cache
In-Reply-To: <1181123111.12997.147451.camel@hal.voltaire.com>
References: <000f01c7a7f0$1067dba0$11c8180a@amr.corp.intel.com>
	<1181123111.12997.147451.camel@hal.voltaire.com>
Message-ID: <309a667c0706070255x67de7850h209831c2f522dc2c@mail.gmail.com>

Hi all,
Sorry for the late reply; I was not in the office.

Could anybody briefly explain the idea of a _distributed SA_? Is it a
pre-planned activity that is yet to be implemented in OFED, or is it
just an extension of the sa cache pre-loading discussion?
And what purpose is a distributed SA going to serve?

On 06 Jun 2007 05:45:12 -0400, Hal Rosenstock <halr at voltaire.com> wrote:
> On Wed, 2007-06-06 at 00:06, Sean Hefty wrote:
> > >One could ask the IBTA for this if it is the right thing to do.
> >
> > Checking with the IBTA makes sense.  Longer term, adding a distributed SA
> > application class, or expanding the existing SA class may be useful, if the IBTA
> > wants to define SA implementation at this level of detail.  However, I was
> > trying to focus on what could be done now.  If the IBTA would like to
> > standardize the communication, that'd be great.
>
> > One issue that isn't clear to me is what exactly is meant by the statement:
> > "Vendor-specific classes will never be used to define management operations that
> > are encompassed by the Infiniband Architecture."
>
> I'm not sure of the intent of this, but that is informative rather than
> normative (compliance) text.
>
> > For example, suppose that
> > there were a small number of SA caches available in the subnet.  Is it compliant
> > for a node to issue a PR query to one of the caches using a vendor-defined PR
> > query?  Or must this be done using an SA PR query with possible redirection?
>
> I think this example would fall "on the line" and seems somewhat
> debatable as to whether there is a management operation for this or not.
> It does go back to the intent of the original statement you cited.
>
> > >Are you saying to make the RMPP header as the first part of Data ?
> >
> > Yes.
> >
> > >Vendor class 1 are not RMPP MADs so I think this is nonconformant.
> >
> > I didn't see any restriction on the vendor class 1 data - at least in section
> > 16.5.
>
> True but I'm not sure that was the intent which again was why vendor
> class 2 was created. Also, there is the problem of knowing that this
> vendor class 1 is using RMPP. That sounds proprietary to me (and affects
> the kernel in the OpenIB implementation).
>
> >   If I'm mistaken on this, then I agree that vendor class 2 seems to be our
> > only current option.
> >
> > >That's one reason vendor class 2 was added. In addition, there is no way
> > >to detect one "vendor" from another "vendor" (which is why OUI was
> > >added) if the same class is used so these need to be unique across all
> > >vendors.
> >
> > Yes - all vendor class 1 MADs suffer from this issue.  In practice, it seems
> > that there can only be a single vendor for a given class on a subnet.
>
> That's one way of putting it but limits the use; in fact, if this were
> done, all subnets would use at least two different vendors. Another way
> is that all vendors who want to use this class range need to coordinate
> such use (e.g. class allocation).
>
> > >The only choice seems to me to be reformatting using vendor class 2 and
> > >dealing with the data copying.
> >
> > From an implementation viewpoint, this just seems less desirable.  Adding the
> > offset means that single-segment SA MAD may become our multi-segment vendor MAD,
> > and dealing with two MAD formats will be troublesome.  If we're only caching
> > PRs, this may not be a big deal, but if we ever want to create a truly
> > distributed SA, I think it will be.
>
> Are you referring to the performance hit ?
>
> -- Hal
>
> > - Sean
>
>


From bob.kossey at hp.com  Thu Jun  7 06:32:26 2007
From: bob.kossey at hp.com (Bob Kossey)
Date: Thu, 07 Jun 2007 09:32:26 -0400
Subject: [ofa-general] Re: ipoib / bonding and OFED
In-Reply-To: <4667B5FD.4070600@voltaire.com>
References: <3857BB049D83424D9DB82753D37CEA55459C41@taurus.voltaire.com><4657373E.2030903@hp.com>
	<465BDC90.5080305@voltaire.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA303951259@xmb-sjc-216.amer.cisco.com>
	<466702A8.5080302@hp.com> <4667B5FD.4070600@voltaire.com>
Message-ID: <466808EA.7050302@hp.com>

Or Gerlitz wrote:
> Bob Kossey wrote:
>> Just to follow up on this, using RHEL5 and OFED 1.2 rc4, I was able
>> to do enough rudimentary testing to convince myself that IB
>> bonding was working.  I was able to use ib-bond, as well
>> as the use of the openib.conf file to enable bonding on startup,
>> including both(separately) IPOIBBOND_ENABLE and IPOIBHA_ENABLE.
>
> Thanks for the feedback. OFED 1.2 supports both options; however, I
> don't think that two HA solutions should be deployed in commercial
> distributions. What is your take (bonding vs ha daemon) on the correct
> way to move forward?

I agree we don't need both at the same time.  I was wondering myself what
the pros and cons of each method were.  What are the link monitoring
methods used by each?  The bonding method would have the advantage of
commonality with other bonded interfaces, and may be simpler and more
reliable than the user daemon.
>
>> One thing I was not able to do however, was to start IB bonding
>> using the standard bonding modifications to /etc/modprobe.conf
>> and /etc/sysconfig/network-scripts/ifcfg* files.  Should this be 
>> possible,
>> and are there perhaps some required settings I am missing?  I'll
>> include my file modifications and some output below.
>
> On some distributions (e.g. RH4 and SLES10) /sbin/ifenslave is used to
> configure bonding through the distro /sbin/ifup scheme. The ifenslave
> program is somewhat obsolete and is not supported by the bonding
> modifications made to work with IPoIB devices.
>
> Moving forward, the way to go is using the bonding sysfs api, see the
> files under /sys/class/net/$BOND/bonding/ and the bonding documentation.
>
> This is how the ib-bond script works, and also /sbin/ifup-eth on RH5.
> However, for OFED 1.2 we did not manage to examine the RH5 scripts fully,
> so I cannot say whether you can just work with the OS bonding
> configuration scheme, nor can I debug for you now why it's not working.
>
> It's definitely in our plan, but it's P2 relative to pushing the bonding
> changes upstream; let me know if you think differently.
>
> Or.
>
It would be nice to be able to use the standard file modifications to
perform IB bonding, for consistency with how we handle other bonded
interfaces.  If someone knows how to do it, great, but if not, I agree it
would be a lower priority investigation.

Thanks,
Bob


From hanafim.ctr at asc.hpc.mil  Thu Jun  7 06:54:46 2007
From: hanafim.ctr at asc.hpc.mil (MAHMOUD HANAFI)
Date: Thu, 07 Jun 2007 09:54:46 -0400
Subject: [ofa-general] OFED vs IB_GOLD IB_SRP Performance results
In-Reply-To: <200706061429.l56ETp4n012017@cmf.nrl.navy.mil>
References: <200706061429.l56ETp4n012017@cmf.nrl.navy.mil>
Message-ID: <46680E26.8080004@asc.hpc.mil>

The max_hw_sectors_kb was set at 4096KB. On the DDN I verified that the request sizes were correct.
I have tried larger max_hw_sectors_kb values, but it had no effect. Setting cmd_per_lun=1 improved reads
slightly but not much.
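
For anyone comparing these limits, here is a small sketch of reading them
from C. It assumes 512-byte sectors and the max_sectors_kb and
max_hw_sectors_kb attributes under /sys/block/<dev>/queue/; the helper
names are hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/* Read one unsigned integer from a sysfs-style text file.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
int read_sysfs_ulong(const char *path, unsigned long *out)
{
    FILE *f;
    int ok;

    assert(path && out);
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = fscanf(f, "%lu", out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Convert a count of 512-byte sectors to kilobytes (512 sectors = 256 KB). */
unsigned long sectors_to_kb(unsigned long sectors)
{
    return sectors / 2;
}

/* Fetch the current (soft) and hardware request size limits of a block
 * device such as "sdd". Returns 0 on success, -1 on any failure. */
int read_queue_limits(const char *dev, unsigned long *soft_kb,
                      unsigned long *hw_kb)
{
    char path[128];
    int n;

    assert(dev && soft_kb && hw_kb);
    n = snprintf(path, sizeof path, "/sys/block/%s/queue/max_sectors_kb", dev);
    if (n < 0 || (size_t)n >= sizeof path || read_sysfs_ulong(path, soft_kb))
        return -1;
    n = snprintf(path, sizeof path, "/sys/block/%s/queue/max_hw_sectors_kb", dev);
    if (n < 0 || (size_t)n >= sizeof path || read_sysfs_ulong(path, hw_kb))
        return -1;
    return 0;
}
```

If the soft limit comes back well below the hardware limit, requests are
being capped by the soft value rather than by the HCA or target.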

chas williams - CONTRACTOR wrote:
> In message <4665A1FA.1000506 at asc.hpc.mil>,MAHMOUD HANAFI writes:
>> OFED Setup:
>> /sys/module/ib_srp/mellanox_workarounds = 1
>> /sys/module/ib_srp/refcnt = 11
>> /sys/module/ib_srp/srp_sg_tablesize = 256
>> /sys/module/ib_srp/topspin_workarounds = 1
>>
>> /sys/block/sdd/queue/max_sectors_kb = 4096
>> /sys/block/sdd/queue/nr_requests = 8192
>> /sys/block/sdd/queue/read_ahead_kb = 128
> 
> what is the max_hw_sectors_kb for the ofed target?  unless you specified
> max_sect= during login, i suspect you are getting the system defaults.
> typically this is 512 sectors i think, which is where your performance
> seems to start to diverge.
> 

-- 
Mahmoud Hanafi
Senior System Administrator
ASC/MSRC
www.asc.hpc.mil
2435 5th Street
WPAFB, OHIO 45433
(937) 255-1536


From sweitzen at cisco.com  Thu Jun  7 08:48:46 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Thu, 7 Jun 2007 08:48:46 -0700
Subject: [ofa-general] Re: ipoib / bonding and OFED
In-Reply-To: <4667B5FD.4070600@voltaire.com>
References: <3857BB049D83424D9DB82753D37CEA55459C41@taurus.voltaire.com><4657373E.2030903@hp.com>
	<465BDC90.5080305@voltaire.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA303951259@xmb-sjc-216.amer.cisco.com>
	<466702A8.5080302@hp.com> <4667B5FD.4070600@voltaire.com>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA303A4544E@xmb-sjc-216.amer.cisco.com>

> Thanks for the feedback. OFED 1.2 supports both options; however, I
> don't think that two HA solutions should be deployed in commercial
> distributions. What is your take (bonding vs ha daemon) on the correct
> way to move forward?

I don't know if I've said this in public, but I've stopped testing
ipoibtools HA as of OFED 1.2 rc2 and Cisco is only going to support
ib-bonding HA for our OFED 1.2 customers, as our testing has revealed
ib-bonding is more robust than ipoibtools.  I know I said this to
Tziporet at Sonoma, and she seemed to agree we could eventually remove
ipoibtools from OFED.

Scott


From mshefty at ichips.intel.com  Thu Jun  7 10:41:54 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 07 Jun 2007 10:41:54 -0700
Subject: [ofa-general] Re: [Query] ib add path record cache
In-Reply-To: <309a667c0706070255x67de7850h209831c2f522dc2c@mail.gmail.com>
References: <000f01c7a7f0$1067dba0$11c8180a@amr.corp.intel.com>	
	<1181123111.12997.147451.camel@hal.voltaire.com>
	<309a667c0706070255x67de7850h209831c2f522dc2c@mail.gmail.com>
Message-ID: <46684362.10403@ichips.intel.com>

> Please anybody just tell me about the idea of _distributed SA_ in
> short. Is it a pre-planned activity which is yet to be implemented with
> the OFED? or is it just an extension of the sa cache pre-loading
> discussion?

I'm thinking of a distributed component that can perform a limited set 
of SA functionality.  The sa cache is close in that it can respond to 
path record queries via an API call.  If the sa cache could respond to 
actual PR query MADs, IMO it then becomes a very simple distributed SA.

This idea came from trying to decide on the best way to pre-load the 
cache.  By using a MAD interface, I think we get several advantages:

* The existing userspace MAD interfaces could be used, which avoids 
adding a userspace interface for just the cache.

* The existing code in the sa cache used to process PR query responses 
is re-used.  (I.e. I anticipate that the kernel changes needed to 
support pre-loading will be fairly small.)

* We have a framework that can be used to load the entire cache, add a 
specific set of PRs, and remove specific PRs.

* The cache becomes accessible from remote systems - both for loading 
the cache as well as for queries.

So, I think that using a MAD interface to preload the cache is a 
relatively simple change, but gives us additional flexibility.  And to 
be clear, I'm not suggesting that we implement additional functionality, 
just that we have the framework available.

- Sean


From rdreier at cisco.com  Thu Jun  7 11:59:17 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 07 Jun 2007 11:59:17 -0700
Subject: [ofa-general] Re: [PATCH 2/2] IB/mlx4_ib: fix SRQ buffer allocation
In-Reply-To: <1181133679.10841.66.camel@mtls03> (Eli Cohen's message of "Wed,
	06 Jun 2007 15:40:21 +0300")
References: <1181133679.10841.66.camel@mtls03>
Message-ID: <ada3b13egl6.fsf@cisco.com>

Thanks... I reworked this a lot and right now I plan to push the
following (although I'm still testing):

commit df104b2036ea2ddf114b37a99fe833f2253a7098
Author: Roland Dreier <rolandd at cisco.com>
Date:   Thu Jun 7 11:52:02 2007 -0700

    IB/mlx4: Make sure RQ allocation is always valid
    
    QPs attached to an SRQ must never have their own RQ, and QPs not
    attached to SRQs must have an RQ with at least 1 entry.  Enforce all
    of this in set_rq_size().
    
    Based on a patch by Eli Cohen <eli at mellanox.co.il>.
    
    Signed-off-by: Roland Dreier <rolandd at cisco.com>
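
The rule in the commit message can be sketched in user space as follows.
This is an illustration under stated assumptions, not the driver code:
DATA_SEG_SIZE stands in for sizeof (struct mlx4_wqe_data_seg) and is
assumed to be 16 bytes, the device capability check is left out, and
set_rq_size_sketch, roundup_pow2 and ilog2u are hypothetical stand-ins
for the kernel function and helpers.

```c
#include <assert.h>
#include <errno.h>

/* Assumed size of one scatter/gather entry (stand-in for
 * sizeof (struct mlx4_wqe_data_seg)). */
#define DATA_SEG_SIZE 16u

struct rq_size {
    unsigned max;        /* number of RQ entries */
    unsigned max_gs;     /* gather entries per RQ entry */
    unsigned wqe_shift;  /* log2 of the WQE size in bytes */
};

/* Smallest power of two >= n, for n > 0. */
unsigned roundup_pow2(unsigned n)
{
    unsigned p = 1;

    assert(n > 0);
    while (p < n)
        p <<= 1;
    return p;
}

/* Floor of log2(n), for n > 0. */
unsigned ilog2u(unsigned n)
{
    unsigned s = 0;

    assert(n > 0);
    while (n >>= 1)
        s++;
    return s;
}

int set_rq_size_sketch(unsigned max_recv_wr, unsigned max_recv_sge,
                       int is_user, int has_srq, struct rq_size *rq)
{
    if (has_srq) {
        /* QPs attached to an SRQ must not ask for their own RQ. */
        if (max_recv_wr)
            return -EINVAL;
        rq->max = rq->max_gs = 0;
        rq->wqe_shift = 0;   /* sketch only; the patch leaves this untouched */
        return 0;
    }
    /* Userspace must request at least one entry and one gather entry;
     * kernel callers are silently bumped up to 1. */
    if (is_user && (!max_recv_wr || !max_recv_sge))
        return -EINVAL;
    rq->max       = roundup_pow2(max_recv_wr ? max_recv_wr : 1);
    rq->max_gs    = roundup_pow2(max_recv_sge ? max_recv_sge : 1);
    rq->wqe_shift = ilog2u(rq->max_gs * DATA_SEG_SIZE);
    return 0;
}
```

For example, a userspace request for 5 WRs with 3 SGEs each would be
rounded up to 8 entries of 4 gather entries, i.e. 64-byte WQEs under the
16-byte assumption.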

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index cd22975..5c6d054 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -189,18 +189,28 @@ static int send_wqe_overhead(enum ib_qp_type type)
 }
 
 static int set_rq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
-		       struct mlx4_ib_qp *qp)
+		       int is_user, int has_srq, struct mlx4_ib_qp *qp)
 {
 	/* Sanity check RQ size before proceeding */
 	if (cap->max_recv_wr  > dev->dev->caps.max_wqes  ||
 	    cap->max_recv_sge > dev->dev->caps.max_rq_sg)
 		return -EINVAL;
 
-	qp->rq.max = cap->max_recv_wr ? roundup_pow_of_two(cap->max_recv_wr) : 0;
+	if (has_srq) {
+		/* QPs attached to an SRQ should have no RQ */
+		if (cap->max_recv_wr)
+			return -EINVAL;
 
-	qp->rq.wqe_shift = ilog2(roundup_pow_of_two(cap->max_recv_sge *
-						    sizeof (struct mlx4_wqe_data_seg)));
-	qp->rq.max_gs    = (1 << qp->rq.wqe_shift) / sizeof (struct mlx4_wqe_data_seg);
+		qp->rq.max = qp->rq.max_gs = 0;
+	} else {
+		/* HW requires >= 1 RQ entry with >= 1 gather entry */
+		if (is_user && (!cap->max_recv_wr || !cap->max_recv_sge))
+			return -EINVAL;
+
+		qp->rq.max	 = roundup_pow_of_two(max(1, cap->max_recv_wr));
+		qp->rq.max_gs	 = roundup_pow_of_two(max(1, cap->max_recv_sge));
+		qp->rq.wqe_shift = ilog2(qp->rq.max_gs * sizeof (struct mlx4_wqe_data_seg));
+	}
 
 	cap->max_recv_wr  = qp->rq.max;
 	cap->max_recv_sge = qp->rq.max_gs;
@@ -285,7 +295,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
 	qp->sq.head	    = 0;
 	qp->sq.tail	    = 0;
 
-	err = set_rq_size(dev, &init_attr->cap, qp);
+	err = set_rq_size(dev, &init_attr->cap, !!pd->uobject, !!init_attr->srq, qp);
 	if (err)
 		goto err;
 

From pradeeps at linux.vnet.ibm.com  Thu Jun  7 14:14:53 2007
From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana)
Date: Thu, 07 Jun 2007 14:14:53 -0700
Subject: [ofa-general] IPOIB CM (NOSRQ) patches 
Message-ID: <4668754D.5080309@linux.vnet.ibm.com>

I have incorporated the IPOIB CM (NOSRQ) review comments and subsequent
discussions on this mailing list into a couple of patches (to follow).

The first patch will be V5 of the NOSRQ patch.

The second patch will be an extension of the NOSRQ patch, to
handle the corner case of running out of RC QPs. In that case
this patch enables switching to UD mode. Existing RC QPs should
remain unaffected.

Pradeep


From pradeeps at linux.vnet.ibm.com  Thu Jun  7 14:18:46 2007
From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana)
Date: Thu, 07 Jun 2007 14:18:46 -0700
Subject: [ofa-general] IPOIB CM (NOSRQ)[PATCH V5] patch
Message-ID: <46687636.5050101@linux.vnet.ibm.com>

Here is a fifth version of the IPOIB_CM_NOSRQ patch. This patch will
benefit adapters that do not support shared receive queues.

This patch incorporates the following review comments and subsequent
discussions on this mailing list since v4:

1. Reduces the number of if (srq) tests in the packet receive path.
2. Adds mechanisms to limit the NOSRQ footprint to 1 GB and a maximum
of 128 RC QPs (by default). Both limits are tunable.
3. Rebases the patch onto Roland's for-2.6.23 git tree (derived on
05/30).

This patch has been tested with linux-2.6.22-rc3 derived from Roland's
for-2.6.23 git tree, using Topspin and IBM HCAs on ppc64 machines.
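
The footprint limit mentioned in item 2 above can be sketched as a
small user-space check. This is a simplified illustration, not the
patch code: nosrq_over_limit() is a hypothetical helper, and it assumes
(as the patch does) that each receive buffer is accounted as 64 KB and
that current_rc_qp already counts the QP being admitted.

```c
#include <assert.h>
#include <stdint.h>

#define SIXTY_FOUR_K (1ull << 16)
#define MEGA_BYTE    (1ull << 20)

/*
 * Returns nonzero when a new REQ should be rejected.
 *
 * index           first free slot in the index table, or max_rc_qp
 *                 if no slot is free
 * max_rc_qp       tunable cap on the number of RC QPs
 * recvq_size      receive buffers per QP
 * current_rc_qp   active RC QPs, including the one being set up
 * max_recv_buf_mb tunable cap on receive buffer memory, in MB
 */
static int nosrq_over_limit(unsigned int index, unsigned int max_rc_qp,
			    unsigned int recvq_size,
			    unsigned int current_rc_qp,
			    unsigned int max_recv_buf_mb)
{
	uint64_t recv_mem_used;

	assert(index <= max_rc_qp);

	/* Widen before multiplying so the product cannot wrap at 32 bits. */
	recv_mem_used = (uint64_t)recvq_size * current_rc_qp * SIXTY_FOUR_K;

	return index == max_rc_qp ||
	       recv_mem_used >= (uint64_t)max_recv_buf_mb * MEGA_BYTE;
}
```

Because the comparison is >=, a configuration where the QP cap and the
memory cap coincide exactly (for example 128 QPs of 128 buffers each
against a 1024 MB cap) rejects the last QP on the memory test.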

Signed-off-by: Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com>
---

--- a/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib.h	2007-05-30 14:56:25.000000000 -0400
+++ b/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib.h	2007-06-02 18:59:41.000000000 -0400
@@ -95,11 +95,17 @@ enum {
  	IPOIB_MCAST_FLAG_ATTACHED = 3,
  };

+#define SIXTY_FOUR_K (1ul << 16)
+#define MEGA_BYTE (1ul << 20)
  #define	IPOIB_OP_RECV   (1ul << 31)
  #ifdef CONFIG_INFINIBAND_IPOIB_CM
-#define	IPOIB_CM_OP_SRQ (1ul << 30)
+#define	IPOIB_CM_OP_RECV (1ul << 30)
+
+#define NOSRQ_INDEX_TABLE_SIZE 128
+#define NOSRQ_INDEX_MASK      (NOSRQ_INDEX_TABLE_SIZE -1)
+
  #else
-#define	IPOIB_CM_OP_SRQ (0)
+#define	IPOIB_CM_OP_RECV (0)
  #endif

  /* structs */
@@ -166,11 +172,14 @@ enum ipoib_cm_state {
  };

  struct ipoib_cm_rx {
-	struct ib_cm_id     *id;
-	struct ib_qp        *qp;
-	struct list_head     list;
-	struct net_device   *dev;
-	unsigned long        jiffies;
+	struct ib_cm_id     	*id;
+	struct ib_qp        	*qp;
+	struct ipoib_cm_rx_buf  *rx_ring; /* Used by NOSRQ only */
+	struct list_head     	 list;
+	struct net_device   	*dev;
+	unsigned long        	 jiffies;
+	u32                      index; /* wr_ids are distinguished by index
+					 * to identify the QP -NOSRQ only */
  	enum ipoib_cm_state  state;
  };

@@ -215,6 +224,8 @@ struct ipoib_cm_dev_priv {
  	struct ib_wc            ibwc[IPOIB_NUM_WC];
  	struct ib_sge           rx_sge[IPOIB_CM_RX_SG];
  	struct ib_recv_wr       rx_wr;
+	struct ipoib_cm_rx	**rx_index_table; /* See ipoib_cm_dev_init()
+						   *for usage of this element */
  };

  /*
@@ -564,10 +575,9 @@ static inline void ipoib_cm_skb_too_long
  	dev_kfree_skb_any(skb);
  }

-static inline void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
+void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
  {
  }
-
  #endif

  #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG
--- a/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-06-05 18:01:38.000000000 -0400
+++ b/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-06-07 11:05:13.000000000 -0400
@@ -49,6 +49,16 @@ MODULE_PARM_DESC(cm_data_debug_level,

  #include "ipoib.h"

+int max_rc_qp = NOSRQ_INDEX_TABLE_SIZE;
+int max_recv_buf = 1024; /* Default is 1024 MB */
+
+module_param_named(nosrq_max_rc_qp, max_rc_qp, int, 0644);
+MODULE_PARM_DESC(nosrq_max_rc_qp, "Max number of NOSRQ RC QPs supported");
+
+module_param_named(max_recieve_buffer, max_recv_buf, int, 0644);
+MODULE_PARM_DESC(max_recieve_buffer, "Max Receive Buffer Size in MB");
+
+int current_rc_qp = 0; /* Active RC QPs for NOSRQ */
  #define IPOIB_CM_IETF_ID 0x1000000000000000ULL

  #define IPOIB_CM_RX_UPDATE_TIME (256 * HZ)
@@ -88,20 +98,20 @@ static void ipoib_cm_dma_unmap_rx(struct
  		ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE);
  }

-static int ipoib_cm_post_receive(struct net_device *dev, int id)
+static int post_receive_srq(struct net_device *dev, u64 id)
  {
  	struct ipoib_dev_priv *priv = netdev_priv(dev);
  	struct ib_recv_wr *bad_wr;
  	int i, ret;

-	priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_SRQ;
+	priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_RECV;

  	for (i = 0; i < IPOIB_CM_RX_SG; ++i)
  		priv->cm.rx_sge[i].addr = priv->cm.srq_ring[id].mapping[i];

  	ret = ib_post_srq_recv(priv->cm.srq, &priv->cm.rx_wr, &bad_wr);
  	if (unlikely(ret)) {
-		ipoib_warn(priv, "post srq failed for buf %d (%d)\n", id, ret);
+		ipoib_warn(priv, "post srq failed for buf %ld (%d)\n", id, ret);
  		ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1,
  				      priv->cm.srq_ring[id].mapping);
  		dev_kfree_skb_any(priv->cm.srq_ring[id].skb);
@@ -111,12 +121,47 @@ static int ipoib_cm_post_receive(struct
  	return ret;
  }

-static struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev, int id, int frags,
+static int post_receive_nosrq(struct net_device *dev, u64 id)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ib_recv_wr *bad_wr;
+	int i, ret;
+	u32 index;
+	u32 wr_id;
+	struct ipoib_cm_rx *rx_ptr;
+
+	index = id  & NOSRQ_INDEX_MASK ;
+	wr_id = id >> 32;
+
+	rx_ptr = priv->cm.rx_index_table[index];
+
+	priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_RECV;
+
+	for (i = 0; i < IPOIB_CM_RX_SG; ++i)
+		priv->cm.rx_sge[i].addr = rx_ptr->rx_ring[wr_id].mapping[i];
+
+	ret = ib_post_recv(rx_ptr->qp, &priv->cm.rx_wr, &bad_wr);
+	if (unlikely(ret)) {
+		ipoib_warn(priv, "post recv failed for buf %d (%d)\n",
+		           wr_id, ret);
+		ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1,
+		                      rx_ptr->rx_ring[wr_id].mapping);
+		dev_kfree_skb_any(rx_ptr->rx_ring[wr_id].skb);
+		rx_ptr->rx_ring[wr_id].skb = NULL;
+	}
+
+	return ret;
+}
+
+static struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev, u64 id,
+					     int frags,
  					     u64 mapping[IPOIB_CM_RX_SG])
  {
  	struct ipoib_dev_priv *priv = netdev_priv(dev);
  	struct sk_buff *skb;
  	int i;
+	struct ipoib_cm_rx *rx_ptr;
+	u32 index, wr_id;

  	skb = dev_alloc_skb(IPOIB_CM_HEAD_SIZE + 12);
  	if (unlikely(!skb))
@@ -148,7 +193,14 @@ static struct sk_buff *ipoib_cm_alloc_rx
  			goto partial_error;
  	}

-	priv->cm.srq_ring[id].skb = skb;
+	if (priv->cm.srq)
+		priv->cm.srq_ring[id].skb = skb;
+	else {
+		index = id  & NOSRQ_INDEX_MASK ;
+		wr_id = id >> 32;
+		rx_ptr = priv->cm.rx_index_table[index];
+		rx_ptr->rx_ring[wr_id].skb = skb;
+	}
  	return skb;

  partial_error:
@@ -205,16 +257,21 @@ static struct ib_qp *ipoib_cm_create_rx_
  {
  	struct ipoib_dev_priv *priv = netdev_priv(dev);
  	struct ib_qp_init_attr attr = {
-		.event_handler = ipoib_cm_rx_event_handler,
  		.send_cq = priv->cq, /* For drain WR */
  		.recv_cq = priv->cq,
  		.srq = priv->cm.srq,
  		.cap.max_send_wr = 1, /* For drain WR */
+		.cap.max_recv_wr = ipoib_recvq_size + 1,
  		.cap.max_send_sge = 1, /* FIXME: 0 Seems not to work */
  		.sq_sig_type = IB_SIGNAL_ALL_WR,
  		.qp_type = IB_QPT_RC,
  		.qp_context = p,
  	};
+	if (!priv->cm.srq) {
+		attr.cap.max_recv_sge = IPOIB_CM_RX_SG;	
+		attr.event_handler = NULL;
+	} else
+		attr.event_handler = ipoib_cm_rx_event_handler;
  	return ib_create_qp(priv->pd, &attr);
  }

@@ -289,12 +346,118 @@ static int ipoib_cm_send_rep(struct net_
  	rep.flow_control = 0;
  	rep.rnr_retry_count = req->rnr_retry_count;
  	rep.target_ack_delay = 20; /* FIXME */
-	rep.srq = 1;
  	rep.qp_num = qp->qp_num;
  	rep.starting_psn = psn;
+	rep.srq	= !!priv->cm.srq;
  	return ib_send_cm_rep(cm_id, &rep);
  }

+static void init_context_and_add_list(struct ib_cm_id *cm_id,
+				    struct ipoib_cm_rx *p,
+				    struct ipoib_dev_priv *priv)
+{
+	cm_id->context = p;
+	p->jiffies = jiffies;
+	spin_lock_irq(&priv->lock);
+	if (list_empty(&priv->cm.passive_ids))
+		queue_delayed_work(ipoib_workqueue,
+				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
+	list_add(&p->list, &priv->cm.passive_ids);
+	spin_unlock_irq(&priv->lock);
+}
+
+static int allocate_and_post_rbuf_nosrq(struct ib_cm_id *cm_id,
+				        struct ipoib_cm_rx *p, unsigned psn)
+{
+	struct net_device *dev = cm_id->context;
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	int ret;
+	u32 qp_num, index;
+	u64 i, recv_mem_used;
+
+	qp_num = p->qp->qp_num;
+
+	/* In the SRQ case there is a common rx buffer called the srq_ring.
+	 * However, for the NOSRQ we create an rx_ring for every
+	 * struct ipoib_cm_rx.
+	 */
+	p->rx_ring = kzalloc(ipoib_recvq_size * sizeof *p->rx_ring, GFP_KERNEL);
+	if (!p->rx_ring) {
+		printk(KERN_WARNING "Failed to allocate rx_ring for 0x%x\n",
+		       qp_num);
+		return -ENOMEM;
+	}
+
+	init_context_and_add_list(cm_id, p, priv);
+	spin_lock_irq(&priv->lock);
+		
+	for (index = 0; index < max_rc_qp; index++)
+		if (priv->cm.rx_index_table[index] == NULL)
+			break;
+
+	recv_mem_used = (u64)ipoib_recvq_size * (u64)current_rc_qp *
+		        SIXTY_FOUR_K;
+	if ((index == max_rc_qp) ||
+	( recv_mem_used >= max_recv_buf * MEGA_BYTE)) {
+		spin_unlock_irq(&priv->lock);
+		ipoib_warn(priv, "NOSRQ has reached the configurable limit "
+		           "of either %d RC QPs or, max recv buf size of "
+			   "0x%lx MB\n", max_rc_qp, max_recv_buf * MEGA_BYTE);
+
+		/* We send a REJ to the remote side indicating that we
+		 * have no more free RC QPs and leave it to the remote side
+		 * to take appropriate action. This should leave the
+		 * current set of QPs unaffected and any subsequent REQs
+		 * will be able to use RC QPs if they are available.
+		 */
+		ib_send_cm_rej(cm_id, IB_CM_REJ_NO_QP, NULL, 0, NULL, 0);
+		ret = -EINVAL;
+		goto err_send_rej;
+	}
+
+	priv->cm.rx_index_table[index] = p;
+	spin_unlock_irq(&priv->lock);
+
+	/* We will subsequently use this stored pointer while freeing
+	 * resources in stale task */
+	p->index = index;
+
+	ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn);
+	if (ret) {
+		ipoib_warn(priv, "ipoib_cm_modify_rx_qp() failed %d\n", ret);
+		ipoib_cm_dev_cleanup(dev);
+		goto err_modify_nosrq;
+	}
+
+	for (i = 0; i < ipoib_recvq_size; ++i) {
+		if (!ipoib_cm_alloc_rx_skb(dev, i << 32 | index,
+					   IPOIB_CM_RX_SG - 1,
+					   p->rx_ring[i].mapping)) {
+			ipoib_warn(priv, "failed to allocate receive "
+			           "buffer %ld\n", i);
+			ipoib_cm_dev_cleanup(dev);
+			ret = -ENOMEM;
+			goto err_alloc_and_post;
+		}
+
+		if (post_receive_nosrq(dev, i << 32 | index)) {
+			ipoib_warn(priv, "post_receive_nosrq "
+			           "failed for  buf %ld\n", i);
+			ipoib_cm_dev_cleanup(dev);
+			ret = -EIO;
+			goto err_alloc_and_post;
+		}
+	}
+
+	return 0;
+
+err_send_rej:
+err_modify_nosrq:
+err_alloc_and_post:
+	kfree(p->rx_ring);
+	return ret;
+}
+
  static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
  {
  	struct net_device *dev = cm_id->context;
@@ -305,8 +468,11 @@ static int ipoib_cm_req_handler(struct i

  	ipoib_dbg(priv, "REQ arrived\n");
  	p = kzalloc(sizeof *p, GFP_KERNEL);
-	if (!p)
+	if (!p) {
+		printk(KERN_WARNING "Failed to allocate RX control block when "
+		       "REQ arrived\n");
  		return -ENOMEM;
+	}
  	p->dev = dev;
  	p->id = cm_id;
  	p->qp = ipoib_cm_create_rx_qp(dev, p);
@@ -316,9 +482,16 @@ static int ipoib_cm_req_handler(struct i
  	}

  	psn = random32() & 0xffffff;
-	ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn);
-	if (ret)
-		goto err_modify;
+	if (!priv->cm.srq) {
+		current_rc_qp++;
+		if (ret = allocate_and_post_rbuf_nosrq(cm_id, p, psn))
+			goto err_post_nosrq;
+	} else {
+		p->rx_ring = NULL;
+		ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn);
+		if (ret)
+			goto err_modify;
+	}

  	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
  	if (ret) {
@@ -326,18 +499,16 @@ static int ipoib_cm_req_handler(struct i
  		goto err_rep;
  	}

-	cm_id->context = p;
-	p->jiffies = jiffies;
-	p->state = IPOIB_CM_RX_LIVE;
-	spin_lock_irq(&priv->lock);
-	if (list_empty(&priv->cm.passive_ids))
-		queue_delayed_work(ipoib_workqueue,
-				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
-	list_add(&p->list, &priv->cm.passive_ids);
-	spin_unlock_irq(&priv->lock);
+	if (priv->cm.srq) {
+		init_context_and_add_list(cm_id, p, priv);
+		p->state = IPOIB_CM_RX_LIVE;
+	}
  	return 0;

  err_rep:
+err_post_nosrq:
+	list_del_init(&p->list);
+	current_rc_qp--;
  err_modify:
  	ib_destroy_qp(p->qp);
  err_qp:
@@ -401,21 +572,51 @@ static void skb_put_frags(struct sk_buff
  	}
  }

-void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
+static void timer_check_srq(struct ipoib_dev_priv *priv, struct ipoib_cm_rx *p)
+{
+	unsigned long flags;
+
+	if (p && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) {
+		spin_lock_irqsave(&priv->lock, flags);
+		p->jiffies = jiffies;
+		/* Move this entry to list head, but do
+		 * not re-add it if it has been removed. */
+		if (p->state == IPOIB_CM_RX_LIVE)
+			list_move(&p->list, &priv->cm.passive_ids);
+		spin_unlock_irqrestore(&priv->lock, flags);
+	}
+}
+
+static void timer_check_nosrq(struct ipoib_dev_priv *priv, struct ipoib_cm_rx *p)
+{
+	unsigned long flags;
+
+	if (p && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) {
+		spin_lock_irqsave(&priv->lock, flags);
+		p->jiffies = jiffies;
+		/* Move this entry to list head, but do
+		 * not re-add it if it has been removed. */
+		if (!list_empty(&p->list))	
+			list_move(&p->list, &priv->cm.passive_ids);
+		spin_unlock_irqrestore(&priv->lock, flags);
+	}
+}
+
+void handle_rx_wc_srq(struct net_device *dev, struct ib_wc *wc)
  {
  	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	unsigned int wr_id = wc->wr_id & ~IPOIB_CM_OP_SRQ;
+	u64 wr_id = wc->wr_id & ~IPOIB_CM_OP_RECV;
  	struct sk_buff *skb, *newskb;
  	struct ipoib_cm_rx *p;
  	unsigned long flags;
  	u64 mapping[IPOIB_CM_RX_SG];
-	int frags;
+	int frags, ret;

  	ipoib_dbg_data(priv, "cm recv completion: id %d, status: %d\n",
  		       wr_id, wc->status);

  	if (unlikely(wr_id >= ipoib_recvq_size)) {
-		if (wr_id == (IPOIB_CM_RX_DRAIN_WRID & ~IPOIB_CM_OP_SRQ)) {
+		if (wr_id == (IPOIB_CM_RX_DRAIN_WRID & ~IPOIB_CM_OP_RECV)) {
  			spin_lock_irqsave(&priv->lock, flags);
  			list_splice_init(&priv->cm.rx_drain_list, &priv->cm.rx_reap_list);
  			ipoib_cm_start_rx_drain(priv);
@@ -434,20 +635,12 @@ void ipoib_cm_handle_rx_wc(struct net_de
  			   "(status=%d, wrid=%d vend_err %x)\n",
  			   wc->status, wr_id, wc->vendor_err);
  		++priv->stats.rx_dropped;
-		goto repost;
+		goto repost_srq;
  	}

  	if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) {
  		p = wc->qp->qp_context;
-		if (p && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) {
-			spin_lock_irqsave(&priv->lock, flags);
-			p->jiffies = jiffies;
-			/* Move this entry to list head, but do not re-add it
-			 * if it has been moved out of list. */
-			if (p->state == IPOIB_CM_RX_LIVE)
-				list_move(&p->list, &priv->cm.passive_ids);
-			spin_unlock_irqrestore(&priv->lock, flags);
-		}
+		timer_check_srq(priv, p);
  	}

  	frags = PAGE_ALIGN(wc->byte_len - min(wc->byte_len,
@@ -459,13 +652,113 @@ void ipoib_cm_handle_rx_wc(struct net_de
  		 * If we can't allocate a new RX buffer, dump
  		 * this packet and reuse the old buffer.
  		 */
-		ipoib_dbg(priv, "failed to allocate receive buffer %d\n", wr_id);
+		ipoib_dbg(priv, "failed to allocate receive buffer %ld\n", wr_id);
+                ++priv->stats.rx_dropped;
+                goto repost_srq;
+        }
+
+	ipoib_cm_dma_unmap_rx(priv, frags,
+	                      priv->cm.srq_ring[wr_id].mapping);
+	memcpy(priv->cm.srq_ring[wr_id].mapping, mapping,
+	       (frags + 1) * sizeof *mapping);
+	ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n",
+		       wc->byte_len, wc->slid);
+
+	skb_put_frags(skb, IPOIB_CM_HEAD_SIZE, wc->byte_len, newskb);
+
+	skb->protocol = ((struct ipoib_header *) skb->data)->proto;
+	skb_reset_mac_header(skb);	
+	skb_pull(skb, IPOIB_ENCAP_LEN);
+
+	dev->last_rx = jiffies;
+	++priv->stats.rx_packets;
+	priv->stats.rx_bytes += skb->len;
+
+	skb->dev = dev;
+	/* XXX get correct PACKET_ type here */
+	skb->pkt_type = PACKET_HOST;
+	netif_rx_ni(skb);
+
+repost_srq:
+	ret = post_receive_srq(dev, wr_id);
+
+	if (unlikely(ret))
+		ipoib_warn(priv, "post_receive_srq failed for buf %ld\n",
+		           wr_id);
+
+}
+
+static void handle_rx_wc_nosrq(struct net_device *dev, struct ib_wc *wc)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct sk_buff *skb, *newskb;
+	u64 mapping[IPOIB_CM_RX_SG], wr_id = wc->wr_id >> 32;
+	u32 index;
+	struct ipoib_cm_rx *p, *rx_ptr;
+	int frags, ret;
+
+
+	ipoib_dbg_data(priv, "cm recv completion: id %d, status: %d\n",
+		       wr_id, wc->status);
+
+	if (unlikely(wr_id >= ipoib_recvq_size)) {
+		ipoib_warn(priv, "cm recv completion event with wrid %d (> %d)\n",
+				   wr_id, ipoib_recvq_size);
+		return;
+	}
+
+	index = (wc->wr_id & ~IPOIB_CM_OP_RECV) & NOSRQ_INDEX_MASK ;
+
+	/* This is the only place where rx_ptr could be a NULL - could
+	 * have just received a packet from a connection that has become
+	 * stale and so is going away. We will simply drop the packet and
+	 * let the hardware (it's IB_QPT_RC) handle the dropped packet.
+	 * In the timer_check() function below, p->jiffies is updated and
+	 * hence the connection will not be stale after that.
+	 */
+	rx_ptr = priv->cm.rx_index_table[index];
+	if (unlikely(!rx_ptr)) {
+		ipoib_warn(priv, "Received packet from a connection "
+		           "that is going away. Hardware will handle it.\n");
+		return;
+	}
+
+	skb = rx_ptr->rx_ring[wr_id].skb;
+
+	if (unlikely(wc->status != IB_WC_SUCCESS)) {
+		ipoib_dbg(priv, "cm recv error "
+			   "(status=%d, wrid=%ld vend_err %x)\n",
+			   wc->status, wr_id, wc->vendor_err);
+		++priv->stats.rx_dropped;
+		goto repost_nosrq;
+	}
+
+	if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) {
+		/* There are no guarantees that wc->qp is not NULL for HCAs
+	 	* that do not support SRQ. */
+		p = rx_ptr;
+		timer_check_nosrq(priv, p);
+	}
+
+	frags = PAGE_ALIGN(wc->byte_len - min(wc->byte_len,
+					      (unsigned)IPOIB_CM_HEAD_SIZE)) / PAGE_SIZE;
+
+	newskb = ipoib_cm_alloc_rx_skb(dev, wr_id << 32 | index, frags,
+				       mapping);
+	if (unlikely(!newskb)) {
+		/*
+		 * If we can't allocate a new RX buffer, dump
+		 * this packet and reuse the old buffer.
+		 */
+		ipoib_dbg(priv, "failed to allocate receive buffer %ld\n", wr_id);
  		++priv->stats.rx_dropped;
-		goto repost;
+		goto repost_nosrq;
  	}

-	ipoib_cm_dma_unmap_rx(priv, frags, priv->cm.srq_ring[wr_id].mapping);
-	memcpy(priv->cm.srq_ring[wr_id].mapping, mapping, (frags + 1) * sizeof *mapping);
+	ipoib_cm_dma_unmap_rx(priv, frags,
+	                      rx_ptr->rx_ring[wr_id].mapping);
+	memcpy(rx_ptr->rx_ring[wr_id].mapping, mapping,
+	       (frags + 1) * sizeof *mapping);

  	ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n",
  		       wc->byte_len, wc->slid);
@@ -485,10 +778,22 @@ void ipoib_cm_handle_rx_wc(struct net_de
  	skb->pkt_type = PACKET_HOST;
  	netif_receive_skb(skb);

-repost:
-	if (unlikely(ipoib_cm_post_receive(dev, wr_id)))
-		ipoib_warn(priv, "ipoib_cm_post_receive failed "
-			   "for buf %d\n", wr_id);
+repost_nosrq:
+	ret = post_receive_nosrq(dev, wr_id << 32 | index);
+
+	if (unlikely(ret))
+		ipoib_warn(priv, "post_receive_nosrq failed for buf %ld\n",
+		           wr_id);
+}
+
+void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+
+	if (priv->cm.srq)
+		handle_rx_wc_srq(dev, wc);
+	else
+		handle_rx_wc_nosrq(dev, wc);
  }

  static inline int post_send(struct ipoib_dev_priv *priv,
@@ -680,6 +985,42 @@ err_cm:
  	return ret;
  }

+static void free_resources_nosrq(struct ipoib_dev_priv *priv, struct ipoib_cm_rx *p)
+{
+	int i;
+
+	for(i = 0; i < ipoib_recvq_size; ++i)
+		if(p->rx_ring[i].skb) {
+			ipoib_cm_dma_unmap_rx(priv,
+				         IPOIB_CM_RX_SG - 1,
+					 p->rx_ring[i].mapping);
+			dev_kfree_skb_any(p->rx_ring[i].skb);
+			p->rx_ring[i].skb = NULL;
+		}
+	kfree(p->rx_ring);
+}
+
+void dev_stop_nosrq(struct ipoib_dev_priv *priv)
+{
+	struct ipoib_cm_rx *p;
+
+	spin_lock_irq(&priv->lock);
+	while (!list_empty(&priv->cm.passive_ids)) {
+		p = list_entry(priv->cm.passive_ids.next, typeof(*p), list);
+		free_resources_nosrq(priv, p);
+		list_del_init(&p->list);
+		spin_unlock_irq(&priv->lock);
+		ib_destroy_cm_id(p->id);
+		ib_destroy_qp(p->qp);
+		current_rc_qp--;
+		kfree(p);
+		spin_lock_irq(&priv->lock);
+	}
+	spin_unlock_irq(&priv->lock);
+
+	cancel_delayed_work(&priv->cm.stale_task);
+}
+
  void ipoib_cm_dev_stop(struct net_device *dev)
  {
  	struct ipoib_dev_priv *priv = netdev_priv(dev);
@@ -694,6 +1035,11 @@ void ipoib_cm_dev_stop(struct net_device
  	ib_destroy_cm_id(priv->cm.id);
  	priv->cm.id = NULL;

+	if (!priv->cm.srq) {
+		dev_stop_nosrq(priv);
+		return;
+	}
+
  	spin_lock_irq(&priv->lock);
  	while (!list_empty(&priv->cm.passive_ids)) {
  		p = list_entry(priv->cm.passive_ids.next, typeof(*p), list);
@@ -739,6 +1085,7 @@ void ipoib_cm_dev_stop(struct net_device
  		kfree(p);
  	}

+
  	cancel_delayed_work(&priv->cm.stale_task);
  }

@@ -817,7 +1164,9 @@ static struct ib_qp *ipoib_cm_create_tx_
  	attr.recv_cq = priv->cq;
  	attr.srq = priv->cm.srq;
  	attr.cap.max_send_wr = ipoib_sendq_size;
+	attr.cap.max_recv_wr = 1;
  	attr.cap.max_send_sge = 1;
+	attr.cap.max_recv_sge = 1;
  	attr.sq_sig_type = IB_SIGNAL_ALL_WR;
  	attr.qp_type = IB_QPT_RC;
  	attr.send_cq = cq;
@@ -857,7 +1206,7 @@ static int ipoib_cm_send_req(struct net_
  	req.retry_count 	      = 3; /* RFC draft warns against retries */
  	req.rnr_retry_count 	      = 0; /* RFC draft warns against retries */
  	req.max_cm_retries 	      = 15;
-	req.srq 	              = 1;
+	req.srq			      = !!priv->cm.srq;
  	return ib_send_cm_req(id, &req);
  }

@@ -1202,6 +1551,7 @@ static void ipoib_cm_rx_reap(struct work
  	list_for_each_entry_safe(p, n, &list, list) {
  		ib_destroy_cm_id(p->id);
  		ib_destroy_qp(p->qp);
+		current_rc_qp--;
  		kfree(p);
  	}
  }
@@ -1220,12 +1570,19 @@ static void ipoib_cm_stale_task(struct w
  		p = list_entry(priv->cm.passive_ids.prev, typeof(*p), list);
  		if (time_before_eq(jiffies, p->jiffies + IPOIB_CM_RX_TIMEOUT))
  			break;
-		list_move(&p->list, &priv->cm.rx_error_list);
-		p->state = IPOIB_CM_RX_ERROR;
-		spin_unlock_irq(&priv->lock);
-		ret = ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE);
-		if (ret)
-			ipoib_warn(priv, "unable to move qp to error state: %d\n", ret);
+		if (!priv->cm.srq) {
+			free_resources_nosrq(priv, p);
+			list_del_init(&p->list);
+			priv->cm.rx_index_table[p->index] = NULL;
+			spin_unlock_irq(&priv->lock);
+		} else {
+			list_move(&p->list, &priv->cm.rx_error_list);
+			p->state = IPOIB_CM_RX_ERROR;
+			spin_unlock_irq(&priv->lock);
+			ret = ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE);
+			if (ret)
+				ipoib_warn(priv, "unable to move qp to error state: %d\n", ret);
+		}
  		spin_lock_irq(&priv->lock);
  	}

@@ -1279,16 +1636,40 @@ int ipoib_cm_add_mode_attr(struct net_de
  	return device_create_file(&dev->dev, &dev_attr_mode);
  }

+static int create_srq(struct net_device *dev, struct ipoib_dev_priv *priv)
+{
+	struct ib_srq_init_attr srq_init_attr = { };
+	int ret;
+
+	srq_init_attr.attr.max_wr = ipoib_recvq_size;
+	srq_init_attr.attr.max_sge = IPOIB_CM_RX_SG;
+
+	priv->cm.srq = ib_create_srq(priv->pd, &srq_init_attr);
+	if (IS_ERR(priv->cm.srq)) {
+		ret = PTR_ERR(priv->cm.srq);
+		priv->cm.srq = NULL;
+		return ret;
+	}
+
+	priv->cm.srq_ring = kzalloc(ipoib_recvq_size *
+		                    sizeof *priv->cm.srq_ring,
+			            GFP_KERNEL);
+	if (!priv->cm.srq_ring) {
+		printk(KERN_WARNING "%s: failed to allocate CM ring "
+		       "(%d entries)\n",
+	       	       priv->ca->name, ipoib_recvq_size);
+		ipoib_cm_dev_cleanup(dev);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
  int ipoib_cm_dev_init(struct net_device *dev)
  {
  	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	struct ib_srq_init_attr srq_init_attr = {
-		.attr = {
-			.max_wr  = ipoib_recvq_size,
-			.max_sge = IPOIB_CM_RX_SG
-		}
-	};
  	int ret, i;
+	struct ib_device_attr attr;

  	INIT_LIST_HEAD(&priv->cm.passive_ids);
  	INIT_LIST_HEAD(&priv->cm.reap_list);
@@ -1305,20 +1686,30 @@ int ipoib_cm_dev_init(struct net_device

  	skb_queue_head_init(&priv->cm.skb_queue);

-	priv->cm.srq = ib_create_srq(priv->pd, &srq_init_attr);
-	if (IS_ERR(priv->cm.srq)) {
-		ret = PTR_ERR(priv->cm.srq);
-		priv->cm.srq = NULL;
+	if (ret = ib_query_device(priv->ca, &attr))
  		return ret;
-	}

-	priv->cm.srq_ring = kzalloc(ipoib_recvq_size * sizeof *priv->cm.srq_ring,
-				    GFP_KERNEL);
-	if (!priv->cm.srq_ring) {
-		printk(KERN_WARNING "%s: failed to allocate CM ring (%d entries)\n",
-		       priv->ca->name, ipoib_recvq_size);
-		ipoib_cm_dev_cleanup(dev);
-		return -ENOMEM;
+	if (attr.max_srq) {
+		/* This device supports SRQ */
+		if (ret = create_srq(dev, priv))
+			return ret;
+		priv->cm.rx_index_table = NULL;
+	} else {
+		priv->cm.srq = NULL;
+		priv->cm.srq_ring = NULL;
+
+		/* Every new REQ that arrives creates a struct ipoib_cm_rx.
+		 * These structures form a linked list starting with the
+		 * passive_ids. For quick and easy access we maintain a table
+		 * of pointers to struct ipoib_cm_rx called the rx_index_table
+		 */
+		priv->cm.rx_index_table = kzalloc(NOSRQ_INDEX_TABLE_SIZE *
+					 sizeof *priv->cm.rx_index_table,
+					 GFP_KERNEL);
+		if (!priv->cm.rx_index_table) {
+			printk(KERN_WARNING "Failed to allocate NOSRQ_INDEX_TABLE\n");
+			return -ENOMEM;
+		}	
  	}

  	for (i = 0; i < IPOIB_CM_RX_SG; ++i)
@@ -1331,17 +1722,23 @@ int ipoib_cm_dev_init(struct net_device
  	priv->cm.rx_wr.sg_list = priv->cm.rx_sge;
  	priv->cm.rx_wr.num_sge = IPOIB_CM_RX_SG;

-	for (i = 0; i < ipoib_recvq_size; ++i) {
-		if (!ipoib_cm_alloc_rx_skb(dev, i, IPOIB_CM_RX_SG - 1,
+	/* One can post receive buffers even before the RX QP is created
+	 * only in the SRQ case. Therefore for NOSRQ we skip the rest of init
+	 * and do that in ipoib_cm_req_handler() */
+
+	if (priv->cm.srq) {
+		for (i = 0; i < ipoib_recvq_size; ++i) {
+			if (!ipoib_cm_alloc_rx_skb(dev, i, IPOIB_CM_RX_SG - 1,
  					   priv->cm.srq_ring[i].mapping)) {
-			ipoib_warn(priv, "failed to allocate receive buffer %d\n", i);
-			ipoib_cm_dev_cleanup(dev);
-			return -ENOMEM;
-		}
-		if (ipoib_cm_post_receive(dev, i)) {
-			ipoib_warn(priv, "ipoib_ib_post_receive failed for buf %d\n", i);
-			ipoib_cm_dev_cleanup(dev);
-			return -EIO;
+				ipoib_warn(priv, "failed to allocate receive buffer %d\n", i);
+				ipoib_cm_dev_cleanup(dev);
+				return -ENOMEM;
+			}
+			if (post_receive_srq(dev, i)) {
+				ipoib_warn(priv, "post_receive_srq failed for buf %d\n", i);
+				ipoib_cm_dev_cleanup(dev);
+				return -EIO;
+			}
  		}
  	}

--- a/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-05-30 14:56:25.000000000 -0400
+++ b/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-05-30 20:11:27.000000000 -0400
@@ -299,7 +299,7 @@ int ipoib_poll(struct net_device *dev, i
  		for (i = 0; i < n; ++i) {
  			struct ib_wc *wc = priv->ibwc + i;

-			if (wc->wr_id & IPOIB_CM_OP_SRQ) {
+			if (wc->wr_id & IPOIB_CM_OP_RECV) {
  				++done;
  				--max;
  				ipoib_cm_handle_rx_wc(dev, wc);
@@ -557,7 +557,7 @@ void ipoib_drain_cq(struct net_device *d
  	do {
  		n = ib_poll_cq(priv->cq, IPOIB_NUM_WC, priv->ibwc);
  		for (i = 0; i < n; ++i) {
-			if (priv->ibwc[i].wr_id & IPOIB_CM_OP_SRQ)
+			if (priv->ibwc[i].wr_id & IPOIB_CM_OP_RECV)
  				ipoib_cm_handle_rx_wc(dev, priv->ibwc + i);
  			else if (priv->ibwc[i].wr_id & IPOIB_OP_RECV)
  				ipoib_ib_handle_rx_wc(dev, priv->ibwc + i);
--- a/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2007-05-30 14:56:25.000000000 -0400
+++ b/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2007-05-30 19:04:24.000000000 -0400
@@ -175,6 +175,15 @@ int ipoib_transport_dev_init(struct net_
  	if (!ret)
  		size += ipoib_recvq_size + 1 /* 1 extra for rx_drain_qp */;

+ 	/* We increase the size of the CQ in the NOSRQ case to prevent CQ
+ 	 * overflow. Every new REQ creates a new RX QP and each QP has an
+ 	 * RX ring associated with it. Therefore we could have
+ 	 * NOSRQ_INDEX_TABLE_SIZE*ipoib_recvq_size + ipoib_sendq_size CQEs
+ 	 * in a CQ.
+ 	 */
+ 	if(!priv->cm.srq)
+ 		size += (NOSRQ_INDEX_TABLE_SIZE -1)* ipoib_recvq_size;
+
  	priv->cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev, size, 0);
  	if (IS_ERR(priv->cq)) {
  		printk(KERN_WARNING "%s: failed to create CQ\n", ca->name);
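
The way the NOSRQ path in the patch above packs two values into one
64-bit wr_id (the ring slot in the upper 32 bits, the index-table entry
in the low bits, plus the IPOIB_CM_OP_RECV flag) can be sketched in
user-space C. The three helper names are hypothetical; the constants
mirror the ones the patch adds to ipoib.h.

```c
#include <assert.h>
#include <stdint.h>

#define IPOIB_CM_OP_RECV       (1ull << 30)
#define NOSRQ_INDEX_TABLE_SIZE 128
#define NOSRQ_INDEX_MASK       (NOSRQ_INDEX_TABLE_SIZE - 1)

/* Build the wr_id posted with a receive buffer. */
static uint64_t nosrq_pack_wr_id(uint32_t slot, uint32_t index)
{
	/* The index must fit in the mask, or decoding would alias. */
	assert(index < NOSRQ_INDEX_TABLE_SIZE);
	return ((uint64_t)slot << 32) | index | IPOIB_CM_OP_RECV;
}

/* Recover the position in the per-QP rx_ring from a completion. */
static uint32_t nosrq_slot(uint64_t wr_id)
{
	return (uint32_t)(wr_id >> 32);
}

/* Recover the rx_index_table entry (that is, the QP) from a completion. */
static uint32_t nosrq_index(uint64_t wr_id)
{
	return (uint32_t)((wr_id & ~IPOIB_CM_OP_RECV) & NOSRQ_INDEX_MASK);
}
```

The flag bit sits at bit 30, above the 7-bit index and below the slot
field, so the poll loop can still dispatch on IPOIB_CM_OP_RECV without
disturbing either value.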


From pradeeps at linux.vnet.ibm.com  Thu Jun  7 14:18:58 2007
From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana)
Date: Thu, 07 Jun 2007 14:18:58 -0700
Subject: [ofa-general] IPOIB CM (NOSRQ) extension
Message-ID: <46687642.8040208@linux.vnet.ibm.com>

This patch handles the corner case of running out of RC QPs. In that
case it switches to UD mode. This patch can be used both by NOSRQ and
SRQ code.

Signed-off-by: Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com>
---

--- c/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-06-07 11:13:55.000000000 -0400
+++ b/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-06-07 11:11:21.000000000 -0400
@@ -1383,6 +1383,11 @@ static int ipoib_cm_tx_handler(struct ib
  		break;
  	case IB_CM_REQ_ERROR:
  	case IB_CM_REJ_RECEIVED:
+		ipoib_warn(priv, "REJ received\n");
+		neigh = tx->neigh;
+		if (neigh)
+			clear_bit(IPOIB_FLAG_OPER_UP, &neigh->cm->flags);
+		break;
  	case IB_CM_TIMEWAIT_EXIT:
  		ipoib_dbg(priv, "CM error %d.\n", event->event);
  		spin_lock_irq(&priv->tx_lock);
--- c/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-05-30 14:56:25.000000000 -0400
+++ b/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-06-06 18:28:06.000000000 -0400
@@ -679,11 +679,10 @@ static int ipoib_start_xmit(struct sk_bu

  		neigh = *to_ipoib_neigh(skb->dst->neighbour);

-		if (ipoib_cm_get(neigh)) {
-			if (ipoib_cm_up(neigh)) {
+		if (ipoib_cm_get(neigh) &&  ipoib_cm_up(neigh) &&
+			test_bit(IPOIB_FLAG_OPER_UP, &neigh->cm->flags)) {
  				ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
  				goto out;
-			}
  		} else if (neigh->ah) {
  			if (unlikely(memcmp(&neigh->dgid.raw,
  					    skb->dst->neighbour->ha + 4,
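
The transmit-path choice that the ipoib_start_xmit() hunk above aims
for can be sketched as a pure decision function. Everything here is a
hypothetical simplification: struct sketch_neigh flattens the driver
state into four flags, and TX_PATH_NONE stands for whatever the driver
does when neither path is usable, which this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>

enum tx_path { TX_PATH_CM, TX_PATH_UD, TX_PATH_NONE };

/* Hypothetical flattened view of a neighbour's state. */
struct sketch_neigh {
	int has_cm;	/* a CM tx context exists (ipoib_cm_get() != NULL) */
	int cm_up;	/* the RC connection is established (ipoib_cm_up()) */
	int oper_up;	/* IPOIB_FLAG_OPER_UP still set, i.e. no REJ seen */
	int has_ah;	/* a UD address handle is available */
};

static enum tx_path choose_tx_path(const struct sketch_neigh *n)
{
	assert(n != NULL);

	/* Use the connected-mode QP only while all three conditions hold. */
	if (n->has_cm && n->cm_up && n->oper_up)
		return TX_PATH_CM;

	/* Otherwise fall back to datagram mode if we can address the peer. */
	if (n->has_ah)
		return TX_PATH_UD;

	return TX_PATH_NONE;
}
```

Once a REJ clears the flag, traffic to that neighbour takes the UD
branch while neighbours with healthy RC connections are unaffected.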


From steffen.persvold at scali.com  Thu Jun  7 17:08:39 2007
From: steffen.persvold at scali.com (Steffen Persvold)
Date: Thu, 7 Jun 2007 20:08:39 -0400
Subject: [ofa-general] OFED 1.2 and backwards binary compatibility
References: <465AE791.5040003@mellanox.co.il><A15335FBE9BD2449AF2C9EF3D1EB8EA303951123@xmb-sjc-216.amer.cisco.com>
	<465BD5B4.50003@mellanox.co.il>
Message-ID: <D6A583C768392A4D8B297C500CDD54B501573779@mse11be1.mse11.exchange.ms>

OFED Team,
 
Is it intended that the OFED 1.2 verbs libraries aren't binary backwards compatible? In 1.2-rc4 the libraries are still called:
 
/usr/lib/libibverbs.so.1
/usr/lib/libibverbs.so.1.0.0
/usr/lib64/libibverbs.so.1
/usr/lib64/libibverbs.so.1.0.0

These are the same names as in 1.0 and 1.1, which indicates binary compatibility (at least to a naive user like myself).

The problem though is that I have applications compiled with OFED 1.0 and 1.1 (those releases are binary compatible btw, as far as my testing goes) that hang when running on OFED 1.2...

 
Some clarification on the policy would be nice. In my opinion, if they are no longer compatible (and a diff of verbs.h indicates as much: the header structures have changed), the OFED 1.2 libraries should be named something other than .so.1.0.0.
 
Comments appreciated.
 
Cheers,
Steffen Persvold
Technical Director Americas
tel. 508-281-7100 x401
fax. 508-281-7171

http://www.scali.com/
Scaling the Linux datacenter
 

From steffen.persvold at scali.com  Thu Jun  7 17:12:55 2007
From: steffen.persvold at scali.com (Steffen Persvold)
Date: Thu, 7 Jun 2007 20:12:55 -0400
Subject: [ofa-general] RE: [ewg] OFED 1.2 and backwards binary compatibility
References: <465AE791.5040003@mellanox.co.il><A15335FBE9BD2449AF2C9EF3D1EB8EA303951123@xmb-sjc-216.amer.cisco.com><465BD5B4.50003@mellanox.co.il>
	<D6A583C768392A4D8B297C500CDD54B501573779@mse11be1.mse11.exchange.ms>
Message-ID: <D6A583C768392A4D8B297C500CDD54B50157377A@mse11be1.mse11.exchange.ms>

Just to follow up, I believe at least these changes to verbs.h (there are more) break compatibility:
 
@@ -469,8 +502,8 @@
 };
 struct ibv_send_wr {
-       struct ibv_send_wr     *next;
        uint64_t                wr_id;
+       struct ibv_send_wr     *next;
        struct ibv_sge         *sg_list;
        int                     num_sge;
        enum ibv_wr_opcode      opcode;
@@ -496,12 +529,21 @@
 };
 struct ibv_recv_wr {
-       struct ibv_recv_wr     *next;
        uint64_t                wr_id;
+       struct ibv_recv_wr     *next;
        struct ibv_sge         *sg_list;
        int                     num_sge;
 };
 
If this is intended, I would strongly suggest reversioning the libraries.
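
To illustrate why swapping the first two members matters to binaries that are already compiled, here is a minimal sketch. The names send_wr_old, send_wr_new and same_offset() are hypothetical and only the two leading members from the diff are reproduced; the real structures have more fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Leading members as on the "-" side of the diff (older layout). */
struct send_wr_old {
    struct send_wr_old *next;
    uint64_t            wr_id;
};

/* Leading members as on the "+" side of the diff (newer layout). */
struct send_wr_new {
    uint64_t            wr_id;
    struct send_wr_new *next;
};

/* Returns 1 when a member sits at the same byte offset in both layouts. */
static int same_offset(size_t old_off, size_t new_off)
{
    return old_off == new_off;
}
```

An application built against the older header stores wr_id after the next pointer, while a library built against the newer header reads wr_id from offset 0, so the two sides disagree about every work request unless something translates between the layouts. Comparing offsetof(struct send_wr_old, wr_id) with offsetof(struct send_wr_new, wr_id) through same_offset() shows the mismatch.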
 
Cheers,
Steffen Persvold
Technical Director Americas
tel. 508-281-7100 x401
fax. 508-281-7171

http://www.scali.com/
Scaling the Linux datacenter


From rdreier at cisco.com  Thu Jun  7 18:41:00 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 07 Jun 2007 18:41:00 -0700
Subject: [ofa-general] Re: [ewg] OFED 1.2 and backwards binary compatibility
In-Reply-To: <D6A583C768392A4D8B297C500CDD54B501573779@mse11be1.mse11.exchange.ms>
	(Steffen Persvold's message of "Thu, 7 Jun 2007 20:08:39 -0400")
References: <465AE791.5040003@mellanox.co.il>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA303951123@xmb-sjc-216.amer.cisco.com>
	<465BD5B4.50003@mellanox.co.il>
	<D6A583C768392A4D8B297C500CDD54B501573779@mse11be1.mse11.exchange.ms>
Message-ID: <adaejkncjf7.fsf@cisco.com>

 > Is it intended that the OFED 1.2 verbs libraries aren't binary backwards compatible? In 1.2-rc4 the libraries are still called:
 >  
 > /usr/lib/libibverbs.so.1
 > /usr/lib/libibverbs.so.1.0.0
 > /usr/lib64/libibverbs.so.1
 > /usr/lib64/libibverbs.so.1.0.0
 > 
 > This is the same as in 1.0 and 1.1, which indicates binary compatibility (at least to a naive user like myself).
 > 
 > The problem, though, is that I have applications compiled with OFED 1.0 and 1.1 (those releases are binary compatible, btw, as far as my testing goes) that hang when running on OFED 1.2...

The intention is that libibverbs 1.0 and 1.1 *are* binary compatible
via a versioned ABI.  Applications linked against libibverbs 1.0 will
link against the IBVERBS_1.0 ABI, and should still work when run with
libibverbs 1.1.

It would be useful to get more information about where and how your
applications hang.  During development of the compatibility code of
libibverbs 1.1, I tested various things such as building Open MPI
against libibverbs 1.0 and running with libibverbs 1.1, and it all
worked.  However it's quite possible that there are bugs in the ABI
compatibility code.

 - R.


From rdreier at cisco.com  Thu Jun  7 18:42:16 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 07 Jun 2007 18:42:16 -0700
Subject: [ofa-general] RE: [ewg] OFED 1.2 and backwards binary
	compatibility
In-Reply-To: <D6A583C768392A4D8B297C500CDD54B50157377A@mse11be1.mse11.exchange.ms>
	(Steffen Persvold's message of "Thu, 7 Jun 2007 20:12:55 -0400")
References: <465AE791.5040003@mellanox.co.il>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA303951123@xmb-sjc-216.amer.cisco.com>
	<465BD5B4.50003@mellanox.co.il>
	<D6A583C768392A4D8B297C500CDD54B501573779@mse11be1.mse11.exchange.ms>
	<D6A583C768392A4D8B297C500CDD54B50157377A@mse11be1.mse11.exchange.ms>
Message-ID: <adaabvbcjd3.fsf@cisco.com>

 > Just to follow up, I believe at least these changes to verbs.h (there are more) break compatibility:
 >  
 > @@ -469,8 +502,8 @@
 >  };
 >  struct ibv_send_wr {
 > -       struct ibv_send_wr     *next;
 >         uint64_t                wr_id;
 > +       struct ibv_send_wr     *next;
 >         struct ibv_sge         *sg_list;
 >         int                     num_sge;
 >         enum ibv_wr_opcode      opcode;
 > @@ -496,12 +529,21 @@
 >  };
 >  struct ibv_recv_wr {
 > -       struct ibv_recv_wr     *next;
 >         uint64_t                wr_id;
 > +       struct ibv_recv_wr     *next;
 >         struct ibv_sge         *sg_list;
 >         int                     num_sge;
 >  };

These differences should be taken care of by the
post_send_wrapper_1_0() and post_recv_wrapper_1_0() functions in
src/compat-1_0.c in libibverbs 1.1.
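
As a rough illustration of what such a wrapper has to do (this is a simplified sketch, not the code in src/compat-1_0.c), the caller's old-layout chain is copied member by member into the new layout before it is handed to the real implementation. The types wr_1_0 and wr_1_1 and the helpers convert_chain() and free_chain() are hypothetical, and only three members are modelled.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cut-down work request in the old member order. */
struct wr_1_0 { struct wr_1_0 *next; uint64_t wr_id; int num_sge; };

/* Hypothetical cut-down work request in the new member order. */
struct wr_1_1 { uint64_t wr_id; struct wr_1_1 *next; int num_sge; };

/* Release a new-layout chain built by convert_chain(). */
static void free_chain(struct wr_1_1 *wr)
{
    while (wr) {
        struct wr_1_1 *next = wr->next;
        free(wr);
        wr = next;
    }
}

/*
 * Copy an old-layout chain into a freshly allocated new-layout chain,
 * preserving order. Returns NULL for an empty chain or on allocation
 * failure.
 */
static struct wr_1_1 *convert_chain(const struct wr_1_0 *old)
{
    struct wr_1_1 *head = NULL, **tail = &head;

    for (; old; old = old->next) {
        struct wr_1_1 *wr = malloc(sizeof *wr);
        if (!wr) {
            free_chain(head);
            return NULL;
        }
        wr->wr_id   = old->wr_id;
        wr->num_sge = old->num_sge;
        wr->next    = NULL;
        *tail = wr;          /* append to the end of the new chain */
        tail  = &wr->next;
    }
    return head;
}
```

The per-request copy is also where the slight performance cost of going through a compatibility path would come from.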

 - R.


From vlad at lists.openfabrics.org  Fri Jun  8 02:40:50 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Fri,  8 Jun 2007 02:40:50 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070608-0200 daily build status
Message-ID: <20070608094050.7B45DE60868@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.12
Passed on x86_64 with linux-2.6.20
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.12
Passed on ppc64 with linux-2.6.18
Passed on ia64 with linux-2.6.12
Passed on x86_64 with linux-2.6.17
Passed on ia64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19
Passed on x86_64 with linux-2.6.13
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.18
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.14
Passed on x86_64 with linux-2.6.16
Passed on ia64 with linux-2.6.13
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.12
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.13
Passed on ia64 with linux-2.6.16
Passed on x86_64 with linux-2.6.14
Passed on ia64 with linux-2.6.17
Passed on x86_64 with linux-2.6.21.1
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.14
Passed on x86_64 with linux-2.6.15
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.16
Passed on powerpc with linux-2.6.14
Passed on powerpc with linux-2.6.12
Passed on ppc64 with linux-2.6.13
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on ppc64 with linux-2.6.17
Passed on ia64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:


From steffen.persvold at scali.com  Fri Jun  8 04:42:15 2007
From: steffen.persvold at scali.com (Steffen Persvold)
Date: Fri, 8 Jun 2007 07:42:15 -0400
Subject: [ofa-general] RE: [ewg] OFED 1.2 and backwards binary
	compatibility
References: <465AE791.5040003@mellanox.co.il><A15335FBE9BD2449AF2C9EF3D1EB8EA303951123@xmb-sjc-216.amer.cisco.com><465BD5B4.50003@mellanox.co.il><D6A583C768392A4D8B297C500CDD54B501573779@mse11be1.mse11.exchange.ms><D6A583C768392A4D8B297C500CDD54B50157377A@mse11be1.mse11.exchange.ms>
	<adaabvbcjd3.fsf@cisco.com>
Message-ID: <D6A583C768392A4D8B297C500CDD54B50157377F@mse11be1.mse11.exchange.ms>

Roland,
 
1.0 vs. 1.1 is all good; that works. I'm talking about 1.0/1.1 vs. 1.2, which is not working. The diff below is between 1.1 and 1.2.

What we're doing is using dlopen()/dlsym() to open the library dynamically so that we have no link-time library dependencies (this allows us to check at run time whether OFED or another IB stack is installed). This apparently breaks.

I can't find any "post_send_wrapper_1_0" or "post_send_wrapper_1_1" symbols in my libraries:
 
[root at pe1850-1 lib]# nm libibverbs.so.1.0.0 |grep post_send
0000000000003aa0 T ibv_cmd_post_send

?
 
Cheers,
 
Steffen Persvold
Technical Director Americas
tel. 508-281-7100 x401
fax. 508-281-7171

http://www.scali.com/
Scaling the Linux datacenter


From rdreier at cisco.com  Fri Jun  8 06:59:24 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 08 Jun 2007 06:59:24 -0700
Subject: [ofa-general] RE: [ewg] OFED 1.2 and backwards binary
	compatibility
In-Reply-To: <D6A583C768392A4D8B297C500CDD54B50157377F@mse11be1.mse11.exchange.ms>
	(Steffen Persvold's message of "Fri, 8 Jun 2007 07:42:15 -0400")
References: <465AE791.5040003@mellanox.co.il>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA303951123@xmb-sjc-216.amer.cisco.com>
	<465BD5B4.50003@mellanox.co.il>
	<D6A583C768392A4D8B297C500CDD54B501573779@mse11be1.mse11.exchange.ms>
	<D6A583C768392A4D8B297C500CDD54B50157377A@mse11be1.mse11.exchange.ms>
	<adaabvbcjd3.fsf@cisco.com>
	<D6A583C768392A4D8B297C500CDD54B50157377F@mse11be1.mse11.exchange.ms>
Message-ID: <ada645yczsz.fsf@cisco.com>

 > 1.0 vs. 1.1 is all good; that works. I'm talking about 1.0/1.1 vs. 1.2, which is not working. The diff below is between 1.1 and 1.2.

Sorry for being confusing.  I was talking about the libibverbs
version.  OFED 1.0 and 1.1 both included libibverbs 1.0, and OFED 1.2
includes libibverbs 1.1.

 > What we're doing is using dlopen()/dlsym() to open the library dynamically so that we have no link-time library dependencies (this allows us to check at run time whether OFED or another IB stack is installed). This apparently breaks.

Yes, you are basically implementing a broken dynamic linker yourself.
For this to work you will need to use dlvsym() and request all symbols
with version IBVERBS_1.0.  There may be a slight performance penalty
on libibverbs 1.1 (OFED 1.2) because you will be going through
compatibility wrappers.
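
A minimal sketch of the dlvsym() approach, assuming glibc (dlvsym() is a GNU extension, so _GNU_SOURCE must be defined before any header is included). The helper name load_versioned() is hypothetical; "ibv_get_device_list" is used only as an example of an exported symbol, and the version string is the IBVERBS_1.0 named above.

```c
#define _GNU_SOURCE            /* dlvsym() is a GNU extension */
#include <dlfcn.h>
#include <stdio.h>

/*
 * Open `libname` and resolve `symbol` at exactly `version`.
 * Returns NULL if the library or the versioned symbol is missing.
 * On success *handle_out keeps the library loaded; the caller is
 * expected to dlclose() it when done.
 */
static void *load_versioned(const char *libname, const char *symbol,
                            const char *version, void **handle_out)
{
    void *handle = dlopen(libname, RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        fprintf(stderr, "dlopen(%s): %s\n", libname, dlerror());
        return NULL;
    }

    void *sym = dlvsym(handle, symbol, version);
    if (!sym) {
        fprintf(stderr, "dlvsym(%s@%s): %s\n", symbol, version, dlerror());
        dlclose(handle);
        return NULL;
    }

    *handle_out = handle;
    return sym;
}
```

A caller would then use something like load_versioned("libibverbs.so.1", "ibv_get_device_list", "IBVERBS_1.0", &handle) for each entry point it needs, instead of plain dlsym(). Older glibc versions need -ldl at link time.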

 > I can't find any "post_send_wrapper_1_0" or "post_send_wrapper_1_1" symbols in my libraries:

Right, they're internal symbols.  Take a look at the libibverbs source
if you're curious about how it works.

 - R.


From rdreier at cisco.com  Fri Jun  8 07:22:24 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 08 Jun 2007 07:22:24 -0700
Subject: [ofa-general] [GIT PULL] please pull infiniband.git
Message-ID: <ada1wgmcyqn.fsf@cisco.com>

Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This will get a bunch of fixes to the new mlx4 driver, and one fix for
port assignment by the RDMA CM:

Eli Cohen (1):
      mlx4_core: Fix CQ context layout

Jack Morgenstein (2):
      mlx4_core: Don't set MTT address in dMPT entries with PA set
      IB/mlx4: Fix zeroing of rnr_retry value in ib_modify_qp()

Roland Dreier (5):
      mlx4_core: Initialize ctx_list and ctx_lock earlier
      mlx4_core: Free catastrophic error MSI-X interrupt with correct dev_id
      IB/mthca, mlx4_core: Fix typo in comment
      mlx4_core: Check firmware command interface revision
      IB/mlx4: Make sure RQ allocation is always valid

Sean Hefty (1):
      RDMA/cma: Fix initialization of next_port

 drivers/infiniband/core/cma.c           |    4 +-
 drivers/infiniband/hw/mlx4/qp.c         |   33 ++++++++++++++++++++----------
 drivers/infiniband/hw/mthca/mthca_cmd.c |    2 +-
 drivers/net/mlx4/cq.c                   |    2 +-
 drivers/net/mlx4/eq.c                   |    4 ++-
 drivers/net/mlx4/fw.c                   |   27 ++++++++++++++++++++++--
 drivers/net/mlx4/intf.c                 |    3 --
 drivers/net/mlx4/main.c                 |    2 +
 drivers/net/mlx4/mr.c                   |    8 ++++--
 9 files changed, 60 insertions(+), 25 deletions(-)


diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 2eb52b7..32a0e66 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -2773,8 +2773,8 @@ static int cma_init(void)
 	int ret;
 
 	get_random_bytes(&next_port, sizeof next_port);
-	next_port = (next_port % (sysctl_local_port_range[1] -
-				  sysctl_local_port_range[0])) +
+	next_port = ((unsigned int) next_port %
+		    (sysctl_local_port_range[1] - sysctl_local_port_range[0])) +
 		    sysctl_local_port_range[0];
 	cma_wq = create_singlethread_workqueue("rdma_cm");
 	if (!cma_wq)
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index dc137de..5c6d054 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -189,18 +189,28 @@ static int send_wqe_overhead(enum ib_qp_type type)
 }
 
 static int set_rq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
-		       struct mlx4_ib_qp *qp)
+		       int is_user, int has_srq, struct mlx4_ib_qp *qp)
 {
 	/* Sanity check RQ size before proceeding */
 	if (cap->max_recv_wr  > dev->dev->caps.max_wqes  ||
 	    cap->max_recv_sge > dev->dev->caps.max_rq_sg)
 		return -EINVAL;
 
-	qp->rq.max = cap->max_recv_wr ? roundup_pow_of_two(cap->max_recv_wr) : 0;
+	if (has_srq) {
+		/* QPs attached to an SRQ should have no RQ */
+		if (cap->max_recv_wr)
+			return -EINVAL;
+
+		qp->rq.max = qp->rq.max_gs = 0;
+	} else {
+		/* HW requires >= 1 RQ entry with >= 1 gather entry */
+		if (is_user && (!cap->max_recv_wr || !cap->max_recv_sge))
+			return -EINVAL;
 
-	qp->rq.wqe_shift = ilog2(roundup_pow_of_two(cap->max_recv_sge *
-						    sizeof (struct mlx4_wqe_data_seg)));
-	qp->rq.max_gs    = (1 << qp->rq.wqe_shift) / sizeof (struct mlx4_wqe_data_seg);
+		qp->rq.max	 = roundup_pow_of_two(max(1, cap->max_recv_wr));
+		qp->rq.max_gs	 = roundup_pow_of_two(max(1, cap->max_recv_sge));
+		qp->rq.wqe_shift = ilog2(qp->rq.max_gs * sizeof (struct mlx4_wqe_data_seg));
+	}
 
 	cap->max_recv_wr  = qp->rq.max;
 	cap->max_recv_sge = qp->rq.max_gs;
@@ -285,7 +295,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
 	qp->sq.head	    = 0;
 	qp->sq.tail	    = 0;
 
-	err = set_rq_size(dev, &init_attr->cap, qp);
+	err = set_rq_size(dev, &init_attr->cap, !!pd->uobject, !!init_attr->srq, qp);
 	if (err)
 		goto err;
 
@@ -762,11 +772,6 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 		optpar |= MLX4_QP_OPTPAR_PKEY_INDEX;
 	}
 
-	if (attr_mask & IB_QP_RNR_RETRY) {
-		context->params1 |= cpu_to_be32(attr->rnr_retry << 13);
-		optpar |= MLX4_QP_OPTPAR_RNR_RETRY;
-	}
-
 	if (attr_mask & IB_QP_AV) {
 		if (mlx4_set_path(dev, &attr->ah_attr, &context->pri_path,
 				  attr_mask & IB_QP_PORT ? attr->port_num : qp->port)) {
@@ -802,6 +807,12 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 
 	context->pd	    = cpu_to_be32(to_mpd(ibqp->pd)->pdn);
 	context->params1    = cpu_to_be32(MLX4_IB_ACK_REQ_FREQ << 28);
+
+	if (attr_mask & IB_QP_RNR_RETRY) {
+		context->params1 |= cpu_to_be32(attr->rnr_retry << 13);
+		optpar |= MLX4_QP_OPTPAR_RNR_RETRY;
+	}
+
 	if (attr_mask & IB_QP_RETRY_CNT) {
 		context->params1 |= cpu_to_be32(attr->retry_cnt << 16);
 		optpar |= MLX4_QP_OPTPAR_RETRY_COUNT;
diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c
index 3810252..f40558d 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -772,7 +772,7 @@ int mthca_QUERY_FW(struct mthca_dev *dev, u8 *status)
 
 	MTHCA_GET(dev->fw_ver,   outbox, QUERY_FW_VER_OFFSET);
 	/*
-	 * FW subminor version is at more signifant bits than minor
+	 * FW subminor version is at more significant bits than minor
 	 * version, so swap here.
 	 */
 	dev->fw_ver = (dev->fw_ver & 0xffff00000000ull) |
diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c
index 437d78a..39253d0 100644
--- a/drivers/net/mlx4/cq.c
+++ b/drivers/net/mlx4/cq.c
@@ -61,7 +61,7 @@ struct mlx4_cq_context {
 	__be32			solicit_producer_index;
 	__be32			consumer_index;
 	__be32			producer_index;
-	u8			reserved6[2];
+	u32			reserved6[2];
 	__be64			db_rec_addr;
 };
 
diff --git a/drivers/net/mlx4/eq.c b/drivers/net/mlx4/eq.c
index 0f11adb..27a82ce 100644
--- a/drivers/net/mlx4/eq.c
+++ b/drivers/net/mlx4/eq.c
@@ -490,9 +490,11 @@ static void mlx4_free_irqs(struct mlx4_dev *dev)
 
 	if (eq_table->have_irq)
 		free_irq(dev->pdev->irq, dev);
-	for (i = 0; i < MLX4_NUM_EQ; ++i)
+	for (i = 0; i < MLX4_EQ_CATAS; ++i)
 		if (eq_table->eq[i].have_irq)
 			free_irq(eq_table->eq[i].irq, eq_table->eq + i);
+	if (eq_table->eq[MLX4_EQ_CATAS].have_irq)
+		free_irq(eq_table->eq[MLX4_EQ_CATAS].irq, dev);
 }
 
 static int __devinit mlx4_map_clr_int(struct mlx4_dev *dev)
diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c
index cfa5cc0..e7ca118 100644
--- a/drivers/net/mlx4/fw.c
+++ b/drivers/net/mlx4/fw.c
@@ -37,6 +37,10 @@
 #include "fw.h"
 #include "icm.h"
 
+enum {
+	MLX4_COMMAND_INTERFACE_REV	= 1
+};
+
 extern void __buggy_use_of_MLX4_GET(void);
 extern void __buggy_use_of_MLX4_PUT(void);
 
@@ -452,10 +456,12 @@ int mlx4_QUERY_FW(struct mlx4_dev *dev)
 	u32 *outbox;
 	int err = 0;
 	u64 fw_ver;
+	u16 cmd_if_rev;
 	u8 lg;
 
 #define QUERY_FW_OUT_SIZE             0x100
 #define QUERY_FW_VER_OFFSET            0x00
+#define QUERY_FW_CMD_IF_REV_OFFSET     0x0a
 #define QUERY_FW_MAX_CMD_OFFSET        0x0f
 #define QUERY_FW_ERR_START_OFFSET      0x30
 #define QUERY_FW_ERR_SIZE_OFFSET       0x38
@@ -477,21 +483,36 @@ int mlx4_QUERY_FW(struct mlx4_dev *dev)
 
 	MLX4_GET(fw_ver, outbox, QUERY_FW_VER_OFFSET);
 	/*
-	 * FW subminor version is at more signifant bits than minor
+	 * FW subminor version is at more significant bits than minor
 	 * version, so swap here.
 	 */
 	dev->caps.fw_ver = (fw_ver & 0xffff00000000ull) |
 		((fw_ver & 0xffff0000ull) >> 16) |
 		((fw_ver & 0x0000ffffull) << 16);
 
+	MLX4_GET(cmd_if_rev, outbox, QUERY_FW_CMD_IF_REV_OFFSET);
+	if (cmd_if_rev != MLX4_COMMAND_INTERFACE_REV) {
+		mlx4_err(dev, "Installed FW has unsupported "
+			 "command interface revision %d.\n",
+			 cmd_if_rev);
+		mlx4_err(dev, "(Installed FW version is %d.%d.%03d)\n",
+			 (int) (dev->caps.fw_ver >> 32),
+			 (int) (dev->caps.fw_ver >> 16) & 0xffff,
+			 (int) dev->caps.fw_ver & 0xffff);
+		mlx4_err(dev, "This driver version supports only revision %d.\n",
+			 MLX4_COMMAND_INTERFACE_REV);
+		err = -ENODEV;
+		goto out;
+	}
+
 	MLX4_GET(lg, outbox, QUERY_FW_MAX_CMD_OFFSET);
 	cmd->max_cmds = 1 << lg;
 
-	mlx4_dbg(dev, "FW version %d.%d.%03d, max commands %d\n",
+	mlx4_dbg(dev, "FW version %d.%d.%03d (cmd intf rev %d), max commands %d\n",
 		 (int) (dev->caps.fw_ver >> 32),
 		 (int) (dev->caps.fw_ver >> 16) & 0xffff,
 		 (int) dev->caps.fw_ver & 0xffff,
-		 cmd->max_cmds);
+		 cmd_if_rev, cmd->max_cmds);
 
 	MLX4_GET(fw->catas_offset, outbox, QUERY_FW_ERR_START_OFFSET);
 	MLX4_GET(fw->catas_size,   outbox, QUERY_FW_ERR_SIZE_OFFSET);
diff --git a/drivers/net/mlx4/intf.c b/drivers/net/mlx4/intf.c
index 65854f9..9ae951b 100644
--- a/drivers/net/mlx4/intf.c
+++ b/drivers/net/mlx4/intf.c
@@ -135,9 +135,6 @@ int mlx4_register_device(struct mlx4_dev *dev)
 	struct mlx4_priv *priv = mlx4_priv(dev);
 	struct mlx4_interface *intf;
 
-	INIT_LIST_HEAD(&priv->ctx_list);
-	spin_lock_init(&priv->ctx_lock);
-
 	mutex_lock(&intf_mutex);
 
 	list_add_tail(&priv->dev_list, &dev_list);
diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index 20b8c0d..d417293 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -787,6 +787,8 @@ static int __devinit mlx4_init_one(struct pci_dev *pdev,
 
 	dev       = &priv->dev;
 	dev->pdev = pdev;
+	INIT_LIST_HEAD(&priv->ctx_list);
+	spin_lock_init(&priv->ctx_lock);
 
 	/*
 	 * Now reset the HCA before we touch the PCI capabilities or
diff --git a/drivers/net/mlx4/mr.c b/drivers/net/mlx4/mr.c
index b33864d..d0808fa 100644
--- a/drivers/net/mlx4/mr.c
+++ b/drivers/net/mlx4/mr.c
@@ -324,15 +324,17 @@ int mlx4_mr_enable(struct mlx4_dev *dev, struct mlx4_mr *mr)
 				       MLX4_MPT_FLAG_MIO	 |
 				       MLX4_MPT_FLAG_REGION	 |
 				       mr->access);
-	if (mr->mtt.order < 0)
-		mpt_entry->flags |= cpu_to_be32(MLX4_MPT_FLAG_PHYSICAL);
 
 	mpt_entry->key	       = cpu_to_be32(key_to_hw_index(mr->key));
 	mpt_entry->pd	       = cpu_to_be32(mr->pd);
 	mpt_entry->start       = cpu_to_be64(mr->iova);
 	mpt_entry->length      = cpu_to_be64(mr->size);
 	mpt_entry->entity_size = cpu_to_be32(mr->mtt.page_shift);
-	mpt_entry->mtt_seg     = cpu_to_be64(mlx4_mtt_addr(dev, &mr->mtt));
+	if (mr->mtt.order < 0) {
+		mpt_entry->flags |= cpu_to_be32(MLX4_MPT_FLAG_PHYSICAL);
+		mpt_entry->mtt_seg = 0;
+	} else
+		mpt_entry->mtt_seg = cpu_to_be64(mlx4_mtt_addr(dev, &mr->mtt));
 
 	err = mlx4_SW2HW_MPT(dev, mailbox,
 			     key_to_hw_index(mr->key) & (dev->caps.num_mpts - 1));
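
For anyone wondering about the cma_init() hunk above: with a signed next_port, the random value may be negative, and C's % then yields a negative remainder, so the result can land below the start of the range. A small user-space sketch of the two expressions follows; PORT_LOW and PORT_HIGH are illustrative stand-ins for sysctl_local_port_range[0] and [1], and the function names are hypothetical.

```c
/* Illustrative stand-ins for sysctl_local_port_range[0] and [1]. */
enum { PORT_LOW = 32768, PORT_HIGH = 61000 };

/* Pre-fix expression: a negative seed gives a negative remainder. */
static int pick_port_signed(int seed)
{
    return (seed % (PORT_HIGH - PORT_LOW)) + PORT_LOW;
}

/* Post-fix expression: the cast forces a remainder in [0, range). */
static int pick_port_unsigned(int seed)
{
    return ((unsigned int) seed % (PORT_HIGH - PORT_LOW)) + PORT_LOW;
}
```

With a seed of -5 the signed version returns 32763, which is below PORT_LOW, while the unsigned version always stays within [PORT_LOW, PORT_HIGH).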


From steffen.persvold at scali.com  Fri Jun  8 07:26:11 2007
From: steffen.persvold at scali.com (Steffen Persvold)
Date: Fri, 8 Jun 2007 10:26:11 -0400
Subject: [ofa-general] RE: [ewg] OFED 1.2 and backwards binary
	compatibility
References: <465AE791.5040003@mellanox.co.il><A15335FBE9BD2449AF2C9EF3D1EB8EA303951123@xmb-sjc-216.amer.cisco.com><465BD5B4.50003@mellanox.co.il><D6A583C768392A4D8B297C500CDD54B501573779@mse11be1.mse11.exchange.ms><D6A583C768392A4D8B297C500CDD54B50157377A@mse11be1.mse11.exchange.ms><adaabvbcjd3.fsf@cisco.com><D6A583C768392A4D8B297C500CDD54B50157377F@mse11be1.mse11.exchange.ms>
	<ada645yczsz.fsf@cisco.com>
Message-ID: <D6A583C768392A4D8B297C500CDD54B501573784@mse11be1.mse11.exchange.ms>

Aha!
 
Thanks so much, I will look into this. 
 
Cheers,
Steffen Persvold
Technical Director Americas
tel. 508-281-7100 x401
fax. 508-281-7171

http://www.scali.com/
Scaling the Linux datacenter


From afriedle at open-mpi.org  Fri Jun  8 11:05:33 2007
From: afriedle at open-mpi.org (Andrew Friedley)
Date: Fri, 08 Jun 2007 11:05:33 -0700
Subject: [ofa-general] Limited number of multicasts groups that can be
	joined?
Message-ID: <46699A6D.4070300@open-mpi.org>

I've run into a problem where it appears that I cannot join more than 14 
multicast groups from a single HCA.  I'm using the RDMA CM UD/multicast 
interface from an OFED v1.2 nightly build, and using a '0' address when 
joining to have the SM allocate an unused address.  The first 14 
rdma_join_multicast() calls succeed, a MULTICAST_JOIN event comes 
through for each of them and everything works.  But the 15th call to 
rdma_join_multicast() returns -1 and sets errno to 99, 'Cannot assign 
requested address'.

Note that I'm using a single QP per process to do all the joins.  Things 
get weirder if I run two instances of my program on the same node -- as 
soon as the total between the two instances is 14, neither instance can 
join any more groups.  Also, right now my code hangs when this happens 
-- if I kill off one of the two instances and run a third instance 
(while leaving the other hung, holding some number of groups), the third 
instance is not able to join ANY groups.  The behavior resets when I 
kill all instances.

Two instances running on separate nodes (on the same network) do not 
appear to interfere with each other as described above; they do still 
error out on the 15th join.

This feels like a bug to me, though regardless, this limit is WAY too 
low.  Any ideas what might be going on, or how I can work around it?

Andrew


From sean.hefty at intel.com  Fri Jun  8 12:40:03 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 8 Jun 2007 12:40:03 -0700
Subject: [ofa-general] RE: Limited number of multicasts groups that can be
	joined?
In-Reply-To: <46699A6D.4070300@open-mpi.org>
Message-ID: <000d01c7aa04$cf8353f0$9c98070a@amr.corp.intel.com>

>I've run into a problem where it appears that I cannot join more than 14
>multicast groups from a single HCA.  I'm using the RDMA CM UD/multicast
>interface from an OFED v1.2 nightly build, and using a '0' address when
>joining to have the SM allocate an unused address.  The first 14
>rdma_join_multicast() calls succeed, a MULTICAST_JOIN event comes
>through for each of them and everything works.  But the 15th call to
>rdma_join_multicast() returns -1 and sets errno to 99, 'Cannot assign
>requested address'.

I was able to join a total of 6 times before I started seeing failures.  Each
join is done by a separate process with its own QP.  I'll track down the
failure in more detail, but it will likely take me a couple of days to look into
this.

>This feels like a bug to me; though regardless this limit is WAY too
>low.  Any ideas what might be going on, or how I can work around it?

At least on my systems, I see device attributes of:

max_mcast_grp = 8192
max_mcast_qp_attach = 8
max_total_mcast_qp_attach = 65536

- Sean


From afriedle at open-mpi.org  Fri Jun  8 13:13:10 2007
From: afriedle at open-mpi.org (Andrew Friedley)
Date: Fri, 08 Jun 2007 13:13:10 -0700
Subject: [ofa-general] Re: Limited number of multicasts groups that can be
	joined?
In-Reply-To: <000d01c7aa04$cf8353f0$9c98070a@amr.corp.intel.com>
References: <000d01c7aa04$cf8353f0$9c98070a@amr.corp.intel.com>
Message-ID: <4669B856.9080305@open-mpi.org>


Sean Hefty wrote:
> At least on my systems, I see device attributes of:
> 
> max_mcast_grp = 8192
> max_mcast_qp_attach = 8
> max_total_mcast_qp_attach = 65536

OK I see the exact same thing here.  What exactly do these params mean? 
  Particularly max_mcast_qp_attach, is that the most QPs attached to one 
group for this device, or max groups a QP can attach to?

Andrew


From sean.hefty at intel.com  Fri Jun  8 13:26:18 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 8 Jun 2007 13:26:18 -0700
Subject: [ofa-general] RE: Limited number of multicasts groups that can be
	joined?
In-Reply-To: <4669B856.9080305@open-mpi.org>
Message-ID: <001b01c7aa0b$452986a0$9c98070a@amr.corp.intel.com>

>> max_mcast_grp = 8192
>> max_mcast_qp_attach = 8
>> max_total_mcast_qp_attach = 65536
>
>OK I see the exact same thing here.  What exactly do these params mean?
>  Particularly max_mcast_qp_attach, is that the most QPs attached to one
>group for this device, or max groups a QP can attach to?

max_mcast_grp: Maximum number of multicast groups supported by this HCA.
Shall be zero if this HCA does not support IBA unreliable multicast.

max_total_mcast_qp_attach: Maximum number of QPs which can be attached
to multicast groups for this HCA. Shall be zero if this HCA does not
support IBA unreliable multicast.

max_mcast_qp_attach: Maximum number of QPs per multicast group supported
by this HCA. Shall be zero if this HCA does not support IBA unreliable
multicast.

Given that I can only join 6 times, I'm guessing that I'm running into an
issue with max_mcast_qp_attach = 8.  (At least ipoib has joined multicast
groups as well.)

- Sean


From or.gerlitz at gmail.com  Fri Jun  8 14:20:27 2007
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Sat, 9 Jun 2007 00:20:27 +0300
Subject: [ofa-general] Re: ipoib / bonding and OFED
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA303A4544E@xmb-sjc-216.amer.cisco.com>
References: <3857BB049D83424D9DB82753D37CEA55459C41@taurus.voltaire.com>
	<4657373E.2030903@hp.com> <465BDC90.5080305@voltaire.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA303951259@xmb-sjc-216.amer.cisco.com>
	<466702A8.5080302@hp.com> <4667B5FD.4070600@voltaire.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA303A4544E@xmb-sjc-216.amer.cisco.com>
Message-ID: <15ddcffd0706081420r79984701u4e385e28857cb68b@mail.gmail.com>

On 6/7/07, Scott Weitzenkamp (sweitzen) <sweitzen at cisco.com> wrote:

> I don't know if I've said this in public, but I've stopped testing
> ipoibtools HA as of OFED 1.2 rc2 and Cisco is only going to support
> ib-bonding HA for our OFED 1.2 customers, as our testing has revealed
> ib-bonding is more robust than ipoibtools.  I know I said this to
> Tziporet at Sonoma, and she seemed to agree we could eventually remove
> ipoibtools from OFED.


Scott,

Thanks for the feedback. Just to be clear, we also don't test the
ipoibtools HA solution, and Voltaire will support only the ib-bonding
solution for OFED 1.2 customers.

Or.

From vlad at lists.openfabrics.org  Sat Jun  9 02:41:47 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Sat,  9 Jun 2007 02:41:47 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070609-0200 daily build status
Message-ID: <20070609094147.D985FE60844@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.15
Passed on x86_64 with linux-2.6.20
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.16
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.17
Passed on ppc64 with linux-2.6.12
Passed on powerpc with linux-2.6.19
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.15
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on x86_64 with linux-2.6.16
Passed on ppc64 with linux-2.6.13
Passed on powerpc with linux-2.6.17
Passed on ppc64 with linux-2.6.19
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.18
Passed on ia64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.13
Passed on powerpc with linux-2.6.13
Passed on powerpc with linux-2.6.15
Passed on powerpc with linux-2.6.16
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.14
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.16
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.12
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:


From mst at dev.mellanox.co.il  Sat Jun  9 21:42:00 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 10 Jun 2007 07:42:00 +0300
Subject: [ofa-general] Re: IPOIB CM (NOSRQ) extension
In-Reply-To: <46687642.8040208@linux.vnet.ibm.com>
References: <46687642.8040208@linux.vnet.ibm.com>
Message-ID: <20070610044146.GA4959@mellanox.co.il>

> Quoting Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com>:
> Subject: IPOIB CM (NOSRQ) extension
> 
> This patch handles the corner case of running out of RC QPs. In that
> case it switches to UD mode. This patch can be used both by NOSRQ and
> SRQ code.
> 
> Signed-off-by: Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com>

You don't provide any way to retry going back to connected mode after a
failure, even though such failures are intermittent by nature. That's
pretty bad.

> ---
> 
> --- c/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_cm.c 
> 2007-06-07 11:13:55.000000000 -0400
> +++ b/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_cm.c 
> 2007-06-07 11:11:21.000000000 -0400
> @@ -1383,6 +1383,11 @@ static int ipoib_cm_tx_handler(struct ib
>  		break;
>  	case IB_CM_REQ_ERROR:
>  	case IB_CM_REJ_RECEIVED:
> +		ipoib_warn(priv, "REJ received\n");
> +		neigh = tx->neigh;
> +		if (neigh)
> +			clear_bit(IPOIB_FLAG_OPER_UP, &neigh->cm->flags);
> +		break;
>  	case IB_CM_TIMEWAIT_EXIT:
>  		ipoib_dbg(priv, "CM error %d.\n", event->event);
>  		spin_lock_irq(&priv->tx_lock);

This has the effect of dropping down to datagram mode
on errors such as a CM timeout, or a reject due to a stale connection.
I think this is the wrong thing to do.

> --- c/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_main.c 
> 2007-05-30 14:56:25.000000000 -0400
> +++ b/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_main.c 
> 2007-06-06 18:28:06.000000000 -0400
> @@ -679,11 +679,10 @@ static int ipoib_start_xmit(struct sk_bu
> 
>  		neigh = *to_ipoib_neigh(skb->dst->neighbour);
> 
> -		if (ipoib_cm_get(neigh)) {
> -			if (ipoib_cm_up(neigh)) {
> +		if (ipoib_cm_get(neigh) &&  ipoib_cm_up(neigh) &&
> +			test_bit(IPOIB_FLAG_OPER_UP, &neigh->cm->flags)) {
>  				ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
>  				goto out;
> -			}
>  		} else if (neigh->ah) {
>  			if (unlikely(memcmp(&neigh->dgid.raw,
>  					    skb->dst->neighbour->ha + 4,

This adds overhead on xmit datapath (and it's atomics!),
which doesn't make me happy at all.

-- 
MST


From mst at dev.mellanox.co.il  Sat Jun  9 21:49:45 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 10 Jun 2007 07:49:45 +0300
Subject: [ofa-general] Re: IPOIB CM (NOSRQ)[PATCH V5] patch
In-Reply-To: <46687636.5050101@linux.vnet.ibm.com>
References: <46687636.5050101@linux.vnet.ibm.com>
Message-ID: <20070610044945.GB4959@mellanox.co.il>

> Quoting Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com>:
> Subject: IPOIB CM (NOSRQ)[PATCH V5] patch
> 
> Here is a fifth version of the IPOIB_CM_NOSRQ patch. This patch will
> benefit adapters that do not support shared receive queues.
> 
> This patch incorporates the following review comments and subsequent
> discussions on this mailing list from v4:
> 
> 1. Reduce the number of if(srq) tests in the packet receive path

I could still count at least 2 of these, and I don't see why there can't be just 1,
or even 0 if the QP pool is hidden under the SRQ interface.

> +int current_rc_qp = 0; /* Active RC QPs for NOSRQ */
>  #define IPOIB_CM_IETF_ID 0x1000000000000000ULL
> 
>  #define IPOIB_CM_RX_UPDATE_TIME (256 * HZ)

I don't see any locking for current_rc_qp, which looks wrong.
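
One possible shape for a race-free counter is sketched below as a
userspace C11 analogue only; kernel code would more likely use atomic_t
or an existing spinlock. The helper names rc_qp_try_get() and
rc_qp_put(), and passing the limit as a parameter, are assumptions made
for illustration and do not come from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for current_rc_qp: active RC QPs for NOSRQ. */
static atomic_int current_rc_qp = 0;

/*
 * Reserve one QP slot if fewer than max_rc_qp are in use.
 * The compare-exchange loop makes "check the limit, then increment"
 * a single atomic step, so two callers cannot both take the last slot.
 */
static bool rc_qp_try_get(int max_rc_qp)
{
	int cur = atomic_load(&current_rc_qp);

	while (cur < max_rc_qp) {
		/* On failure, cur is refreshed with the latest value. */
		if (atomic_compare_exchange_weak(&current_rc_qp, &cur, cur + 1))
			return true;
	}
	return false;
}

/* Release a slot previously reserved with rc_qp_try_get(). */
static void rc_qp_put(void)
{
	atomic_fetch_sub(&current_rc_qp, 1);
}
```

A caller that gets false from rc_qp_try_get() would fall back to
whatever path the driver uses when no RC QP is available.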

-- 
MST


From mst at dev.mellanox.co.il  Sat Jun  9 23:37:13 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 10 Jun 2007 09:37:13 +0300
Subject: [ofa-general] patch for OFED 1.2
Message-ID: <20070610063713.GB8249@mellanox.co.il>

Sean, the following commit

	commit bf2944bd56c7a48cc3962a860dbc4ceee6b1ace8
	Author: Sean Hefty <sean.hefty at intel.com>
	Date:   Tue Jun 5 09:57:31 2007 -0700

	RDMA/cma: Fix initialization of next_port

	next_port should be between sysctl_local_port_range[0] and [1].
	However, it is initially set to a random value with get_random_bytes().
	If the value is negative when treated as a signed integer, next_port
	can end up outside the expected range because of the result of the %
	operator being negative.

	Signed-off-by: Sean Hefty <sean.hefty at intel.com>
	Signed-off-by: Roland Dreier <rolandd at cisco.com>

looks like something we want included in OFED 1.2 as well.
What do you think?
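
The problem the commit message describes can be reproduced in plain C.
The sketch below is an illustration under assumptions, not the code
from the commit: the function names are made up, and the range is
treated as half-open [low, high).

```c
#include <assert.h>

/*
 * Buggy shape: with a negative rnd, the % operator yields a negative
 * remainder (C99 and later), so the result can fall below low.
 */
static int pick_port_signed(int rnd, int low, int high)
{
	int span = high - low;

	return rnd % span + low;
}

/*
 * Safer shape: take the remainder of an unsigned value, which is
 * always in [0, span), so the result stays in [low, high).
 */
static int pick_port_unsigned(unsigned int rnd, int low, int high)
{
	unsigned int span;

	assert(low < high);
	span = (unsigned int)(high - low);
	return (int)(rnd % span) + low;
}
```

For example, with low = 32768 and high = 61000, pick_port_signed(-7,
...) lands below 32768, while the unsigned variant stays inside the
range for the same bit pattern.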

-- 
MST


From dotanb at dev.mellanox.co.il  Sun Jun 10 00:44:29 2007
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Sun, 10 Jun 2007 10:44:29 +0300
Subject: [ofa-general] Having trouble pingpong between two nodes.
In-Reply-To: <A382D4292574EB47A85B8159A6AED1A1015EF2BC@FPNYEXCBE02.opus-i.corp>
References: <A382D4292574EB47A85B8159A6AED1A1015EF2BC@FPNYEXCBE02.opus-i.corp>
Message-ID: <466BABDD.20808@dev.mellanox.co.il>

Jeffrey Wong wrote:
>
> Hello,
>
> I am trying to run ibv_ud_pingpong between two nodes but I can’t
> seem to get them to communicate. I have used the ping command between
> the ib interfaces and that works fine, but when I try to use
> ibv_ud_pingpong it says the following:
>
> ________________________________________________________________________________
>
> root at centos5:node1 ~]# ibv_ud_pingpong 193.168.10.254
>
> local address: LID 0x0002, QPN 0x0f0406, PSN 0xb067dc
>
> Couldn't connect to 193.168.10.254:18515
>
> ____________________________________________________________________________
>
This may be trivial, but did you start ibv_ud_pingpong as the server on
193.168.10.254?
(Since the test parameters are given to the client, the server should be
started with just: ibv_ud_pingpong).


Dotan


From vlad at lists.openfabrics.org  Sun Jun 10 02:40:53 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Sun, 10 Jun 2007 02:40:53 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070610-0200 daily build status
Message-ID: <20070610094053.835E3E60831@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.12
Passed on powerpc with linux-2.6.18
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.15
Passed on x86_64 with linux-2.6.16
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.12
Passed on x86_64 with linux-2.6.20
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.18
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.18
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.14
Passed on ia64 with linux-2.6.19
Passed on x86_64 with linux-2.6.17
Passed on ia64 with linux-2.6.16
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on powerpc with linux-2.6.15
Passed on powerpc with linux-2.6.14
Passed on powerpc with linux-2.6.12
Passed on ia64 with linux-2.6.17
Passed on ppc64 with linux-2.6.14
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.12
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.19
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.17
Passed on ia64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.9-34.ELsmp
Passed on x86_64 with linux-2.6.9-42.ELsmp

Failed:


From eli at mellanox.co.il  Sun Jun 10 04:00:33 2007
From: eli at mellanox.co.il (Eli Cohen)
Date: Sun, 10 Jun 2007 14:00:33 +0300
Subject: [ofa-general] Re: [PATCH 2/2] IB/mlx4_ib: fix SRQ buffer allocation
In-Reply-To: <ada3b13egl6.fsf@cisco.com>
References: <1181133679.10841.66.camel@mtls03>  <ada3b13egl6.fsf@cisco.com>
Message-ID: <1181473233.11593.7.camel@mtls03>

On Thu, 2007-06-07 at 11:59 -0700, Roland Dreier wrote:
> Thanks... I reworked this a lot and right now I plan to push the
> following (although I'm still testing):
> 
>  
>  static int set_rq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
> -		       struct mlx4_ib_qp *qp)
> +		       int is_user, int has_srq, struct mlx4_ib_qp *qp)
>  {
>  	/* Sanity check RQ size before proceeding */
>  	if (cap->max_recv_wr  > dev->dev->caps.max_wqes  ||
>  	    cap->max_recv_sge > dev->dev->caps.max_rq_sg)
>  		return -EINVAL;
>  
> -	qp->rq.max = cap->max_recv_wr ? roundup_pow_of_two(cap->max_recv_wr) : 0;
> +	if (has_srq) {
> +		/* QPs attached to an SRQ should have no RQ */
> +		if (cap->max_recv_wr)
> +			return -EINVAL;
>  
> -	qp->rq.wqe_shift = ilog2(roundup_pow_of_two(cap->max_recv_sge *
> -						    sizeof (struct mlx4_wqe_data_seg)));
> -	qp->rq.max_gs    = (1 << qp->rq.wqe_shift) / sizeof (struct mlx4_wqe_data_seg);
> +		qp->rq.max = qp->rq.max_gs = 0;
> +	} else {
> +		/* HW requires >= 1 RQ entry with >= 1 gather entry */
> +		if (is_user && (!cap->max_recv_wr || !cap->max_recv_sge))
> +			return -EINVAL;

I think we may have a problem here: if a user, not being aware of the HW
requirement of a non-zero-length receive queue, creates a QP with zero in
cap->max_recv_sge, the above kernel code will cause a failure since
libmlx4 does not fix the value in this field. So I think this should be
taken care of in libmlx4.


Moreover, I see you did not take the following:

@@ -302,6 +315,10 @@ static int set_kernel_sq_size(struct mlx
 static int set_user_sq_size(struct mlx4_ib_qp *qp,
                            struct mlx4_ib_create_qp *ucmd)
 {
+       /* Sanity check for SQ size */
+       if (ucmd->log_sq_bb_count > 15 || ucmd->log_sq_stride > 11)
+               return -EINVAL;
+

Shouldn't we use a condition like this to prevent misconfiguration of
the QP if libmlx4 passes improper values?


From rdreier at cisco.com  Sun Jun 10 08:52:06 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Sun, 10 Jun 2007 08:52:06 -0700
Subject: [ofa-general] Re: [PATCH 2/2] IB/mlx4_ib: fix SRQ buffer
	allocation
In-Reply-To: <1181473233.11593.7.camel@mtls03> (Eli Cohen's message of "Sun,
	10 Jun 2007 14:00:33 +0300")
References: <1181133679.10841.66.camel@mtls03> <ada3b13egl6.fsf@cisco.com>
	<1181473233.11593.7.camel@mtls03>
Message-ID: <adafy4zbye1.fsf@cisco.com>

 > > +		/* HW requires >= 1 RQ entry with >= 1 gather entry */
 > > +		if (is_user && (!cap->max_recv_wr || !cap->max_recv_sge))
 > > +			return -EINVAL;
 > 
 > I think we may have a problem here: if a user, not being aware of the HW
 > requirement of a non-zero-length receive queue, creates a QP with zero in
 > cap->max_recv_sge, the above kernel code will cause a failure since
 > libmlx4 does not fix the value in this field. So I think this should be
 > taken care of in libmlx4.

OK, I'll add something to make sure max_recv_sge >= 1 to libmlx4.

 > +       /* Sanity check for SQ size */
 > +       if (ucmd->log_sq_bb_count > 15 || ucmd->log_sq_stride > 11)
 > +               return -EINVAL;
 > +
 > 
 > Shouldn't we use a condition like this to prevent misconfiguration of
 > the QP if libmlx4 passes improper values?

Yeah, I guess so.  I dropped that chunk because I didn't like the
hard-coded and unexplained values, but I left the checking on my to do
list.

Thanks...


From rdreier at cisco.com  Sun Jun 10 08:54:19 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Sun, 10 Jun 2007 08:54:19 -0700
Subject: [ofa-general] Re: IPOIB CM (NOSRQ) extension
In-Reply-To: <20070610044146.GA4959@mellanox.co.il> (Michael S. Tsirkin's
	message of "Sun, 10 Jun 2007 07:42:00 +0300")
References: <46687642.8040208@linux.vnet.ibm.com>
	<20070610044146.GA4959@mellanox.co.il>
Message-ID: <adabqfnbyac.fsf@cisco.com>

 > > -		if (ipoib_cm_get(neigh)) {
 > > -			if (ipoib_cm_up(neigh)) {
 > > +		if (ipoib_cm_get(neigh) &&  ipoib_cm_up(neigh) &&
 > > +			test_bit(IPOIB_FLAG_OPER_UP, &neigh->cm->flags)) {
 > 
 > This adds overhead on xmit datapath (and it's atomics!),
 > which doesn't make me happy at all.

I don't see anything atomic here.

But

	if (ipoib_cm_get(neigh)) {
		if (ipoib_cm_up(neigh)) {
		....
		}
	} else...

is different from

	if (ipoib_cm_get(neigh) && ipoib_cm_up(neigh)) {
	....
	} else..

so there is a change in semantics here...
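
The difference between the two shapes can be shown with a small
standalone sketch. The enum, the function names, and the boolean
parameters are placeholders standing in for ipoib_cm_get(),
ipoib_cm_up() and neigh->ah; TX_NONE stands for whatever the driver
does when neither branch sends the packet.

```c
#include <stdbool.h>

enum tx_path { TX_CM, TX_DATAGRAM, TX_NONE };

/* Original shape: the else branch runs only when there is no CM context. */
static enum tx_path pick_nested(bool has_cm, bool cm_up, bool has_ah)
{
	if (has_cm) {
		if (cm_up)
			return TX_CM;
	} else if (has_ah) {
		return TX_DATAGRAM;
	}
	return TX_NONE;
}

/*
 * Flattened shape: the else branch also runs when a CM context exists
 * but is not up, so such packets now take the datagram path.
 */
static enum tx_path pick_flat(bool has_cm, bool cm_up, bool has_ah)
{
	if (has_cm && cm_up)
		return TX_CM;
	else if (has_ah)
		return TX_DATAGRAM;
	return TX_NONE;
}
```

The two functions agree except for has_cm = true, cm_up = false,
has_ah = true, where the nested form skips the else branch and the
flattened form takes it.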


From rdreier at cisco.com  Sun Jun 10 08:57:02 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Sun, 10 Jun 2007 08:57:02 -0700
Subject: [ofa-general] IPOIB CM (NOSRQ)[PATCH V5] patch
In-Reply-To: <46687636.5050101@linux.vnet.ibm.com> (Pradeep Satyanarayana's
	message of "Thu, 07 Jun 2007 14:18:46 -0700")
References: <46687636.5050101@linux.vnet.ibm.com>
Message-ID: <ada7iqbby5t.fsf@cisco.com>

Haven't read very far, but...

 > +#define SIXTY_FOUR_K (1ul << 16)
 > +#define MEGA_BYTE (1ul << 20)

this is really horrible.  There's no point in this type of define --
a constant should have a name that describes what it's *for*, not what
its value is.  The code above is pretty close to

#define SIXTY_FOUR	64

and I hope it's obvious why that's pointless.

And also

 > +		ipoib_warn(priv, "NOSRQ has reached the configurable limit "
 > +		           "of either %d RC QPs or, max recv buf size of "
 > +			   "0x%lx MB\n", max_rc_qp, max_recv_buf * MEGA_BYTE);

this is buggy -- you print the value as being in MB but then also
multiply by MEGA_BYTE before printing it.
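
A sketch of what both points could look like together, written as
standalone C rather than driver code. The constant name, the helper
name, and the use of %lu instead of 0x%lx are assumptions made for
illustration only.

```c
#include <stdio.h>

/* Hypothetical name: says what the limit is for, not what its value is. */
#define NOSRQ_MAX_RECV_BUF_MB 64ul

/*
 * Format the warning with the size already expressed in MB.
 * The value is printed as-is; it is not scaled to bytes first.
 */
static int format_limit_msg(char *buf, size_t len, int max_rc_qp,
			    unsigned long max_recv_buf_mb)
{
	return snprintf(buf, len,
			"NOSRQ has reached the configurable limit of either "
			"%d RC QPs or max recv buf size of %lu MB",
			max_rc_qp, max_recv_buf_mb);
}
```

If a byte count is needed elsewhere, the conversion belongs at that
point of use, not inside the message that is labelled in MB.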


From sagis at voltaire.com  Sun Jun 10 08:59:48 2007
From: sagis at voltaire.com (Sagi  Schlanger)
Date: Sun, 10 Jun 2007 18:59:48 +0300
Subject: [ofa-general] OpenSM Up-Down algorithm
Message-ID: <39C75744D164D948A170E9792AF8E7CA0D2914@exil.voltaire.com>

Hi,
 
I'm looking for some answers on Up-Down routing in OpenSM.
 
Is anybody familiar with a utility/procedure to find credit loops given
a topology and routing settings?
 
Is there a handy spec describing the OpenSM Up-Down algorithm?
What is the scheme through which roots are defined on clos and non
clos/fat tree topologies?
Is this algorithm always credit loop free?
How efficient is using this algorithm on non clos/fat tree topologies?
 
Thanks for your cooperation,
Sagi
____________________________________________________________
Sagi Schlanger  | +972-9-9717651 (o)   |   +972-52-2385154 (m)
Software Engineer, IB Switch
Voltaire - The Grid Backbone
 
 www.voltaire.com <http://www.voltaire.com/> 

 

From sean.hefty at intel.com  Sun Jun 10 10:28:25 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Sun, 10 Jun 2007 10:28:25 -0700
Subject: [ofa-general] RE: patch for OFED 1.2
In-Reply-To: <20070610063713.GB8249@mellanox.co.il>
Message-ID: <000401c7ab84$c0fbf900$eacc180a@amr.corp.intel.com>

>looks like something we want included in OFED 1.2 as well.
>What do you think?

This should have been pulled in for OFED.

- Sean


From sashak at voltaire.com  Sun Jun 10 15:31:59 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Mon, 11 Jun 2007 01:31:59 +0300
Subject: [ofa-general] [PATCH] opensm: remove unused state_step_mode
Message-ID: <20070610223159.GB23029@sashak.voltaire.com>


This removes unused state_step_mode and associated flow from
osm_state_mgr_process().

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---
 opensm/include/opensm/osm_base.h      |   29 ------------
 opensm/include/opensm/osm_state_mgr.h |    2 -
 opensm/opensm/osm_state_mgr.c         |   81 +++------------------------------
 3 files changed, 6 insertions(+), 106 deletions(-)

diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h
index ee280d3..6bdea24 100644
--- a/opensm/include/opensm/osm_base.h
+++ b/opensm/include/opensm/osm_base.h
@@ -768,35 +768,6 @@ typedef enum _osm_sm_state
 typedef uintn_t osm_signal_t;
 /***********/
 
-/****d* OpenSM: Base/osm_state_mgr_mode_t
-* NAME
-*	 osm_state_mgr_mode_t
-*
-* DESCRIPTION
-*	Enumerates the possible state progressing codes used by the OSM 
-*	state manager.
-*
-* SYNOPSIS
-*/
-typedef enum _osm_state_mgr_mode
-{
-  OSM_STATE_STEP_CONTINUOUS = 0,
-  OSM_STATE_STEP_TAKE_ONE,
-  OSM_STATE_STEP_BREAK
-} osm_state_mgr_mode_t;
-/*
-* OSM_STATE_STEP_CONTINUOUS 
-*    normal automatic progress mode
-*
-* OSM_STATE_STEP_TAKE_ONE 
-*    Do one step 
-*
-* OSM_STATE_STEP_BREAK
-*    Stop before taking next step (the while loop in the state 
-*    manager automatically change to this state).
-*
-**********/
-
 /****d* OpenSM: Base/osm_sm_signal_t
 * NAME
 *	osm_sm_signal_t
diff --git a/opensm/include/opensm/osm_state_mgr.h b/opensm/include/opensm/osm_state_mgr.h
index 427b156..6975d18 100644
--- a/opensm/include/opensm/osm_state_mgr.h
+++ b/opensm/include/opensm/osm_state_mgr.h
@@ -118,8 +118,6 @@ typedef struct _osm_state_mgr
   cl_plock_t					*p_lock;
   cl_event_t					*p_subnet_up_event;
   osm_sm_state_t				state;
-  osm_state_mgr_mode_t     state_step_mode;
-  osm_signal_t             next_stage_signal;
 } osm_state_mgr_t;
 /*
 * FIELDS
diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index bcf68f2..893a423 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -153,8 +153,6 @@ osm_state_mgr_init(
    p_mgr->state = OSM_SM_STATE_IDLE;
    p_mgr->p_lock = p_lock;
    p_mgr->p_subnet_up_event = p_subnet_up_event;
-   p_mgr->state_step_mode = OSM_STATE_STEP_CONTINUOUS;
-   p_mgr->next_stage_signal = OSM_SIGNAL_NONE;
 
    status = cl_spinlock_init( &p_mgr->state_lock );
    if( status != CL_SUCCESS )
@@ -2332,21 +2330,8 @@ Idle:
          {
          case OSM_SIGNAL_NO_PENDING_TRANSACTIONS:
          case OSM_SIGNAL_DONE:
-            /* If we run single step we have already done this */
-            if( p_mgr->state_step_mode != OSM_STATE_STEP_TAKE_ONE )
-            {
-               __osm_state_mgr_set_sm_lid_done_msg( p_mgr );
-               __osm_state_mgr_notify_lid_change( p_mgr );
-            }
-
-            /* Break on single step mode - if not continuous */
-            if( p_mgr->state_step_mode == OSM_STATE_STEP_BREAK )
-            {
-               p_mgr->next_stage_signal = signal;
-               signal = OSM_SIGNAL_NONE;
-               break;
-            }
-
+            __osm_state_mgr_set_sm_lid_done_msg( p_mgr );
+            __osm_state_mgr_notify_lid_change( p_mgr );
             p_mgr->state = OSM_SM_STATE_SET_SUBNET_UCAST_LIDS;
             signal = osm_lid_mgr_process_subnet( p_mgr->p_lid_mgr );
             break;
@@ -2422,17 +2407,7 @@ Idle:
              * their destination. */
             __osm_state_mgr_check_tbl_consistency( p_mgr );
 
-            /* If we run single step we have already done this */
-            if( p_mgr->state_step_mode != OSM_STATE_STEP_TAKE_ONE )
-               __osm_state_mgr_lid_assign_msg( p_mgr );
-
-            /* Break on single step mode - just before taking next step */
-            if( p_mgr->state_step_mode == OSM_STATE_STEP_BREAK )
-            {
-               p_mgr->next_stage_signal = signal;
-               signal = OSM_SIGNAL_NONE;
-               break;
-            }
+            __osm_state_mgr_lid_assign_msg( p_mgr );
 
             /*
              * OK, the wire is clear, so proceed with
@@ -2444,12 +2419,6 @@ Idle:
             p_mgr->state = OSM_SM_STATE_SET_UCAST_TABLES;
             signal = osm_ucast_mgr_process( p_mgr->p_ucast_mgr );
 
-            /* Break on single step mode */
-            if( p_mgr->state_step_mode != OSM_STATE_STEP_CONTINUOUS )
-            {
-               p_mgr->next_stage_signal = signal;
-               signal = OSM_SIGNAL_NONE;
-            }
             break;
 
          default:
@@ -2507,17 +2476,7 @@ Idle:
              * take into account these lfts. */
             p_mgr->p_subn->ignore_existing_lfts = FALSE;
 
-            /* If we run single step we have already done this */
-            if( p_mgr->state_step_mode != OSM_STATE_STEP_TAKE_ONE )
-               __osm_state_mgr_switch_config_msg( p_mgr );
-
-            /* Break on single step mode - just before taking next step */
-            if( p_mgr->state_step_mode == OSM_STATE_STEP_BREAK )
-            {
-               p_mgr->next_stage_signal = signal;
-               signal = OSM_SIGNAL_NONE;
-               break;
-            }
+            __osm_state_mgr_switch_config_msg( p_mgr );
 
             if( !p_mgr->p_subn->opt.disable_multicast )
             {
@@ -2582,17 +2541,7 @@ Idle:
          {
          case OSM_SIGNAL_NO_PENDING_TRANSACTIONS:
          case OSM_SIGNAL_DONE:
-            /* If we run single step we have already done this */
-            if( p_mgr->state_step_mode != OSM_STATE_STEP_TAKE_ONE )
-               __osm_state_mgr_multicast_config_msg( p_mgr );
-
-            /* Break on single step mode - just before taking next step */
-            if( p_mgr->state_step_mode == OSM_STATE_STEP_BREAK )
-            {
-               p_mgr->next_stage_signal = signal;
-               signal = OSM_SIGNAL_NONE;
-               break;
-            }
+            __osm_state_mgr_multicast_config_msg( p_mgr );
 
             p_mgr->state = OSM_SM_STATE_SET_LINK_PORTS;
             signal = osm_link_mgr_process( p_mgr->p_link_mgr,
@@ -2714,17 +2663,7 @@ Idle:
          case OSM_SIGNAL_NO_PENDING_TRANSACTIONS:
          case OSM_SIGNAL_DONE:
 
-            /* If we run single step we have already done this */
-            if( p_mgr->state_step_mode != OSM_STATE_STEP_TAKE_ONE )
-               __osm_state_mgr_links_armed_msg( p_mgr );
-
-            /* Break on single step mode - just before taking next step */
-            if( p_mgr->state_step_mode == OSM_STATE_STEP_BREAK )
-            {
-               p_mgr->next_stage_signal = signal;
-               signal = OSM_SIGNAL_NONE;
-               break;
-            }
+            __osm_state_mgr_links_armed_msg( p_mgr );
 
             p_mgr->state = OSM_SM_STATE_SET_ACTIVE;
             signal = osm_link_mgr_process( p_mgr->p_link_mgr,
@@ -2925,14 +2864,6 @@ Idle:
          signal = OSM_SIGNAL_SWEEP;
       }
 
-      /*
-       * for single step mode - some stages need to break only
-       * after evaluating a single step.
-       * For those we track the fact we have already performed
-       * a single loop
-       */
-      if( p_mgr->state_step_mode == OSM_STATE_STEP_TAKE_ONE )
-         p_mgr->state_step_mode = OSM_STATE_STEP_BREAK;
    }
 
    cl_spinlock_release( &p_mgr->state_lock );
-- 
1.5.2.1.137.g426c


From sashak at voltaire.com  Sun Jun 10 15:33:01 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Mon, 11 Jun 2007 01:33:01 +0300
Subject: [ofa-general] [PATCH] opensm: clean unused
	OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED
Message-ID: <20070610223301.GC23029@sashak.voltaire.com>


This removes unused OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED sm signal
enum value.

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---
 opensm/include/opensm/osm_base.h |    1 -
 opensm/opensm/osm_helper.c       |    7 +++----
 opensm/opensm/osm_sm_state_mgr.c |    8 --------
 3 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h
index 6bdea24..9a50d7d 100644
--- a/opensm/include/opensm/osm_base.h
+++ b/opensm/include/opensm/osm_base.h
@@ -788,7 +788,6 @@ typedef enum _osm_sm_signal
   OSM_SM_SIGNAL_HANDOVER_SENT,
   OSM_SM_SIGNAL_ACKNOWLEDGE,
   OSM_SM_SIGNAL_STANDBY,
-  OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED,
   OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED_DONE,
   OSM_SM_SIGNAL_WAIT_FOR_HANDOVER,
   OSM_SM_SIGNAL_MAX
diff --git a/opensm/opensm/osm_helper.c b/opensm/opensm/osm_helper.c
index 3745b55..724ecdf 100644
--- a/opensm/opensm/osm_helper.c
+++ b/opensm/opensm/osm_helper.c
@@ -2501,10 +2501,9 @@ const char* const __osm_sm_mgr_signal_str[] =
   "OSM_SM_SIGNAL_HANDOVER_SENT",         /* 7 */
   "OSM_SM_SIGNAL_ACKNOWLEDGE",           /* 8 */
   "OSM_SM_SIGNAL_STANDBY",               /* 9 */
-  "OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED",        /* 10 */
-  "OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED_DONE",    /* 11 */
-  "OSM_SM_SIGNAL_WAIT_FOR_HANDOVER",     /* 12 */
-  "UNKNOWN STATE!!"                      /* 13 */
+  "OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED_DONE",    /* 10 */
+  "OSM_SM_SIGNAL_WAIT_FOR_HANDOVER",     /* 11 */
+  "UNKNOWN STATE!!"                      /* 12 */
 
 };
 
diff --git a/opensm/opensm/osm_sm_state_mgr.c b/opensm/opensm/osm_sm_state_mgr.c
index 07c2af3..ccfb8b0 100644
--- a/opensm/opensm/osm_sm_state_mgr.c
+++ b/opensm/opensm/osm_sm_state_mgr.c
@@ -575,13 +575,6 @@ osm_sm_state_mgr_process(
           */
          p_sm_mgr->p_subn->master_sm_base_lid = p_sm_mgr->p_subn->sm_base_lid;
          break;
-      case OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED:
-         /*
-          * Stop the discovering
-          */
-         osm_state_mgr_process( p_sm_mgr->p_state_mgr,
-                                OSM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED );
-         break;
       case OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED_DONE:
          /*
           * Finished all discovery actions - move to STANDBY
@@ -813,7 +806,6 @@ osm_sm_state_mgr_check_legality(
       switch ( signal )
       {
       case OSM_SM_SIGNAL_DISCOVERY_COMPLETED:
-      case OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED:
       case OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED_DONE:
       case OSM_SM_SIGNAL_HANDOVER:
          status = IB_SUCCESS;
-- 
1.5.2.1.137.g426c


From jwong at datallegro.com  Sun Jun 10 18:03:23 2007
From: jwong at datallegro.com (Jeffrey Wong)
Date: Sun, 10 Jun 2007 21:03:23 -0400
Subject: [ofa-general] Having trouble pingpong between two nodes.
References: <A382D4292574EB47A85B8159A6AED1A1015EF2BC@FPNYEXCBE02.opus-i.corp>
	<466BABDD.20808@dev.mellanox.co.il>
Message-ID: <A382D4292574EB47A85B8159A6AED1A18305C0@FPNYEXCBE02.opus-i.corp>

Well, it now seems to be working without my having changed anything.  Thanks for the info, though.

Jeff


-----Original Message-----
From: Dotan Barak [mailto:dotanb at dev.mellanox.co.il]
Sent: Sun 6/10/2007 3:44 AM
To: Jeffrey Wong
Cc: general at lists.openfabrics.org
Subject: Re: [ofa-general] Having trouble pingpong between two nodes.
 
Jeffrey Wong wrote:
>
> Hello,
>
> I am trying to run ibv_ud_pingpong between two nodes but I can't 
> seem to get them to communicate. I have used the ping command between 
> the ib interfaces and that works fine, but when I try to use 
> ibv_ud_pingpong it says the following:
>
> ________________________________________________________________________________
>
> [root at centos5:node1 ~]# ibv_ud_pingpong 193.168.10.254
>
> local address: LID 0x0002, QPN 0x0f0406, PSN 0xb067dc
>
> Couldn't connect to 193.168.10.254:18515
>
> ____________________________________________________________________________
>
This may be trivial, but did you first start ibv_ud_pingpong as the server on 
193.168.10.254?
(Since you don't pass any test parameters to the client, the server should be 
started with just: ibv_ud_pingpong).


Dotan


From mst at dev.mellanox.co.il  Sun Jun 10 22:56:10 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 11 Jun 2007 08:56:10 +0300
Subject: [ofa-general] [PATCH trivial] IB/mlx4: include linux/mutex.h from
	mlx4_ib.h
Message-ID: <20070611055610.GC1454@mellanox.co.il>

mlx4_ib.h uses struct mutex, so while
linux/mutex.h seems to be pulled in indirectly
by one of the headers it includes, the right thing
to do is to include linux/mutex.h directly.

Signed-off-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>

---

Roland, can you queue this for upstream pls?

Index: ofed_kernel/drivers/infiniband/hw/mlx4/mlx4_ib.h
===================================================================
--- ofed_kernel.orig/drivers/infiniband/hw/mlx4/mlx4_ib.h	2007-05-29 14:50:06.000000000 +0300
+++ ofed_kernel/drivers/infiniband/hw/mlx4/mlx4_ib.h	2007-06-11 08:49:13.000000000 +0300
@@ -35,6 +35,7 @@
 
 #include <linux/compiler.h>
 #include <linux/list.h>
+#include <linux/mutex.h>
 
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_umem.h>
-- 
MST


From mst at dev.mellanox.co.il  Sun Jun 10 23:09:42 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 11 Jun 2007 09:09:42 +0300
Subject: [ofa-general] [PATCH] net/mlx4: include linux/mutex.h from mlx4.h
Message-ID: <20070611060942.GE1454@mellanox.co.il>

mlx4.h uses struct mutex, so while
linux/mutex.h seems to be pulled in indirectly
by one of the headers it includes, the right thing
to do is to include linux/mutex.h directly.

Signed-off-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>

---

Roland, can you queue this for upstream pls?

Index: ofed_kernel/drivers/net/mlx4/mlx4.h
===================================================================
--- ofed_kernel.orig/drivers/net/mlx4/mlx4.h	2007-05-29 14:50:26.000000000 +0300
+++ ofed_kernel/drivers/net/mlx4/mlx4.h	2007-06-11 09:07:36.000000000 +0300
@@ -37,6 +37,7 @@
 #ifndef MLX4_H
 #define MLX4_H
 
+#include <linux/mutex.h>
 #include <linux/radix-tree.h>
 
 #include <linux/mlx4/device.h>

-- 
MST


From eli at mellanox.co.il  Mon Jun 11 02:26:59 2007
From: eli at mellanox.co.il (Eli Cohen)
Date: Mon, 11 Jun 2007 12:26:59 +0300
Subject: [ofa-general] [PATCH} libmlx4: poll cq tail pointer
Message-ID: <1181554019.12020.3.camel@mtls03>

A cast to uint16_t is required before the difference is added to wq->tail.
Consider the following example:
wqe_index = 0, wq->tail = 0x1ffff. You'd expect wq->tail to become 0x20000
but it actually becomes 0x10000. The reason is that the compiler promotes
the operands of wqe_index - (uint16_t) wq->tail to a wider integer type, so
the difference is not truncated to 16 bits; when it is added to the original
value of tail, the sum wraps around and tail becomes 0x10000.
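
The arithmetic in the example above can be reproduced in isolation. The
following is a minimal sketch, assuming a 32-bit int, a 32-bit tail counter
and a uint16_t wqe_index; the function names are hypothetical and are not
part of libmlx4.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Buggy form: both operands of the subtraction are promoted to int, so
 * a negative difference is converted to a huge unsigned value when it
 * is added to the 32-bit tail.
 */
uint32_t advance_tail_buggy(uint32_t tail, uint16_t wqe_index)
{
	tail += wqe_index - (uint16_t) tail;
	return tail;
}

/* Fixed form: truncate the difference to 16 bits before adding it. */
uint32_t advance_tail_fixed(uint32_t tail, uint16_t wqe_index)
{
	tail += (uint16_t) (wqe_index - (uint16_t) tail);
	return tail;
}

/* Reproduces the values quoted in the patch description. */
void check_example(void)
{
	assert(advance_tail_buggy(0x1ffff, 0) == 0x10000);
	assert(advance_tail_fixed(0x1ffff, 0) == 0x20000);
}
```

With the cast in place the difference is taken modulo 0x10000, so tail only
ever moves forward by less than one full 16-bit wrap.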

Signed-off-by: Eli Cohen <eli at mellanox.co.il>

---

diff --git a/src/cq.c b/src/cq.c
index c4a3ca4..7597a5a 100644
--- a/src/cq.c
+++ b/src/cq.c
@@ -238,7 +238,7 @@ static int mlx4_poll_one(struct mlx4_cq *cq,
 	if (is_send) {
 		wq = &(*cur_qp)->sq;
 		wqe_index = ntohs(cqe->wqe_index);
-		wq->tail += wqe_index - (uint16_t) wq->tail;
+		wq->tail += (uint16_t)(wqe_index - (uint16_t) wq->tail);
 		wc->wr_id = wq->wrid[wq->tail & (wq->max - 1)];
 		++wq->tail;
 	} else if ((*cur_qp)->ibv_qp.srq) {


From kliteyn at dev.mellanox.co.il  Mon Jun 11 02:33:08 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Mon, 11 Jun 2007 12:33:08 +0300
Subject: [ofa-general] [PATCH] osm: reading guids file in ucast_mgr
Message-ID: <466D16D4.8000605@dev.mellanox.co.il>

Hi Hal,

This patch removes the code that read the root guids file in 
osm_ucast_updn.c and replaces it with a more general function 
in osm_ucast_mgr.c.

This function will also be used by fat-tree routing.
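
One detail of per-line parsing is worth spelling out: fgets() keeps the
trailing newline, so after strtoull() the end pointer normally points at
'\n' rather than '\0'. The following is a minimal sketch of a stricter
per-line check, assuming one hexadecimal GUID per line. The helper
parse_guid_line() and its result codes are hypothetical and are not part
of the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Result codes for the hypothetical parse_guid_line() helper. */
enum guid_line_result { GUID_LINE_OK, GUID_LINE_EMPTY, GUID_LINE_BAD };

/*
 * Parse one line, as returned by fgets(), into a 64-bit GUID.
 * The line must hold a single hexadecimal number (an optional 0x prefix
 * is accepted), optionally followed by a newline.
 */
enum guid_line_result parse_guid_line(const char *line, uint64_t *p_guid)
{
	char *endptr;
	unsigned long long value;
	size_t len;

	assert(line != NULL && p_guid != NULL);

	/* Length of the text before any line terminator. */
	len = strcspn(line, "\r\n");
	if (len == 0)
		return GUID_LINE_EMPTY;

	/* Reject leading whitespace and signs, which strtoull() accepts. */
	if (!isxdigit((unsigned char) line[0]))
		return GUID_LINE_BAD;

	errno = 0;
	value = strtoull(line, &endptr, 16);

	/* Overflow, or characters left over before the end of the line. */
	if (errno == ERANGE || endptr != line + len)
		return GUID_LINE_BAD;

	*p_guid = (uint64_t) value;
	return GUID_LINE_OK;
}
```

A caller would invoke this once per fgets() result, skip GUID_LINE_EMPTY,
and stop with an error on GUID_LINE_BAD.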

-- Yevgeny

Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

---
 opensm/include/opensm/osm_base.h      |    8 ++--
 opensm/include/opensm/osm_ucast_mgr.h |   36 ++++++++++++++++
 opensm/opensm/osm_ucast_mgr.c         |   74 +++++++++++++++++++++++++++++++++
 opensm/opensm/osm_ucast_updn.c        |   48 +++------------------
 4 files changed, 120 insertions(+), 46 deletions(-)

diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h
index ee280d3..7f043a0 100644
--- a/opensm/include/opensm/osm_base.h
+++ b/opensm/include/opensm/osm_base.h
@@ -844,16 +844,16 @@ typedef enum _osm_mcast_req_type
 }	osm_mcast_req_type_t;
 /***********/
 
-/****s* OpenSM: Base/MAX_UPDN_GUID_FILE_LINE_LENGTH
+/****s* OpenSM: Base/MAX_GUID_FILE_LINE_LENGTH
 * NAME
-*	MAX_UPDN_GUID_FILE_LINE_LENGTH
+*	MAX_GUID_FILE_LINE_LENGTH
 *
 * DESCRIPTION
-*	The maximum line number when reading updn guid file
+*	The maximum line length when reading guid file
 *
 * SYNOPSIS
 */
-#define MAX_UPDN_GUID_FILE_LINE_LENGTH 120
+#define MAX_GUID_FILE_LINE_LENGTH 120
 /**********/
 
 /****s* OpenSM: Base/VendorOUIs
diff --git a/opensm/include/opensm/osm_ucast_mgr.h b/opensm/include/opensm/osm_ucast_mgr.h
index 39bf45a..e003f31 100644
--- a/opensm/include/opensm/osm_ucast_mgr.h
+++ b/opensm/include/opensm/osm_ucast_mgr.h
@@ -293,6 +293,42 @@ osm_ucast_mgr_build_lid_matrices(
 *	Unicast Manager
 *********/
 
+/****f* OpenSM: Unicast Manager/osm_ucast_mgr_read_guid_file
+* NAME
+*	osm_ucast_mgr_read_guid_file
+*
+* DESCRIPTION
+*	Read guid list from file.
+*
+* SYNOPSIS
+*/
+cl_status_t
+osm_ucast_mgr_read_guid_file(
+	IN  osm_ucast_mgr_t * const p_mgr,
+	IN  const char      * guid_file_name,
+	IN  cl_list_t       * p_list );
+/*
+* PARAMETERS
+*	p_mgr
+*		[in] Pointer to an osm_ucast_mgr_t object.
+*
+*	guid_file_name
+*		[in] Name of the file to read.
+*
+*	p_list
+*		[in] Pointer to the list that will be filled with guids.
+*
+* RETURN VALUES
+*	IB_SUCCESS if the file was read successfully.
+*
+* NOTES
+*	This function reads guids from a file and inserts them
+*	into a list.
+*
+* SEE ALSO
+*	Unicast Manager
+*********/
+
 /****f* OpenSM: Unicast Manager/osm_ucast_mgr_process
 * NAME
 *	osm_ucast_mgr_process
diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index 9f40242..5182718 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -1044,6 +1044,80 @@ ucast_mgr_setup_all_switches(osm_subn_t *p_subn)
 
 /**********************************************************************
  **********************************************************************/
+
+cl_status_t
+osm_ucast_mgr_read_guid_file(
+  IN  osm_ucast_mgr_t * const p_mgr,
+  IN  const char      * guid_file_name,
+  IN  cl_list_t       * p_list )
+{
+  cl_status_t   status = IB_SUCCESS;
+  FILE        * guid_file;
+  char          line[MAX_GUID_FILE_LINE_LENGTH];
+  char        * endptr;
+  uint64_t    * p_guid;
+
+  OSM_LOG_ENTER(p_mgr->p_log, osm_ucast_mgr_read_guid_file);
+
+  guid_file = fopen(guid_file_name, "r");
+  if (guid_file == NULL)
+  {
+    osm_log( p_mgr->p_log, OSM_LOG_ERROR,
+             "osm_ucast_mgr_read_guid_file: ERR 3A13: "
+             "Failed to open guid list file (%s)\n",
+             guid_file_name );
+    status = IB_NOT_FOUND;
+    goto Exit;
+  }
+
+  while ( fgets(line, MAX_GUID_FILE_LINE_LENGTH, guid_file) )
+  {
+    if (strcspn(line, " ,;.") != strlen(line))
+    {
+      osm_log( p_mgr->p_log, OSM_LOG_ERROR,
+               "osm_ucast_mgr_read_guid_file: ERR 3A14: "
+               "Bad formatted guid in file (%s): %s\n",
+               guid_file_name, line );
+      status = IB_NOT_FOUND;
+      break;
+    }
+
+    /* Skip empty lines anywhere in the file - only one 
+       char means the null termination */
+    if (strlen(line) <= 1)
+      continue;
+
+    p_guid = malloc(sizeof(uint64_t));
+    if (!p_guid)
+    {
+      status = IB_ERROR;
+      goto Exit;
+    }
+
+    *p_guid = strtoull(line, &endptr, 16);
+
+    /* check that the string is a number */
+    if (!(*p_guid) && (*endptr != '\0'))
+    {
+      osm_log( p_mgr->p_log, OSM_LOG_ERROR,
+               "osm_ucast_mgr_read_guid_file: ERR 3A15: "
+               "Bad formatted guid in file (%s): %s\n",
+               guid_file_name, line );
+      status = IB_NOT_FOUND;
+      break;
+    }
+
+    /* store the parsed guid */
+    cl_list_insert_tail(p_list, p_guid);
+  }
+
+Exit :
+  OSM_LOG_EXIT( p_mgr->p_log );
+  return (status);
+}
+
+/**********************************************************************
+ **********************************************************************/
 osm_signal_t
 osm_ucast_mgr_process(
   IN osm_ucast_mgr_t* const p_mgr )
diff --git a/opensm/opensm/osm_ucast_updn.c b/opensm/opensm/osm_ucast_updn.c
index 95a0622..23a9db5 100644
--- a/opensm/opensm/osm_ucast_updn.c
+++ b/opensm/opensm/osm_ucast_updn.c
@@ -53,6 +53,7 @@
 #include <complib/cl_qmap.h>
 #include <opensm/osm_switch.h>
 #include <opensm/osm_opensm.h>
+#include <opensm/osm_ucast_mgr.h>
 
 /* //////////////////////////// */
 /*  Local types                 */
@@ -303,9 +304,6 @@ updn_init(
   IN osm_opensm_t *p_osm )
 {
   cl_list_t * p_list;
-  FILE*       p_updn_guid_file;
-  char        line[MAX_UPDN_GUID_FILE_LINE_LENGTH];
-  uint64_t *  p_tmp;
   cl_list_iterator_t guid_iterator;
   ib_api_status_t status = IB_SUCCESS;
 
@@ -332,45 +330,11 @@ updn_init(
   */
   if (p_osm->subn.opt.updn_guid_file)
   {
-    /* Now parse guid from file */
-    p_updn_guid_file = fopen(p_osm->subn.opt.updn_guid_file, "r");
-    if (p_updn_guid_file == NULL)
-    {
-      osm_log( &p_osm->log, OSM_LOG_ERROR,
-               "updn_init: ERR AA02: "
-               "Failed to open guid list file (%s)\n",
-               p_osm->subn.opt.updn_guid_file );
-      status = IB_NOT_FOUND;
-      goto Exit;
-    }
-
-    while ( fgets(line, MAX_UPDN_GUID_FILE_LINE_LENGTH, p_updn_guid_file) )
-    {
-      if (strcspn(line, " ,;.") == strlen(line))
-      {
-        /* Skip empty lines anywhere in the file - only one char means the Null termination */
-        if (strlen(line) > 1)
-        {
-          p_tmp = malloc(sizeof(uint64_t));
-          if (!p_tmp)
-          {
-            status = IB_ERROR;
-            goto Exit;
-          }
-          *p_tmp = strtoull(line, NULL, 16);
-          cl_list_insert_tail(p_updn->p_root_nodes, p_tmp);
-        }
-      }
-      else
-      {
-        osm_log( &p_osm->log, OSM_LOG_ERROR,
-                 "updn_init: ERR AA03: "
-                 "Bad formatted guid in file (%s): %s\n",
-                 p_osm->subn.opt.updn_guid_file, line );
-        status = IB_NOT_FOUND;
-        break;
-      }
-    }
+    status = osm_ucast_mgr_read_guid_file( &p_osm->sm.ucast_mgr,
+                                           p_osm->subn.opt.updn_guid_file,
+                                           p_updn->p_root_nodes );
+    if (status != IB_SUCCESS)
+       goto Exit;
 
     /* For Debug Purposes ... */
     osm_log( &p_osm->log, OSM_LOG_DEBUG,
-- 
1.5.1.4


From vlad at lists.openfabrics.org  Mon Jun 11 02:43:49 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Mon, 11 Jun 2007 02:43:49 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070611-0200 daily build status
Message-ID: <20070611094349.70A2FE6083E@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:   --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.15
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.12
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.16
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.15
Passed on x86_64 with linux-2.6.19
Passed on ppc64 with linux-2.6.19
Passed on ia64 with linux-2.6.16
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.16
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.12
Passed on powerpc with linux-2.6.17
Passed on ppc64 with linux-2.6.17
Passed on powerpc with linux-2.6.12
Passed on ppc64 with linux-2.6.15
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.13
Passed on ia64 with linux-2.6.17
Passed on powerpc with linux-2.6.14
Passed on powerpc with linux-2.6.13
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.12
Passed on powerpc with linux-2.6.16
Passed on x86_64 with linux-2.6.14
Passed on ppc64 with linux-2.6.14
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on ppc64 with linux-2.6.13
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:


From mst at dev.mellanox.co.il  Mon Jun 11 02:51:45 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 11 Jun 2007 12:51:45 +0300
Subject: [ofa-general] Re: [PATCH} libmlx4: poll cq tail pointer
In-Reply-To: <1181554019.12020.3.camel@mtls03>
References: <1181554019.12020.3.camel@mtls03>
Message-ID: <20070611095145.GB13815@mellanox.co.il>

> Quoting Eli Cohen <eli at mellanox.co.il>:
> Subject: [PATCH} libmlx4: poll cq tail pointer
> 
> cast to uint16_t is required before assigning.
> Consider the following example:
> wqe_index = 0, wq->tail = 0x1ffff. You'd expect wq->tail to be 0x20000
> but it will actually be 0x10000. The reason for this is that compiler
> upcasts the result of wqe_index - (uint16_t) wq->tail to unsigned which
> yields a large number and when added to the original value of tail it
> overflows and actually becomes 0x10000.
> 
> Signed-off-by: Eli Cohen <eli at mellanox.co.il>

And a similar patch would be needed for the kernel, would it not?
mthca does not seem to be affected: it does all math on 32-bit integers.

-- 
MST


From halr at voltaire.com  Mon Jun 11 03:28:15 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 11 Jun 2007 06:28:15 -0400
Subject: [ofa-general] OpenSM Up-Down algorithm
In-Reply-To: <39C75744D164D948A170E9792AF8E7CA0D2914@exil.voltaire.com>
References: <39C75744D164D948A170E9792AF8E7CA0D2914@exil.voltaire.com>
Message-ID: <1181557691.8896.64610.camel@hal.voltaire.com>

Hi Sagi,

On Sun, 2007-06-10 at 11:59, Sagi Schlanger wrote:
> Hi,
>  
> I'm looking for some answers on Up-Down routing in OpenSM.
>  
> Is anybody familiar with a utility/procedure to find credit loops
> given a topology and routing settings?

I know there was at least talk of ibdiagnet (in ibutils) checking this.
Not sure if it is implemented (yet) or if it is routing algorithm
independent. Eitan ?

> Is there a handy spec describing the OpenSM Up-Down algorithm?

The OpenSM up/down routing is based on the following paper:

"Effective Strategy to Compute Forwarding Tables for InfiniBand Networks"
Jose Carlos Sancho, Universidad Politécnica de Valencia
Antonio Robles, Universidad Politécnica de Valencia
Jose Duato, Universidad Politécnica de Valencia

http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/proceedings/&toc=comp/proceedings/icpp/2001/1257/00/1257toc.xml&DOI=10.1109/ICPP.2001.952046

> What is the scheme through which roots are defined on clos and non
> clos/fat tree topologies?

The admin can supply the roots via the -a <guid_list_file> option when
invoking OpenSM.

       Auto-detect root nodes - based on the CA hop length from any switch
       in the subnet, a statistical histogram is built for each switch (hop
       num vs number of occurrences). If the histogram reflects a specific
       column (higher than others) for a certain node, then it is marked as a
       root node. Since the algorithm is statistical, it may not find any root
       nodes. The list of the root nodes found by this auto-detect stage is
       used by the ranking process stage.

           Note 1: The user can override the node list manually.
           Note 2: If this stage cannot find any root nodes, and the user did
                   not specify a guid list file, OpenSM defaults back to the
                   Min Hop routing algorithm.
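
To make the histogram test above concrete, here is a minimal sketch of how
"a specific column (higher than others)" could be decided for one switch.
The two thresholds (90% and 5% of the CA count) and the function name are
assumptions chosen for illustration; they are not taken from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed thresholds, as fractions of the CA count; illustrative only. */
#define PEAK_FRACTION  0.90
#define NOISE_FRACTION 0.05

/*
 * hop_hist[h] holds the number of CAs whose hop count from this switch
 * is h. Returns 1 if exactly one column dominates and no other column
 * is significant, 0 otherwise.
 */
int looks_like_root(const unsigned *hop_hist, size_t num_bins,
		    unsigned num_cas)
{
	double peak_thd = num_cas * PEAK_FRACTION;
	double noise_thd = num_cas * NOISE_FRACTION;
	size_t bars_over_peak = 0, bars_over_noise = 0, h;

	assert(hop_hist != NULL);

	for (h = 0; h < num_bins; h++) {
		if (hop_hist[h] > peak_thd)
			bars_over_peak++;
		if (hop_hist[h] > noise_thd)
			bars_over_noise++;
	}

	/* One column stands out, and it is the only non-negligible one. */
	return bars_over_peak == 1 && bars_over_noise == 1;
}
```

A switch that reaches nearly all CAs in the same number of hops passes the
test; a switch whose CAs are spread over several hop counts does not, which
is why such a statistical check may find no roots at all.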

> Is this algorithm always credit loop free?

It's supposed to be.

> How efficient is using this algorithm on non clos/fat tree topologies?

What do you mean by efficiency? Also, are you asking about pure fat
tree or non pure fat tree (or both)?

-- Hal

> Thanks for your cooperation,
> Sagi


From eli at mellanox.co.il  Mon Jun 11 03:52:25 2007
From: eli at mellanox.co.il (Eli Cohen)
Date: Mon, 11 Jun 2007 13:52:25 +0300
Subject: [ofa-general] Re: [PATCH} libmlx4: poll cq tail pointer
In-Reply-To: <20070611095145.GB13815@mellanox.co.il>
References: <1181554019.12020.3.camel@mtls03>
	<20070611095145.GB13815@mellanox.co.il>
Message-ID: <1181559145.16174.0.camel@mtls03>

On Mon, 2007-06-11 at 12:51 +0300, Michael S. Tsirkin wrote:
> And a similar patch would be needed for the kernel, would it not?

Yes, it looks like it.


From kliteyn at dev.mellanox.co.il  Mon Jun 11 04:02:23 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Mon, 11 Jun 2007 14:02:23 +0300
Subject: [ofa-general] [PATCHv2] osm: reading guids file in ucast_mgr
Message-ID: <466D2BBF.60406@dev.mellanox.co.il>

Hi Hal,

   | [V2] Nothing was changed in the patch, but the previous
   |      mail had some garbage in the explanation text.

This patch removes the code that read the root guids file in 
osm_ucast_updn.c and replaces it with a more general function 
in osm_ucast_mgr.c.

This function will also be used by fat-tree routing.

-- Yevgeny

Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
---
opensm/include/opensm/osm_base.h      |    8 ++--
opensm/include/opensm/osm_ucast_mgr.h |   36 ++++++++++++++++
opensm/opensm/osm_ucast_mgr.c         |   74 +++++++++++++++++++++++++++++++++
opensm/opensm/osm_ucast_updn.c        |   48 +++------------------
4 files changed, 120 insertions(+), 46 deletions(-)

diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h
index ee280d3..7f043a0 100644
--- a/opensm/include/opensm/osm_base.h
+++ b/opensm/include/opensm/osm_base.h
@@ -844,16 +844,16 @@ typedef enum _osm_mcast_req_type
}	osm_mcast_req_type_t;
/***********/

-/****s* OpenSM: Base/MAX_UPDN_GUID_FILE_LINE_LENGTH
+/****s* OpenSM: Base/MAX_GUID_FILE_LINE_LENGTH
* NAME
-*	MAX_UPDN_GUID_FILE_LINE_LENGTH
+*	MAX_GUID_FILE_LINE_LENGTH
*
* DESCRIPTION
-*	The maximum line number when reading updn guid file
+*	The maximum line length when reading guid file
*
* SYNOPSIS
*/
-#define MAX_UPDN_GUID_FILE_LINE_LENGTH 120
+#define MAX_GUID_FILE_LINE_LENGTH 120
/**********/

/****s* OpenSM: Base/VendorOUIs
diff --git a/opensm/include/opensm/osm_ucast_mgr.h b/opensm/include/opensm/osm_ucast_mgr.h
index 39bf45a..e003f31 100644
--- a/opensm/include/opensm/osm_ucast_mgr.h
+++ b/opensm/include/opensm/osm_ucast_mgr.h
@@ -293,6 +293,42 @@ osm_ucast_mgr_build_lid_matrices(
*	Unicast Manager
*********/

+/****f* OpenSM: Unicast Manager/osm_ucast_mgr_read_guid_file
+* NAME
+*	osm_ucast_mgr_read_guid_file
+*
+* DESCRIPTION
+*	Read guid list from file.
+*
+* SYNOPSIS
+*/
+cl_status_t
+osm_ucast_mgr_read_guid_file(
+	IN  osm_ucast_mgr_t * const p_mgr,
+	IN  const char      * guid_file_name,
+	IN  cl_list_t       * p_list );
+/*
+* PARAMETERS
+*	p_mgr
+*		[in] Pointer to an osm_ucast_mgr_t object.
+*
+*	guid_file_name
+*		[in] Name of the file to read.
+*
+*	p_list
+*		[in] Pointer to the list that will be filled with guids.
+*
+* RETURN VALUES
+*	IB_SUCCESS if the file was read successfully.
+*
+* NOTES
+*	This function reads guids from a file and inserts them
+*	into a list.
+*
+* SEE ALSO
+*	Unicast Manager
+*********/
+
/****f* OpenSM: Unicast Manager/osm_ucast_mgr_process
* NAME
*	osm_ucast_mgr_process
diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index 9f40242..5182718 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -1044,6 +1044,80 @@ ucast_mgr_setup_all_switches(osm_subn_t *p_subn)

/**********************************************************************
 **********************************************************************/
+
+cl_status_t
+osm_ucast_mgr_read_guid_file(
+  IN  osm_ucast_mgr_t * const p_mgr,
+  IN  const char      * guid_file_name,
+  IN  cl_list_t       * p_list )
+{
+  cl_status_t   status = IB_SUCCESS;
+  FILE        * guid_file;
+  char          line[MAX_GUID_FILE_LINE_LENGTH];
+  char        * endptr;
+  uint64_t    * p_guid;
+
+  OSM_LOG_ENTER(p_mgr->p_log, osm_ucast_mgr_read_guid_file);
+
+  guid_file = fopen(guid_file_name, "r");
+  if (guid_file == NULL)
+  {
+    osm_log( p_mgr->p_log, OSM_LOG_ERROR,
+             "osm_ucast_mgr_read_guid_file: ERR 3A13: "
+             "Failed to open guid list file (%s)\n",
+             guid_file_name );
+    status = IB_NOT_FOUND;
+    goto Exit;
+  }
+
+  while ( fgets(line, MAX_GUID_FILE_LINE_LENGTH, guid_file) )
+  {
+    if (strcspn(line, " ,;.") != strlen(line))
+    {
+      osm_log( p_mgr->p_log, OSM_LOG_ERROR,
+               "osm_ucast_mgr_read_guid_file: ERR 3A14: "
+               "Bad formatted guid in file (%s): %s\n",
+               guid_file_name, line );
+      status = IB_NOT_FOUND;
+      break;
+    }
+
+    /* Skip empty lines anywhere in the file - only one 
+       char means the null termination */
+    if (strlen(line) <= 1)
+      continue;
+
+    p_guid = malloc(sizeof(uint64_t));
+    if (!p_guid)
+    {
+      status = IB_ERROR;
+      goto Exit;
+    }
+
+    *p_guid = strtoull(line, &endptr, 16);
+
+    /* check that the string is a number */
+    if (!(*p_guid) && (*endptr != '\0'))
+    {
+      osm_log( p_mgr->p_log, OSM_LOG_ERROR,
+               "osm_ucast_mgr_read_guid_file: ERR 3A15: "
+               "Bad formatted guid in file (%s): %s\n",
+               guid_file_name, line );
+      status = IB_NOT_FOUND;
+      break;
+    }
+
+    /* store the parsed guid */
+    cl_list_insert_tail(p_list, p_guid);
+  }
+
+Exit :
+  OSM_LOG_EXIT( p_mgr->p_log );
+  return (status);
+}
+
+/**********************************************************************
+ **********************************************************************/
osm_signal_t
osm_ucast_mgr_process(
  IN osm_ucast_mgr_t* const p_mgr )
diff --git a/opensm/opensm/osm_ucast_updn.c b/opensm/opensm/osm_ucast_updn.c
index 95a0622..23a9db5 100644
--- a/opensm/opensm/osm_ucast_updn.c
+++ b/opensm/opensm/osm_ucast_updn.c
@@ -53,6 +53,7 @@
#include <complib/cl_qmap.h>
#include <opensm/osm_switch.h>
#include <opensm/osm_opensm.h>
+#include <opensm/osm_ucast_mgr.h>

/* //////////////////////////// */
/*  Local types                 */
@@ -303,9 +304,6 @@ updn_init(
  IN osm_opensm_t *p_osm )
{
  cl_list_t * p_list;
-  FILE*       p_updn_guid_file;
-  char        line[MAX_UPDN_GUID_FILE_LINE_LENGTH];
-  uint64_t *  p_tmp;
  cl_list_iterator_t guid_iterator;
  ib_api_status_t status = IB_SUCCESS;

@@ -332,45 +330,11 @@ updn_init(
  */
  if (p_osm->subn.opt.updn_guid_file)
  {
-    /* Now parse guid from file */
-    p_updn_guid_file = fopen(p_osm->subn.opt.updn_guid_file, "r");
-    if (p_updn_guid_file == NULL)
-    {
-      osm_log( &p_osm->log, OSM_LOG_ERROR,
-               "updn_init: ERR AA02: "
-               "Failed to open guid list file (%s)\n",
-               p_osm->subn.opt.updn_guid_file );
-      status = IB_NOT_FOUND;
-      goto Exit;
-    }
-
-    while ( fgets(line, MAX_UPDN_GUID_FILE_LINE_LENGTH, p_updn_guid_file) )
-    {
-      if (strcspn(line, " ,;.") == strlen(line))
-      {
-        /* Skip empty lines anywhere in the file - only one char means the Null termination */
-        if (strlen(line) > 1)
-        {
-          p_tmp = malloc(sizeof(uint64_t));
-          if (!p_tmp)
-          {
-            status = IB_ERROR;
-            goto Exit;
-          }
-          *p_tmp = strtoull(line, NULL, 16);
-          cl_list_insert_tail(p_updn->p_root_nodes, p_tmp);
-        }
-      }
-      else
-      {
-        osm_log( &p_osm->log, OSM_LOG_ERROR,
-                 "updn_init: ERR AA03: "
-                 "Bad formatted guid in file (%s): %s\n",
-                 p_osm->subn.opt.updn_guid_file, line );
-        status = IB_NOT_FOUND;
-        break;
-      }
-    }
+    status = osm_ucast_mgr_read_guid_file( &p_osm->sm.ucast_mgr,
+                                           p_osm->subn.opt.updn_guid_file,
+                                           p_updn->p_root_nodes );
+    if (status != IB_SUCCESS)
+       goto Exit;

    /* For Debug Purposes ... */
    osm_log( &p_osm->log, OSM_LOG_DEBUG,
-- 
1.5.1.4


From kliteyn at dev.mellanox.co.il  Mon Jun 11 04:04:47 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Mon, 11 Jun 2007 14:04:47 +0300
Subject: [ofa-general] [PATCH] osm: adding 4 options to ftree routing
Message-ID: <466D2C4F.4050108@dev.mellanox.co.il>

Hi Hal,

Adding four options for fat-tree routing:

	ftree_root_guid_file
		Name of the file that contains list of root guids that
		will be used by fat-tree routing (provided by User)
	ftree_cn_guid_file
		Name of the file that contains list of compute node guids that
		will be used by fat-tree routing (provided by User)
	ftree_include_guid_file
		Name of the file that contains list of node guids that
		will be included when performing fat-tree routing (provided by User)
	ftree_exclude_guid_file
		Name of the file that contains list of node guids that
		will be excluded when performing fat-tree routing (provided by User)

For now, these options are exposed through the options file only.
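
When the options file is parsed, each option has to be matched against its
own key name. The following is a minimal sketch of that matching. The helper
unpack_charp() is a hypothetical stand-in written for illustration and is
not OpenSM's __osm_subn_opts_unpack_charp.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * If the key read from the options file matches the requested key,
 * replace *p_dest with a heap copy of the value.
 * Returns 1 on a match, 0 otherwise.
 */
int unpack_charp(const char *req_key, const char *key, const char *val,
		 char **p_dest)
{
	char *copy;

	assert(req_key != NULL && key != NULL && p_dest != NULL);

	if (val == NULL || strcmp(req_key, key) != 0)
		return 0;

	copy = malloc(strlen(val) + 1);
	if (copy == NULL)
		return 0;
	strcpy(copy, val);

	/* Drop any previous value before storing the new one. */
	free(*p_dest);
	*p_dest = copy;
	return 1;
}
```

Calling the helper once per option, each time with that option's own name as
the requested key, ensures a line for ftree_root_guid_file can never
overwrite the value stored for ftree_cn_guid_file.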

-- Yevgeny

Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
---
 opensm/include/opensm/osm_subnet.h |   20 +++++++++++++++
 opensm/opensm/osm_subnet.c         |   46 ++++++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+), 0 deletions(-)

diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index c62128b..39eed2b 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -279,6 +279,10 @@ typedef struct _osm_subn_opt
   char *                   lid_matrix_dump_file;
   char *                   ucast_dump_file;
   char *                   updn_guid_file;
+  char *                   ftree_root_guid_file;
+  char *                   ftree_cn_guid_file;
+  char *                   ftree_include_guid_file;
+  char *                   ftree_exclude_guid_file;
   char *                   sa_db_file;
   boolean_t                exit_on_fatal;
   boolean_t                honor_guid2lid_file;
@@ -455,6 +459,22 @@ typedef struct _osm_subn_opt
 *	updn_guid_file
 *		Pointer to name of the UPDN guid file given by User
 *
+*	ftree_root_guid_file
+*		Name of the file that contains list of root guids that
+*		will be used by fat-tree routing (provided by User)
+*
+*	ftree_cn_guid_file
+*		Name of the file that contains list of compute node guids that
+*		will be used by fat-tree routing (provided by User)
+*
+*	ftree_include_guid_file
+*		Name of the file that contains list of node guids that
+*		will be included when performing fat-tree routing (provided by User)
+*
+*	ftree_exclude_guid_file
+*		Name of the file that contains list of node guids that
+*		will be excluded when performing fat-tree routing (provided by User)
+*
 *	sa_db_file
 *		Name of the SA database file.
 *
diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index 736f49a..7219876 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -501,6 +501,10 @@ osm_subn_set_default_opt(
   p_opt->lid_matrix_dump_file = NULL;
   p_opt->ucast_dump_file = NULL;
   p_opt->updn_guid_file = NULL;
+  p_opt->ftree_root_guid_file = NULL;
+  p_opt->ftree_cn_guid_file = NULL;
+  p_opt->ftree_include_guid_file = NULL;
+  p_opt->ftree_exclude_guid_file = NULL;
   p_opt->sa_db_file = NULL;
   p_opt->exit_on_fatal = TRUE;
   p_opt->enable_quirks = FALSE;
@@ -1326,6 +1330,22 @@ osm_subn_parse_conf_file(
         "updn_guid_file",
         p_key, p_val, &p_opts->updn_guid_file);
 
+      __osm_subn_opts_unpack_charp(
+        "ftree_root_guid_file",
+        p_key, p_val, &p_opts->ftree_root_guid_file);
+
+      __osm_subn_opts_unpack_charp(
+        "ftree_cn_guid_file",
+        p_key, p_val, &p_opts->ftree_cn_guid_file);
+
+      __osm_subn_opts_unpack_charp(
+        "ftree_include_guid_file",
+        p_key, p_val, &p_opts->ftree_include_guid_file);
+
+      __osm_subn_opts_unpack_charp(
+        "ftree_exclude_guid_file",
+        p_key, p_val, &p_opts->ftree_exclude_guid_file);
+
       __osm_subn_opts_unpack_charp(
         "sa_db_file",
         p_key, p_val, &p_opts->sa_db_file);
@@ -1554,6 +1574,32 @@ osm_subn_write_conf_file(
              "# One guid in each line\n"
              "updn_guid_file %s\n\n",
              p_opts->updn_guid_file);
+  if (p_opts->ftree_root_guid_file)
+    fprintf( opts_file,
+             "# The file holding the fat-tree root node guids\n"
+             "# One guid in each line\n"
+             "ftree_root_guid_file %s\n\n",
+             p_opts->ftree_root_guid_file);
+  if (p_opts->ftree_cn_guid_file)
+    fprintf( opts_file,
+             "# The file holding the fat-tree compute node guids\n"
+             "# One guid in each line\n"
+             "ftree_cn_guid_file %s\n\n",
+             p_opts->ftree_cn_guid_file);
+  if (p_opts->ftree_include_guid_file)
+    fprintf( opts_file,
+             "# The file holding the node guids that should be included\n"
+             "# in fat-tree routing balancing\n"
+             "# One guid in each line\n"
+             "ftree_include_guid_file %s\n\n",
+             p_opts->ftree_include_guid_file);
+  if (p_opts->ftree_exclude_guid_file)
+    fprintf( opts_file,
+             "# The file holding the node guids that should be excluded\n"
+             "# from fat-tree routing balancing\n"
+             "# One guid in each line\n"
+             "ftree_exclude_guid_file %s\n\n",
+             p_opts->ftree_exclude_guid_file);
   if (p_opts->sa_db_file)
     fprintf( opts_file,
              "# SA database file name\n"
-- 
1.5.1.4


From halr at voltaire.com  Mon Jun 11 06:22:44 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 11 Jun 2007 09:22:44 -0400
Subject: [ofa-general] Re: [PATCHv2] osm: reading guids file in ucast_mgr
In-Reply-To: <466D2BBF.60406@dev.mellanox.co.il>
References: <466D2BBF.60406@dev.mellanox.co.il>
Message-ID: <1181568159.8896.75500.camel@hal.voltaire.com>

Hi Yevgeny,

On Mon, 2007-06-11 at 07:02, Yevgeny Kliteynik wrote:
> Hi Hal,
> 
>    | [V2] Nothing was changed in the patch, but the previous
>    |      mail had some garbage in the explanation text.

This patch version causes: 
File to patch: include/opensm/osm_base.h
patching file include/opensm/osm_base.h
patch: **** malformed patch at line 65: }       osm_mcast_req_type_t;

So I used the original patch with the comments from here.

> This patch removes a code that was reading root guids file in 
> osm_ucast_updn.c and replaces it with a more general function 
> in osm_ucast_mgr.c
> 
> This function will also be used by fat-tree routing.
> 
> -- Yevgeny
> 
> Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
> ---
> opensm/include/opensm/osm_base.h      |    8 ++--
> opensm/include/opensm/osm_ucast_mgr.h |   36 ++++++++++++++++
> opensm/opensm/osm_ucast_mgr.c         |   74 +++++++++++++++++++++++++++++++++
> opensm/opensm/osm_ucast_updn.c        |   48 +++------------------
> 4 files changed, 120 insertions(+), 46 deletions(-)

Thanks. Applied.

-- Hal


From eitan at mellanox.co.il  Mon Jun 11 06:32:47 2007
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Mon, 11 Jun 2007 16:32:47 +0300
Subject: [ofa-general] OpenSM Up-Down algorithm
In-Reply-To: <1181557691.8896.64610.camel@hal.voltaire.com>
References: <39C75744D164D948A170E9792AF8E7CA0D2914@exil.voltaire.com>
	<1181557691.8896.64610.camel@hal.voltaire.com>
Message-ID: <6C2C79E72C305246B504CBA17B5500C901AAF78C@mtlexch01.mtl.com>

Hi Hal, Sagi,
> 
> On Sun, 2007-06-10 at 11:59, Sagi Schlanger wrote:
> > Hi,
> >  
> > I'm looking for some answers on Up-Down routing at OpenSM .
> >  
> > Is anybody familiar with a utility/procedure to find credit loops 
> > given a topology and routing settings?
> 
> I know there was at least talk of ibdiagnet (in ibutils) checking this.
> Not sure if it is implemented (yet) or if it is routing algorithm
> independent. Eitan ?
> 
> > Is there a handy spec describing the OpenSM Up-Down algorithm?
> 
> The OpenSM up/down routing is based on the following paper:
> 
> "Effective Strategy to Compute Forwarding Tables for InfiniBand Networks"
> Jose Carlos Sancho, Universidad Politécnica de Valencia
> Antonio Robles, Universidad Politécnica de Valencia
> Jose Duato, Universidad Politécnica de Valencia
> 
> http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/proceedings/&toc=comp/proceedings/icpp/2001/1257/00/1257toc.xml&DOI=10.1109/ICPP.2001.952046
> 
> > What is the scheme through which roots are defined on clos and non 
> > clos/fat tree topologies?
> 
> The admin can supply the roots via -a  <guid_list_file> option when
> invoking OpenSM.
> 
>        Auto-detect root nodes - based on the CA hop length from any switch
>        in the subnet, a statistical histogram is built for each switch (hop
>        num vs number of occurrences). If the histogram reflects a specific
>        column (higher than others) for a certain node, then it is marked as a
>        root node. Since the algorithm is statistical, it may not find any root
>        nodes. The list of the root nodes found by this auto-detect stage is
>        used by the ranking process stage.
> 
>            Note 1: The user can override the node list manually.
>            Note 2: If this stage cannot find any root nodes, and the user did
>                    not specify a guid list file, OpenSM defaults back to the
>                    Min Hop routing algorithm.
> 
> > Is this algorithm always credit loop free?
> 
YES IT IS
> It's supposed to be.
> 
> > How efficient is using this algorithm on non clos/fat tree topologies?
> 
> What do you mean by efficiency ? Also, are you asking about pure fat
> tree or non pure fat tree (or both) ?
> 
> -- Hal
> 
> > Thanks for your cooperation,
> > Sagi
> > 
> > ____________________________________________________________
> > Sagi Schlanger  | +972-9-9717651 (o)   |   +972-52-2385154 (m)
> > Software Engineer, IB Switch
> > Voltaire - The Grid Backbone
> >  
> >  www.voltaire.com
> > 
> >  
> > 
> > 
> ______________________________________________________________________
> > 
> > _______________________________________________
> > general mailing list
> > general at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> > 
> > To unsubscribe, please visit 
> > http://openib.org/mailman/listinfo/openib-general


From kliteyn at dev.mellanox.co.il  Mon Jun 11 06:55:10 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Mon, 11 Jun 2007 16:55:10 +0300
Subject: [ofa-general] [PATCH] osm: adding 2 options to ftree routing
Message-ID: <466D543E.2020209@dev.mellanox.co.il>

Hi Hal,

[this patch replaces the "adding 4 options to ftree routing" patch]

Adding two options for fat-tree routing:

  ftree_root_guid_file
 	Name of the file that contains list of root guids that
 	will be used by fat-tree routing (provided by User)
  ftree_cn_guid_file
 	Name of the file that contains list of compute node guids that
 	will be used by fat-tree routing (provided by User)

For now, these options are exposed through options file only.
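
As an illustration of how one "key value" line from the options file could
end up in these fields, here is a minimal sketch. It is not the OpenSM
parser: struct ftree_opts, unpack_charp() and parse_pair() are hypothetical
stand-ins invented for this example. The only point it makes is that each
field is matched against its own key name.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down option block; not the real osm_subn_opt_t. */
struct ftree_opts {
	char *ftree_root_guid_file;
	char *ftree_cn_guid_file;
};

/* Store a copy of p_val in *p_dst when p_key names the requested option.
 * Returns 1 when the key matched and the copy succeeded, 0 otherwise. */
static int unpack_charp(const char *p_req_key, const char *p_key,
			const char *p_val, char **p_dst)
{
	char *copy;

	if (p_val == NULL || strcmp(p_req_key, p_key) != 0)
		return 0;
	copy = malloc(strlen(p_val) + 1);
	if (copy == NULL)
		return 0;
	strcpy(copy, p_val);
	free(*p_dst);		/* drop any earlier value */
	*p_dst = copy;
	return 1;
}

/* Offer one key/value pair to every known option (assumed layout). */
static void parse_pair(struct ftree_opts *p_opts,
		       const char *p_key, const char *p_val)
{
	unpack_charp("ftree_root_guid_file", p_key, p_val,
		     &p_opts->ftree_root_guid_file);
	unpack_charp("ftree_cn_guid_file", p_key, p_val,
		     &p_opts->ftree_cn_guid_file);
}
```

With both fields starting out as NULL, a line such as
"ftree_cn_guid_file /tmp/cn.txt" sets only ftree_cn_guid_file in this
sketch, and a line with any other key leaves both fields untouched.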
 
-- Yevgeny

Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
---
 opensm/include/opensm/osm_subnet.h |   10 ++++++++++
 opensm/opensm/osm_subnet.c         |   22 ++++++++++++++++++++++
 2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index c62128b..46d90d6 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -279,6 +279,8 @@ typedef struct _osm_subn_opt
   char *                   lid_matrix_dump_file;
   char *                   ucast_dump_file;
   char *                   updn_guid_file;
+  char *                   ftree_root_guid_file;
+  char *                   ftree_cn_guid_file;
   char *                   sa_db_file;
   boolean_t                exit_on_fatal;
   boolean_t                honor_guid2lid_file;
@@ -455,6 +457,14 @@ typedef struct _osm_subn_opt
 *	updn_guid_file
 *		Pointer to name of the UPDN guid file given by User
 *
+*	ftree_root_guid_file
+*		Name of the file that contains list of root guids that
+*		will be used by fat-tree routing (provided by User)
+*
+*	ftree_cn_guid_file
+*		Name of the file that contains list of compute node guids that
+*		will be used by fat-tree routing (provided by User)
+*
 *	sa_db_file
 *		Name of the SA database file.
 *
diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index 736f49a..a39ada6 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -501,6 +501,8 @@ osm_subn_set_default_opt(
   p_opt->lid_matrix_dump_file = NULL;
   p_opt->ucast_dump_file = NULL;
   p_opt->updn_guid_file = NULL;
+  p_opt->ftree_root_guid_file = NULL;
+  p_opt->ftree_cn_guid_file = NULL;
   p_opt->sa_db_file = NULL;
   p_opt->exit_on_fatal = TRUE;
   p_opt->enable_quirks = FALSE;
@@ -1326,6 +1328,14 @@ osm_subn_parse_conf_file(
         "updn_guid_file",
         p_key, p_val, &p_opts->updn_guid_file);
 
+      __osm_subn_opts_unpack_charp(
+        "ftree_root_guid_file",
+        p_key, p_val, &p_opts->ftree_root_guid_file);
+
+      __osm_subn_opts_unpack_charp(
+        "ftree_cn_guid_file",
+        p_key, p_val, &p_opts->ftree_cn_guid_file);
+
       __osm_subn_opts_unpack_charp(
         "sa_db_file",
         p_key, p_val, &p_opts->sa_db_file);
@@ -1554,6 +1564,18 @@ osm_subn_write_conf_file(
              "# One guid in each line\n"
              "updn_guid_file %s\n\n",
              p_opts->updn_guid_file);
+  if (p_opts->ftree_root_guid_file)
+    fprintf( opts_file,
+             "# The file holding the fat-tree root node guids\n"
+             "# One guid in each line\n"
+             "ftree_root_guid_file %s\n\n",
+             p_opts->ftree_root_guid_file);
+  if (p_opts->ftree_cn_guid_file)
+    fprintf( opts_file,
+             "# The file holding the fat-tree compute node guids\n"
+             "# One guid in each line\n"
+             "ftree_cn_guid_file %s\n\n",
+             p_opts->ftree_cn_guid_file);
   if (p_opts->sa_db_file)
     fprintf( opts_file,
              "# SA database file name\n"
-- 
1.5.1.4


From kliteyn at dev.mellanox.co.il  Mon Jun 11 07:15:14 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Mon, 11 Jun 2007 17:15:14 +0300
Subject: [ofa-general] [PATCH] osm: up/dn ranking - making code more
	intuitive
In-Reply-To: <20070524225428.GK837@sashak.voltaire.com>
References: <46503064.7010107@dev.mellanox.co.il>
	<20070520161034.GY19271@sashak.voltaire.com>
	<4651557E.2080400@dev.mellanox.co.il>
	<20070524225428.GK837@sashak.voltaire.com>
Message-ID: <466D58F2.1020402@dev.mellanox.co.il>

Hi Hal.

Following up our discussion with Sasha regarding the ranking 
optimization in up/dn routing:

>> I do think that to make the code more "intuitive" we might  
>> want to remove the __updn_update_rank() and do something like this:
>>
>>    if (remote_u->rank > u->rank + 1)
>>    {
>>        remote_u->rank = u->rank + 1;
>>        max_rank = remote_u->rank; 
>>        cl_qlist_insert_tail(&list, &remote_u->list);
>>    }
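
For readers who want to see that inlined check in context, here is a
self-contained sketch of a breadth-first ranking loop. It is only a toy
under stated assumptions: struct toy_fabric, its adjacency matrix and
rank_from_roots() are invented for illustration and stand in for the walk
over physical ports in updn_subn_rank(); it is not the OpenSM code.

```c
#include <limits.h>

#define MAX_NODES 8

/* Toy switch graph: adj[u][v] != 0 means u and v are linked.
 * Hypothetical stand-in for following ports to remote switches. */
struct toy_fabric {
	unsigned num_nodes;			/* must be <= MAX_NODES */
	unsigned char adj[MAX_NODES][MAX_NODES];
	unsigned rank[MAX_NODES];
};

/* Rank every reachable node by its BFS distance from the nearest root.
 * Root indices must be below num_nodes. Returns the highest rank set. */
static unsigned rank_from_roots(struct toy_fabric *f,
				const unsigned *roots, unsigned num_roots)
{
	unsigned queue[MAX_NODES];	/* each node is queued at most once */
	unsigned head = 0, tail = 0, max_rank = 0, u, v, i;

	for (u = 0; u < f->num_nodes; u++)
		f->rank[u] = UINT_MAX;	/* "not ranked yet" */

	for (i = 0; i < num_roots; i++) {
		if (f->rank[roots[i]] == 0)
			continue;	/* root listed twice */
		f->rank[roots[i]] = 0;
		queue[tail++] = roots[i];
	}

	while (head < tail) {
		u = queue[head++];
		for (v = 0; v < f->num_nodes; v++) {
			if (!f->adj[u][v])
				continue;
			/* Same shape as the inlined check quoted above. */
			if (f->rank[v] > f->rank[u] + 1) {
				f->rank[v] = f->rank[u] + 1;
				max_rank = f->rank[v];
				queue[tail++] = v;
			}
		}
	}
	return max_rank;
}
```

Because the queue is FIFO, ranks are assigned in non-decreasing order, so
the last rank written is also the maximum; that is why max_rank can be
updated inside the same if block in this sketch.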
 
Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
---
 opensm/opensm/osm_ucast_updn.c |   33 ++++++++-------------------------
 1 files changed, 8 insertions(+), 25 deletions(-)

diff --git a/opensm/opensm/osm_ucast_updn.c b/opensm/opensm/osm_ucast_updn.c
index 23a9db5..2448246 100644
--- a/opensm/opensm/osm_ucast_updn.c
+++ b/opensm/opensm/osm_ucast_updn.c
@@ -135,23 +135,6 @@ __updn_get_dir(
 }
 
 /**********************************************************************
- **********************************************************************/
-/* This function updates rank value for a node */
-/* Return 0 if no need to further update 1 if determined a new value */
-static int
-__updn_update_rank(
-  IN struct updn_node *u,
-  IN unsigned rank )
-{
-  if (u->rank > rank)
-  {
-    u->rank = rank;
-    return 1;
-  }
-  return 0;
-}
-
-/**********************************************************************
  * This function does the bfs of min hop table calculation by guid index
  * as a starting point.
  **********************************************************************/
@@ -375,7 +358,6 @@ updn_subn_rank(
   osm_switch_t *p_sw;
   osm_physp_t *p_physp, *p_remote_physp;
   cl_qlist_t list;
-  cl_status_t did_cause_update;
   struct updn_node *u, *remote_u;
   uint8_t num_ports, port_num;
   osm_log_t *p_log = &p_updn->p_osm->log;
@@ -403,7 +385,7 @@ updn_subn_rank(
     osm_log( p_log, OSM_LOG_DEBUG,
              "updn_subn_rank: "
              "Ranking root port GUID 0x%" PRIx64 "\n", guid_list[idx] );
-    __updn_update_rank(u, 0);
+    u->rank = 0;
     cl_qlist_insert_tail(&list, &u->list);
   }
 
@@ -438,7 +420,13 @@ updn_subn_rank(
       {
         remote_u = p_remote_physp->p_node->sw->priv;
         port_guid = p_remote_physp->port_guid;
-        did_cause_update = __updn_update_rank(remote_u, u->rank+1);
+
+        if (remote_u->rank > u->rank+1)
+        {
+           remote_u->rank = u->rank + 1;
+           max_rank = remote_u->rank;
+           cl_qlist_insert_tail(&list, &remote_u->list);
+        }
 
         osm_log( p_log, OSM_LOG_DEBUG,
                  "updn_subn_rank: "
@@ -446,11 +434,6 @@ updn_subn_rank(
                  cl_ntoh64(port_guid),
                  remote_u->rank );
 
-        if (did_cause_update)
-        {
-          cl_qlist_insert_tail(&list, &remote_u->list);
-          max_rank = remote_u->rank;
-        }
       }
     }
   }
-- 
1.5.1.4


From kliteyn at dev.mellanox.co.il  Mon Jun 11 07:21:37 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Mon, 11 Jun 2007 17:21:37 +0300
Subject: [ofa-general] [PATCH] osm: TRIVIAL bug fix
Message-ID: <466D5A71.2040301@dev.mellanox.co.il>

Hi Hal,

Fixing a small bug that was "inherited" when the code that reads the
guid file was moved from osm_ucast_updn.c to osm_ucast_mgr.c: the file
is now closed once reading the guid file is finished.
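
The pattern in question, shown as a standalone sketch: a reader with a
single exit label that closes the file on every path that opened it.
read_guid_file(), MAX_LINE_LEN and the "hex GUID per line" parsing are
assumptions made for this example; this is not the body of
osm_ucast_mgr_read_guid_file().

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_LINE_LEN 120

/* Read up to max_guids GUIDs, one hex value per line, from file_name.
 * Returns the number read, or -1 on open or parse failure.
 * Hypothetical helper written for illustration only. */
static int read_guid_file(const char *file_name, uint64_t *guids,
			  int max_guids)
{
	FILE *guid_file;
	char line[MAX_LINE_LEN];
	char *endptr;
	int count = 0;

	/* Assigned before any goto Exit; fopen() yields NULL on failure. */
	guid_file = fopen(file_name, "r");
	if (guid_file == NULL) {
		count = -1;
		goto Exit;
	}

	while (count < max_guids && fgets(line, sizeof(line), guid_file)) {
		errno = 0;
		guids[count] = strtoull(line, &endptr, 16);
		if (errno != 0 || endptr == line) {
			count = -1;	/* malformed or empty line */
			goto Exit;
		}
		count++;
	}

Exit:
	/* Success and parse-error paths both release the stream here. */
	if (guid_file)
		fclose(guid_file);
	return count;
}
```

Without the fclose() at the exit label, every call in this sketch would
leave one open stream behind, on the success path as well as on the
parse-error path.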

-- Yevgeny

Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
---
 opensm/opensm/osm_ucast_mgr.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index b080f59..d855683 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -1052,7 +1052,7 @@ osm_ucast_mgr_read_guid_file(
   IN  cl_list_t       * p_list )
 {
   cl_status_t   status = IB_SUCCESS;
-  FILE        * guid_file;
+  FILE        * guid_file = NULL;
   char          line[MAX_GUID_FILE_LINE_LENGTH];
   char        * endptr;
   uint64_t    * p_guid;
@@ -1112,6 +1112,8 @@ osm_ucast_mgr_read_guid_file(
   }
 
 Exit :
+  if (guid_file)
+    fclose(guid_file);
   OSM_LOG_EXIT( p_mgr->p_log );
   return (status);
 }
-- 
1.5.1.4


From halr at voltaire.com  Mon Jun 11 07:39:19 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 11 Jun 2007 10:39:19 -0400
Subject: [ofa-general] Re: [PATCH] osm: TRIVIAL bug fix
In-Reply-To: <466D5A71.2040301@dev.mellanox.co.il>
References: <466D5A71.2040301@dev.mellanox.co.il>
Message-ID: <1181572757.8896.80271.camel@hal.voltaire.com>

Hi Yevgeny,

On Mon, 2007-06-11 at 10:21, Yevgeny Kliteynik wrote:
> Hi Hal,
> 
> Fixing a small bug that was "inherited" when moved code that 
> reads guid file from osm_ucast_updn.c to osm_ucast_mgr.c - 
> closing file descriptor when finished reading the guid file.
> 
> -- Yevgeny
> 
> Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
> ---
>  opensm/opensm/osm_ucast_mgr.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
> index b080f59..d855683 100644
> --- a/opensm/opensm/osm_ucast_mgr.c
> +++ b/opensm/opensm/osm_ucast_mgr.c
> @@ -1052,7 +1052,7 @@ osm_ucast_mgr_read_guid_file(
>    IN  cl_list_t       * p_list )
>  {
>    cl_status_t   status = IB_SUCCESS;
> -  FILE        * guid_file;
> +  FILE        * guid_file = NULL;

Is this really needed ? Doesn't fopen return NULL on error ?

-- Hal

>    char          line[MAX_GUID_FILE_LINE_LENGTH];
>    char        * endptr;
>    uint64_t    * p_guid;
> @@ -1112,6 +1112,8 @@ osm_ucast_mgr_read_guid_file(
>    }
>  
>  Exit :
> +  if (guid_file)
> +    fclose(guid_file);
>    OSM_LOG_EXIT( p_mgr->p_log );
>    return (status);
>  }


From kliteyn at dev.mellanox.co.il  Mon Jun 11 07:50:38 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Mon, 11 Jun 2007 17:50:38 +0300
Subject: [ofa-general] Re: [PATCH] osm: TRIVIAL bug fix
In-Reply-To: <1181572757.8896.80271.camel@hal.voltaire.com>
References: <466D5A71.2040301@dev.mellanox.co.il>
	<1181572757.8896.80271.camel@hal.voltaire.com>
Message-ID: <466D613E.8080107@dev.mellanox.co.il>

Hi Hal,

Hal Rosenstock wrote:
> Hi Yevgeny,
> 
> On Mon, 2007-06-11 at 10:21, Yevgeny Kliteynik wrote:
>> Hi Hal,
>>
>> Fixing a small bug that was "inherited" when moved code that 
>> reads guid file from osm_ucast_updn.c to osm_ucast_mgr.c - 
>> closing file descriptor when finished reading the guid file.
>>
>> -- Yevgeny
>>
>> Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
>> ---
>>  opensm/opensm/osm_ucast_mgr.c |    4 +++-
>>  1 files changed, 3 insertions(+), 1 deletions(-)
>>
>> diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
>> index b080f59..d855683 100644
>> --- a/opensm/opensm/osm_ucast_mgr.c
>> +++ b/opensm/opensm/osm_ucast_mgr.c
>> @@ -1052,7 +1052,7 @@ osm_ucast_mgr_read_guid_file(
>>    IN  cl_list_t       * p_list )
>>  {
>>    cl_status_t   status = IB_SUCCESS;
>> -  FILE        * guid_file;
>> +  FILE        * guid_file = NULL;
> 
> Is this really needed ? Doesn't fopen return NULL on error ?

You're right, it's not needed.

-- Yevgeny.
 
> -- Hal
> 
>>    char          line[MAX_GUID_FILE_LINE_LENGTH];
>>    char        * endptr;
>>    uint64_t    * p_guid;
>> @@ -1112,6 +1112,8 @@ osm_ucast_mgr_read_guid_file(
>>    }
>>  
>>  Exit :
>> +  if (guid_file)
>> +    fclose(guid_file);
>>    OSM_LOG_EXIT( p_mgr->p_log );
>>    return (status);
>>  }
> 
> 


From jackm at dev.mellanox.co.il  Mon Jun 11 08:09:50 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Mon, 11 Jun 2007 18:09:50 +0300
Subject: [ofa-general] [PATCH] libmlx4: fix problem in post_send error flow
	(inline wqes)
Message-ID: <200706111809.51070.jackm@dev.mellanox.co.il>

Prevents the following error:
caller posts a 2-wqe list, with the second wqe in the list being an
INLINE which is too long.

In this case, post_send goes to "out" with: nreq = 1, inl positive, and size in
the range allowing blueflame. All the blueflame test conditions are met.
However, the cntl pointer now points to the invalid wqe, and this
will be "blueflamed".
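
To make the failure mode concrete, here is a toy model of the error flow.
It is not the libmlx4 code: struct toy_wr, struct toy_result,
toy_post_send() and the reset_inl_on_error switch are invented for this
example, and the size-range part of the blueflame test is left out.

```c
#include <stddef.h>

/* Toy work request: only what the error flow needs. */
struct toy_wr {
	struct toy_wr *next;
	int is_inline;		/* nonzero: payload is copied into the WQE */
	unsigned len;		/* payload length in bytes */
};

struct toy_result {
	int ret;			/* 0 on success, -1 on error */
	int nreq;			/* WQEs built successfully */
	int used_blueflame;		/* fast path taken at "out" */
	const struct toy_wr *bf_wr;	/* request whose WQE was blueflamed */
};

static struct toy_result toy_post_send(const struct toy_wr *wr,
				       unsigned max_inline_data,
				       int reset_inl_on_error)
{
	struct toy_result res = { 0, 0, 0, NULL };
	const struct toy_wr *ctrl = NULL;	/* last WQE touched */
	unsigned inl = 0;

	for (; wr != NULL; wr = wr->next) {
		ctrl = wr;
		inl = 0;
		if (wr->is_inline) {
			inl += wr->len;
			if (inl > max_inline_data) {
				if (reset_inl_on_error)
					inl = 0;	/* models the one-line fix */
				res.ret = -1;
				goto out;
			}
		}
		res.nreq++;
	}

out:
	/* Simplified blueflame test: one WQE built, and inline data seen. */
	if (res.nreq == 1 && inl) {
		res.used_blueflame = 1;
		res.bf_wr = ctrl;
	}
	return res;
}
```

With a two-entry list whose second entry is an over-long inline request,
the model takes the blueflame path on the rejected entry when
reset_inl_on_error is 0, and skips it when the switch is 1.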

Signed-off-by: Jack Morgenstein <jackm at dev.mellanox.co.il>

diff --git a/src/qp.c b/src/qp.c
index 92edec6..7df3311 100644
--- a/src/qp.c
+++ b/src/qp.c
@@ -236,6 +236,7 @@ int mlx4_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
 					inl += len;
 
 					if (inl > qp->max_inline_data) {
+						inl = 0;
 						ret = -1;
 						*bad_wr = wr;
 						goto out;


From halr at voltaire.com  Mon Jun 11 08:10:18 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 11 Jun 2007 11:10:18 -0400
Subject: [ofa-general] Re: [PATCH] osm: TRIVIAL bug fix
In-Reply-To: <466D5A71.2040301@dev.mellanox.co.il>
References: <466D5A71.2040301@dev.mellanox.co.il>
Message-ID: <1181574616.8896.82260.camel@hal.voltaire.com>

Hi Yevgeny,

On Mon, 2007-06-11 at 10:21, Yevgeny Kliteynik wrote:
> Hi Hal,
> 
> Fixing a small bug that was "inherited" when moved code that 
> reads guid file from osm_ucast_updn.c to osm_ucast_mgr.c - 
> closing file descriptor when finished reading the guid file.
> 
> -- Yevgeny
> 
> Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

Thanks. Applied without the initialization of *guid_file to NULL as
discussed.

-- Hal

> ---
>  opensm/opensm/osm_ucast_mgr.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
> index b080f59..d855683 100644
> --- a/opensm/opensm/osm_ucast_mgr.c
> +++ b/opensm/opensm/osm_ucast_mgr.c
> @@ -1052,7 +1052,7 @@ osm_ucast_mgr_read_guid_file(
>    IN  cl_list_t       * p_list )
>  {
>    cl_status_t   status = IB_SUCCESS;
> -  FILE        * guid_file;
> +  FILE        * guid_file = NULL;
>    char          line[MAX_GUID_FILE_LINE_LENGTH];
>    char        * endptr;
>    uint64_t    * p_guid;
> @@ -1112,6 +1112,8 @@ osm_ucast_mgr_read_guid_file(
>    }
>  
>  Exit :
> +  if (guid_file)
> +    fclose(guid_file);
>    OSM_LOG_EXIT( p_mgr->p_log );
>    return (status);
>  }


From halr at voltaire.com  Mon Jun 11 08:12:24 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 11 Jun 2007 11:12:24 -0400
Subject: [ofa-general] Re: [PATCH] opensm: clean unused
	OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED
In-Reply-To: <20070610223301.GC23029@sashak.voltaire.com>
References: <20070610223301.GC23029@sashak.voltaire.com>
Message-ID: <1181574618.8896.82262.camel@hal.voltaire.com>

On Sun, 2007-06-10 at 18:33, Sasha Khapyorsky wrote:
> This removes unused OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED sm signal
> enum value.
> 
> Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>

Thanks. Applied.

-- Hal


From tziporet at dev.mellanox.co.il  Mon Jun 11 09:28:57 2007
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Mon, 11 Jun 2007 19:28:57 +0300
Subject: [ofa-general] Re: [ewg] OFED teleconference today - meeting summary
In-Reply-To: <466D667B.8060605@mellanox.co.il>
References: <28F7CA62-7C0E-4B03-A8BC-5AC40C32DC35@cisco.com>
	<466D667B.8060605@mellanox.co.il>
Message-ID: <466D7849.1070707@mellanox.co.il>


> Agenda for the meeting today:
> - Review open bugs and decide on the release
> 567  blocker   rolandd at cisco.com       RHEL5 ppc64 UD verbs failures
> 577  critical  ishai at mellanox.co.il    SRP multipath failover too slow (minutes, not seconds)
> 629  major     monis at voltaire.com      ib-bonding: sometimes slow failover is noticed
> 541  major     mst at mellanox.co.il      slow failover with IPoIB CM bonding/ipoibtools HA
> 642  major     pasha at mellanox.co.il    Failed to build mvapich with PGI compiler
>
> My suggestion: wait only for the Bonding and MPI fixes and have RC5 done on Wed.
> This RC5 should become the official release.
>

In the meeting today we decided the following:
For RC5 we will fix only 2 more issues:
629 - the new bonding module is already ready
642 - got approval from OSU, so we will enhance MPI to support the PGI compiler
558 - Scott should check with Roland whether there is a fix for tvflush for
SLES10 SP1; if it's fixed, we can take this one too.

RC5 will be published on Wed June 13, and it is targeted to become the
GA release.
The GA release will be published after a week of QA - target date is June 20.

Tziporet


From sweitzen at cisco.com  Mon Jun 11 09:30:21 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Mon, 11 Jun 2007 09:30:21 -0700
Subject: [ofa-general] Re: [ewg] OFED teleconference today - meeting
	summary
In-Reply-To: <466D7849.1070707@mellanox.co.il>
References: <28F7CA62-7C0E-4B03-A8BC-5AC40C32DC35@cisco.com><466D667B.8060605@mellanox.co.il>
	<466D7849.1070707@mellanox.co.il>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA303AA5C4F@xmb-sjc-216.amer.cisco.com>

I'm not touching tvflush! :-)


________________________________

	From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Tziporet Koren
	Sent: Monday, June 11, 2007 9:29 AM
	To: Tziporet Koren; EWG; OpenFabrics General
	Subject: [ofa-general] Re: [ewg] OFED teleconference today - meeting summary
	
	
		Agenda for the meeting today:
		- Review open bugs and decide on the release
		
		567  blocker   rolandd at cisco.com       RHEL5 ppc64 UD verbs failures
		577  critical  ishai at mellanox.co.il    SRP multipath failover too slow (minutes, not seconds)
		629  major     monis at voltaire.com      ib-bonding: sometimes slow failover is noticed
		541  major     mst at mellanox.co.il      slow failover with IPoIB CM bonding/ipoibtools HA
		642  major     pasha at mellanox.co.il    Failed to build mvapich with PGI compiler
		
		My suggestion wait only for Bonding and MPI fixes and have RC5 done on Wed.
		This RC5 should become the official release
		
		
	In the meeting today we decided the following:
	For RC5 we will fix only 2 more issues:
	629 - new bonding module is already ready
	642 - got approval from OSU so we will enhance MPI to support PGI compiler
	558 - Scott should find with Roland if there is a fix for tvflush for SLES10 SP1 and if its fixed we can take this one too.
	
	RC5 will be published on Wed June 13, and it is targeted to become the GA release.
	GA release will be published after a week of QA - target date is June 20.
	
	Tziporet
	
	

From tziporet at mellanox.co.il  Mon Jun 11 09:34:39 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Mon, 11 Jun 2007 19:34:39 +0300
Subject: [ofa-general] Re: [ewg] OFED teleconference today - meeting
	summary
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA303AA5C4F@xmb-sjc-216.amer.cisco.com>
References: <28F7CA62-7C0E-4B03-A8BC-5AC40C32DC35@cisco.com><466D667B.8060605@mellanox.co.il>
	<466D7849.1070707@mellanox.co.il>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA303AA5C4F@xmb-sjc-216.amer.cisco.com>
Message-ID: <6C2C79E72C305246B504CBA17B5500C9015635D5@mtlexch01.mtl.com>

Vlad,
please disable tvflush on SLES10 SP1
 
Thanks,
Tziporet

________________________________

From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com] 
Sent: Monday, June 11, 2007 7:30 PM
To: Tziporet Koren; Tziporet Koren; EWG; OpenFabrics General
Subject: RE: [ofa-general] Re: [ewg] OFED teleconference today - meeting
summary


I'm not touching tvflush! :-)



From sweitzen at cisco.com  Mon Jun 11 09:35:30 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Mon, 11 Jun 2007 09:35:30 -0700
Subject: [ofa-general] Re: [ewg] OFED teleconference today - meeting
	summary
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9015635D5@mtlexch01.mtl.com>
References: <28F7CA62-7C0E-4B03-A8BC-5AC40C32DC35@cisco.com><466D667B.8060605@mellanox.co.il>
	<466D7849.1070707@mellanox.co.il>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA303AA5C4F@xmb-sjc-216.amer.cisco.com>
	<6C2C79E72C305246B504CBA17B5500C9015635D5@mtlexch01.mtl.com>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA303AA5C55@xmb-sjc-216.amer.cisco.com>

You missed the joke, it's *tvflash* not *tvflush*.
 
I have asked Roland about tvflash.
 
Scott


________________________________

	From: Tziporet Koren [mailto:tziporet at mellanox.co.il] 
	Sent: Monday, June 11, 2007 9:35 AM
	To: Scott Weitzenkamp (sweitzen); Tziporet Koren; EWG;
OpenFabrics General
	Subject: RE: [ofa-general] Re: [ewg] OFED teleconference today -
meeting summary
	
	
	Vlad,
	please disable tvflush on SLES10 SP1
	 
	Thanks,
	Tziporet

________________________________

	From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com] 
	Sent: Monday, June 11, 2007 7:30 PM
	To: Tziporet Koren; Tziporet Koren; EWG; OpenFabrics General
	Subject: RE: [ofa-general] Re: [ewg] OFED teleconference today -
meeting summary
	
	
	I'm not touching tvflush! :-)



From jsquyres at cisco.com  Mon Jun 11 09:52:40 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Mon, 11 Jun 2007 12:52:40 -0400
Subject: [ofa-general] New OMPI / MPI_READ release notes patch
Message-ID: <F4B4CFF9-3A0C-4C75-90DD-0566F174599D@cisco.com>

Tziporet --

Here's a new patch for the OMPI release notes based on your current
git.  It includes updated information for Open MPI and text about
mpi-selector.

Note that there are a few areas in MPI_README that I need OSU and  
Mellanox to proofread.  It would also be nice if someone else could  
eyeball the mpi-selector text and ensure it makes sense to a naive  
reader.

-- 
Jeff Squyres
Cisco Systems

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ofed-1.2-mpi-docs.patch
Type: application/octet-stream
Size: 17419 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070611/5f38c00c/attachment.obj>


From pradeeps at linux.vnet.ibm.com  Mon Jun 11 11:08:47 2007
From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana)
Date: Mon, 11 Jun 2007 11:08:47 -0700
Subject: [ofa-general] Re: IPOIB CM (NOSRQ) extension
In-Reply-To: <20070610044146.GA4959@mellanox.co.il>
References: <46687642.8040208@linux.vnet.ibm.com>
	<20070610044146.GA4959@mellanox.co.il>
Message-ID: <466D8FAF.5090800@linux.vnet.ibm.com>

Michael S. Tsirkin wrote:
>> Quoting Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com>:
>> Subject: IPOIB CM (NOSRQ) extension
>>
>> This patch handles the corner case of running out of RC QPs. In that
>> case it switches to UD mode. This patch can be used both by NOSRQ and
>> SRQ code.
>>
>> Signed-off-by: Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com>
> 
> You don't provide any way to retry going back to connected mode,
> after a failure, which is really intermittent by nature. That's pretty bad.

This node switched to datagram mode, because the passive side was
under a resource crunch (no RC QPs). And, the user is indeed alerted
about this condition. So, yes we do not attempt to go back to connected
mode.

> 
>> ---
>>
>> --- c/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_cm.c 
>> 2007-06-07 11:13:55.000000000 -0400
>> +++ b/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_cm.c 
>> 2007-06-07 11:11:21.000000000 -0400
>> @@ -1383,6 +1383,11 @@ static int ipoib_cm_tx_handler(struct ib
>>  		break;
>>  	case IB_CM_REQ_ERROR:
>>  	case IB_CM_REJ_RECEIVED:
>> +		ipoib_warn(priv, "REJ received\n");
>> +		neigh = tx->neigh;
>> +		if (neigh)
>> +			clear_bit(IPOIB_FLAG_OPER_UP, &neigh->cm->flags);
>> +		break;
>>  	case IB_CM_TIMEWAIT_EXIT:
>>  		ipoib_dbg(priv, "CM error %d.\n", event->event);
>>  		spin_lock_irq(&priv->tx_lock);
> 
> This has an effect of dropping down to datagram mode
> on errors such as CM timeout, or a reject due to stale connection.
> I think this is a wrong thing to do.

I can make this conditional upon there being no RC QPs. Will code that
up in the next patch.

Pradeep


From mst at dev.mellanox.co.il  Mon Jun 11 11:18:04 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 11 Jun 2007 21:18:04 +0300
Subject: [ofa-general] Re: IPOIB CM (NOSRQ) extension
In-Reply-To: <466D8FAF.5090800@linux.vnet.ibm.com>
References: <46687642.8040208@linux.vnet.ibm.com>
	<20070610044146.GA4959@mellanox.co.il>
	<466D8FAF.5090800@linux.vnet.ibm.com>
Message-ID: <20070611181804.GE6470@mellanox.co.il>

> Quoting Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com>:
> Subject: Re: IPOIB CM (NOSRQ) extension
> 
> Michael S. Tsirkin wrote:
> >>Quoting Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com>:
> >>Subject: IPOIB CM (NOSRQ) extension
> >>
> >>This patch handles the corner case of running out of RC QPs. In that
> >>case it switches to UD mode. This patch can be used both by NOSRQ and
> >>SRQ code.
> >>
> >>Signed-off-by: Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com>
> >
> >You don't provide any way to retry going back to connected mode,
> >after a failure, which is really intermittent by nature. That's pretty bad.
> 
> This node switched to datagram mode, because the passive side was
> under a resource crunch (no RC QPs). And, the user is indeed alerted
> about this condition. So, yes we do not attempt to go back to connected
> mode.

Need to retry switching to datagram mode after a while.

-- 
MST


From mst at dev.mellanox.co.il  Mon Jun 11 11:18:49 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 11 Jun 2007 21:18:49 +0300
Subject: [ofa-general] Re: IPOIB CM (NOSRQ) extension
In-Reply-To: <20070611181804.GE6470@mellanox.co.il>
References: <46687642.8040208@linux.vnet.ibm.com>
	<20070610044146.GA4959@mellanox.co.il>
	<466D8FAF.5090800@linux.vnet.ibm.com>
	<20070611181804.GE6470@mellanox.co.il>
Message-ID: <20070611181849.GF6470@mellanox.co.il>

> Quoting Michael S. Tsirkin <mst at dev.mellanox.co.il>:
> Subject: Re: IPOIB CM (NOSRQ) extension
> 
> > Quoting Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com>:
> > Subject: Re: IPOIB CM (NOSRQ) extension
> > 
> > Michael S. Tsirkin wrote:
> > >>Quoting Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com>:
> > >>Subject: IPOIB CM (NOSRQ) extension
> > >>
> > >>This patch handles the corner case of running out of RC QPs. In that
> > >>case it switches to UD mode. This patch can be used both by NOSRQ and
> > >>SRQ code.
> > >>
> > >>Signed-off-by: Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com>
> > >
> > >You don't provide any way to retry going back to connected mode,
> > >after a failure, which is really intermittent by nature. That's pretty bad.
> > 
> > This node switched to datagram mode, because the passive side was
> > under a resource crunch (no RC QPs). And, the user is indeed alerted
> > about this condition. So, yes we do not attempt to go back to connected
> > mode.
> 
> Need to retry switching to datagram mode after a while.

Sorry, that should have been "switching to connected mode".

-- 
MST
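
The retry asked for above (fall back to datagram mode, then try connected
mode again after a while) could be sketched as a small back-off timer. This
is only an illustration under stated assumptions: neigh_state,
neigh_on_reject(), neigh_should_retry(), neigh_on_connect() and the RETRY_*
constants are hypothetical names that do not come from ipoib_cm.c, and time
is a plain tick counter rather than jiffies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-neighbour state; not taken from the IPoIB sources. */
enum path_mode { MODE_CONNECTED, MODE_DATAGRAM };

struct neigh_state {
    enum path_mode mode;
    unsigned long  retry_at;  /* tick at which connected mode may be retried */
    unsigned long  backoff;   /* current back-off interval, in ticks */
};

#define RETRY_MIN  10UL       /* assumed initial back-off */
#define RETRY_MAX 640UL       /* assumed cap on the back-off */

void neigh_init(struct neigh_state *n)
{
    assert(n != NULL);
    n->mode = MODE_CONNECTED;
    n->retry_at = 0;
    n->backoff = RETRY_MIN;
}

/* The passive side rejected the connection (for example, no RC QPs left):
 * drop to datagram mode and schedule a retry, doubling the interval. */
void neigh_on_reject(struct neigh_state *n, unsigned long now)
{
    assert(n != NULL);
    n->mode = MODE_DATAGRAM;
    n->retry_at = now + n->backoff;
    if (n->backoff < RETRY_MAX)
        n->backoff *= 2;
}

/* True when the caller should attempt connected mode again.
 * Tick wraparound is ignored here for brevity. */
bool neigh_should_retry(const struct neigh_state *n, unsigned long now)
{
    return n->mode == MODE_DATAGRAM && now >= n->retry_at;
}

/* A connection attempt succeeded: back to connected mode, reset back-off. */
void neigh_on_connect(struct neigh_state *n)
{
    assert(n != NULL);
    n->mode = MODE_CONNECTED;
    n->backoff = RETRY_MIN;
}
```

Doubling the interval keeps a node that is persistently out of RC QPs from
being hammered with connection requests, while a transient shortage is still
recovered from quickly.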


From halr at voltaire.com  Mon Jun 11 11:40:17 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 11 Jun 2007 14:40:17 -0400
Subject: [ofa-general] Re: [PATCH] opensm: remove unused state_step_mode
In-Reply-To: <20070610223159.GB23029@sashak.voltaire.com>
References: <20070610223159.GB23029@sashak.voltaire.com>
Message-ID: <1181587216.8896.95583.camel@hal.voltaire.com>

On Sun, 2007-06-10 at 18:31, Sasha Khapyorsky wrote:
> This removes unused state_step_mode and associated flow from
> osm_state_mgr_process().
> 
> Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
> ---
>  opensm/include/opensm/osm_base.h      |   29 ------------
>  opensm/include/opensm/osm_state_mgr.h |    2 -
>  opensm/opensm/osm_state_mgr.c         |   81 +++------------------------------

Thanks. Applied.

-- Hal


From pradeeps at linux.vnet.ibm.com  Mon Jun 11 11:44:37 2007
From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana)
Date: Mon, 11 Jun 2007 11:44:37 -0700
Subject: [ofa-general] Re: IPOIB CM (NOSRQ)[PATCH V5] patch
In-Reply-To: <20070610044945.GB4959@mellanox.co.il>
References: <46687636.5050101@linux.vnet.ibm.com>
	<20070610044945.GB4959@mellanox.co.il>
Message-ID: <466D9815.7030009@linux.vnet.ibm.com>

Michael S. Tsirkin wrote:
>> Quoting Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com>:
>> Subject: IPOIB CM (NOSRQ)[PATCH V5] patch
>>
>> Here is a fifth version of the IPOIB_CM_NOSRQ patch. This patch will
>> benefit adapters that do not support shared receive queues.
>>
>> This patch incorporates the following review comments and subsequent
>> discussions on this mailing list from v4:
>>
>> 1. Reduce the number of if(srq) tests in the packet receive path
> 
> I could still count at least 2 of these, and I don't see why there can't be just 1,
> or even 0 if the QP pool is hidden under the SRQ interface.

Yes, there are 2 of these now. Previously, only ipoib_poll() needed
to be altered to incorporate this. Now I would have to add 
ipoib_drain_cq() as well.

As previously mentioned, we need to keep maintainability in mind, and as
it stands all the changes are well contained.

Isn't it time we stopped quibbling about one extra if(srq)? If
you are so inclined you can submit a patch on top of this one. We can
then debate the merits of that patch and make an appropriate decision.

> 
>> +int current_rc_qp = 0; /* Active RC QPs for NOSRQ */
>>  #define IPOIB_CM_IETF_ID 0x1000000000000000ULL
>>
>>  #define IPOIB_CM_RX_UPDATE_TIME (256 * HZ)
> 
> I don't see any locking for current_rc_qp, which looks wrong.

Yes, I will correct that.

Pradeep
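
One way to address the unprotected current_rc_qp counter is to make both
the limit check and the increment a single atomic step. The following is a
user-space sketch using C11 atomics, not kernel code: MAX_RC_QP,
rc_qp_reserve(), rc_qp_release() and rc_qp_in_use() are hypothetical names,
and a kernel patch would use the kernel's own atomic type or a spinlock
instead of <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_RC_QP 4                /* hypothetical limit on active RC QPs */

static atomic_int current_rc_qp;   /* static storage, so it starts at zero */

/* Try to claim one RC QP slot. The compare-exchange loop makes
 * "check the limit, then increment" one atomic step, so two concurrent
 * callers cannot both take the last slot. */
bool rc_qp_reserve(void)
{
    int cur = atomic_load(&current_rc_qp);

    while (cur < MAX_RC_QP) {
        if (atomic_compare_exchange_weak(&current_rc_qp, &cur, cur + 1))
            return true;
        /* On failure, cur has been reloaded with the current value. */
    }
    return false;
}

/* Give a slot back; releasing without a prior reserve is a bug. */
void rc_qp_release(void)
{
    int prev = atomic_fetch_sub(&current_rc_qp, 1);

    assert(prev > 0);
    (void) prev;
}

int rc_qp_in_use(void)
{
    return atomic_load(&current_rc_qp);
}
```

A plain "if (current_rc_qp < max) current_rc_qp++" has a window between the
test and the increment; the loop above closes that window without a lock.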


From vu at mellanox.com  Mon Jun 11 12:13:01 2007
From: vu at mellanox.com (Vu Pham)
Date: Mon, 11 Jun 2007 12:13:01 -0700
Subject: [ofa-general] OFED 1.x (Gen 2) based SRP target code released!
In-Reply-To: <465AD2D1.2070100@voltaire.com>
References: <9FA59C95FFCBB34EA5E42C1A8573784F6F91AB@mtiexch01.mti.com>
	<465AD2D1.2070100@voltaire.com>
Message-ID: <466D9EBD.3090809@mellanox.com>

Erez Zilber wrote:
> Sujal Das wrote:
>   
>> Hello all,
>>
>>  
>>
>> Mellanox is pleased to release the OFED 1.x (Gen 2) - based SRP Target
>> source code to the OpenFabrics community, OEMs and end users.  
>>
>>  
>>
>> This release is an upgrade to the previously released SRP Target source
>> code that was based on the Mellanox IBGold driver and Gen 1 software
>> interface.  The code has been tested to work with Mellanox InfiniBand
>> adapters and is available under Open Fabrics open source license terms.
>>
>>     
> I'm trying to build srpt according to the instructions, but it does not get built at all. Here's what I did:
>
> tar xzf OFED-1.2-rc3.tgz
> cd OFED-1.2-rc3/SRPMS
> rpm2cpio ofa_kernel-1.2-rc3.src.rpm |cpio -i
> tar xzf ofa_kernel-1.2.tgz
> cd ofa_kernel-1.2
> patch -p1 < ~/srpt_inc/add_srpt_01.patch
> patch -p1 < ~/srpt_inc/add_srpt_03.patch
>   

You forgot to run:
patch -p1 < ~/srpt_inc/add_srpt_04.patch

-vu

> cp -r ~/srpt drivers/infiniband/ulp/srpt
> ./configure --with-core-mod --with-ipoib-mod --with-srp-target-mod --with-mthca-mod
>   


From rdreier at cisco.com  Mon Jun 11 14:45:20 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 11 Jun 2007 14:45:20 -0700
Subject: [ofa-general] Re: [PATCH} libmlx4: poll cq tail pointer
In-Reply-To: <1181554019.12020.3.camel@mtls03> (Eli Cohen's message of "Mon,
	11 Jun 2007 12:26:59 +0300")
References: <1181554019.12020.3.camel@mtls03>
Message-ID: <adawsya9ndb.fsf@cisco.com>

thanks, applied, and also the following for the kernel:

commit a4668873c7271c4b6c540d6c4e8891c24e8a1736
Author: Roland Dreier <rolandd at cisco.com>
Date:   Mon Jun 11 14:44:42 2007 -0700

    IB/mlx4: Fix handling of wq->tail for send completions
    
    Cast the increment added to wq->tail when send completions are
    processed to u16 to avoid using wrong values caused by standard
    integer promotions.
    
    The same bug was fixed in libmlx4 by Eli Cohen <eli at mellanox.co.il>.
    
    Signed-off-by: Roland Dreier <rolandd at cisco.com>

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index b2a290c..048c527 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -354,7 +354,7 @@ static int mlx4_ib_poll_one(struct mlx4_ib_cq *cq,
 	if (is_send) {
 		wq = &(*cur_qp)->sq;
 		wqe_ctr = be16_to_cpu(cqe->wqe_index);
-		wq->tail += wqe_ctr - (u16) wq->tail;
+		wq->tail += (u16) (wqe_ctr - (u16) wq->tail);
 		wc->wr_id = wq->wrid[wq->tail & (wq->max - 1)];
 		++wq->tail;
 	} else if ((*cur_qp)->ibqp.srq) {
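
The effect of the added cast can be reproduced outside the driver. The
sketch below assumes tail is a 32-bit unsigned counter and int is wider
than 16 bits; tail_advance_old() and tail_advance_new() are hypothetical
helper names wrapping the old and new expressions from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Old expression: both 16-bit operands are promoted to int, so the
 * difference is negative whenever the 16-bit WQE counter has wrapped
 * past the low 16 bits of tail. Adding that negative int to the
 * unsigned tail moves tail backwards. */
unsigned tail_advance_old(unsigned tail, uint16_t wqe_ctr)
{
    tail += wqe_ctr - (uint16_t) tail;
    return tail;
}

/* New expression: truncating the difference to 16 bits yields the
 * forward distance modulo 65536, so tail only ever moves forwards. */
unsigned tail_advance_new(unsigned tail, uint16_t wqe_ctr)
{
    tail += (uint16_t) (wqe_ctr - (uint16_t) tail);
    return tail;
}
```

With tail = 0xFFFE and wqe_ctr = 1 the old form computes 1 - 65534 = -65533
and leaves tail at 1, while the new form adds 3 and gives 0x10001. The two
results agree in their low 16 bits, so the difference only shows up in code
that treats tail as a full-width counter.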


From rdreier at cisco.com  Mon Jun 11 14:48:41 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 11 Jun 2007 14:48:41 -0700
Subject: [ofa-general] Re: [PATCH] libmlx4: fix problem in post_send error
	flow (inline wqes)
In-Reply-To: <200706111809.51070.jackm@dev.mellanox.co.il> (Jack Morgenstein's
	message of "Mon, 11 Jun 2007 18:09:50 +0300")
References: <200706111809.51070.jackm@dev.mellanox.co.il>
Message-ID: <adasl8y9n7q.fsf@cisco.com>

thanks, applied.


From rdreier at cisco.com  Mon Jun 11 15:24:06 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 11 Jun 2007 15:24:06 -0700
Subject: [ofa-general] Re: [PATCH] net/mlx4: include linux/mutex.h from
	mlx4.h
In-Reply-To: <20070611060942.GE1454@mellanox.co.il> (Michael S. Tsirkin's
	message of "Mon, 11 Jun 2007 09:09:42 +0300")
References: <20070611060942.GE1454@mellanox.co.il>
Message-ID: <adafy4y9lkp.fsf@cisco.com>

thanks, applied both mutex patches to for-2.6.23


From rdreier at cisco.com  Mon Jun 11 15:30:19 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 11 Jun 2007 15:30:19 -0700
Subject: [ofa-general] [RESEND #2] [GIT PULL] please pull infiniband.git
In-Reply-To: <ada1wgmcyqn.fsf@cisco.com> (Roland Dreier's message of "Fri,
	08 Jun 2007 07:22:24 -0700")
References: <ada1wgmcyqn.fsf@cisco.com>
Message-ID: <adabqfm9lac.fsf@cisco.com>

[Sorry to keep bugging you but I haven't seen this pulled and you
haven't told me that something is wrong with these patches...  is this
getting lost in your queue or are you dropping it intentionally?]

Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This will get a bunch of fixes to the new mlx4 driver, and one fix for
port assignment by the RDMA CM:

Eli Cohen (1):
      mlx4_core: Fix CQ context layout

Jack Morgenstein (2):
      mlx4_core: Don't set MTT address in dMPT entries with PA set
      IB/mlx4: Fix zeroing of rnr_retry value in ib_modify_qp()

Roland Dreier (5):
      mlx4_core: Initialize ctx_list and ctx_lock earlier
      mlx4_core: Free catastrophic error MSI-X interrupt with correct dev_id
      IB/mthca, mlx4_core: Fix typo in comment
      mlx4_core: Check firmware command interface revision
      IB/mlx4: Make sure RQ allocation is always valid

Sean Hefty (1):
      RDMA/cma: Fix initialization of next_port

 drivers/infiniband/core/cma.c           |    4 +-
 drivers/infiniband/hw/mlx4/qp.c         |   33 ++++++++++++++++++++----------
 drivers/infiniband/hw/mthca/mthca_cmd.c |    2 +-
 drivers/net/mlx4/cq.c                   |    2 +-
 drivers/net/mlx4/eq.c                   |    4 ++-
 drivers/net/mlx4/fw.c                   |   27 ++++++++++++++++++++++--
 drivers/net/mlx4/intf.c                 |    3 --
 drivers/net/mlx4/main.c                 |    2 +
 drivers/net/mlx4/mr.c                   |    8 ++++--
 9 files changed, 60 insertions(+), 25 deletions(-)


diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 2eb52b7..32a0e66 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -2773,8 +2773,8 @@ static int cma_init(void)
 	int ret;
 
 	get_random_bytes(&next_port, sizeof next_port);
-	next_port = (next_port % (sysctl_local_port_range[1] -
-				  sysctl_local_port_range[0])) +
+	next_port = ((unsigned int) next_port %
+		    (sysctl_local_port_range[1] - sysctl_local_port_range[0])) +
 		    sysctl_local_port_range[0];
 	cma_wq = create_singlethread_workqueue("rdma_cm");
 	if (!cma_wq)
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index dc137de..5c6d054 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -189,18 +189,28 @@ static int send_wqe_overhead(enum ib_qp_type type)
 }
 
 static int set_rq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
-		       struct mlx4_ib_qp *qp)
+		       int is_user, int has_srq, struct mlx4_ib_qp *qp)
 {
 	/* Sanity check RQ size before proceeding */
 	if (cap->max_recv_wr  > dev->dev->caps.max_wqes  ||
 	    cap->max_recv_sge > dev->dev->caps.max_rq_sg)
 		return -EINVAL;
 
-	qp->rq.max = cap->max_recv_wr ? roundup_pow_of_two(cap->max_recv_wr) : 0;
+	if (has_srq) {
+		/* QPs attached to an SRQ should have no RQ */
+		if (cap->max_recv_wr)
+			return -EINVAL;
+
+		qp->rq.max = qp->rq.max_gs = 0;
+	} else {
+		/* HW requires >= 1 RQ entry with >= 1 gather entry */
+		if (is_user && (!cap->max_recv_wr || !cap->max_recv_sge))
+			return -EINVAL;
 
-	qp->rq.wqe_shift = ilog2(roundup_pow_of_two(cap->max_recv_sge *
-						    sizeof (struct mlx4_wqe_data_seg)));
-	qp->rq.max_gs    = (1 << qp->rq.wqe_shift) / sizeof (struct mlx4_wqe_data_seg);
+		qp->rq.max	 = roundup_pow_of_two(max(1, cap->max_recv_wr));
+		qp->rq.max_gs	 = roundup_pow_of_two(max(1, cap->max_recv_sge));
+		qp->rq.wqe_shift = ilog2(qp->rq.max_gs * sizeof (struct mlx4_wqe_data_seg));
+	}
 
 	cap->max_recv_wr  = qp->rq.max;
 	cap->max_recv_sge = qp->rq.max_gs;
@@ -285,7 +295,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
 	qp->sq.head	    = 0;
 	qp->sq.tail	    = 0;
 
-	err = set_rq_size(dev, &init_attr->cap, qp);
+	err = set_rq_size(dev, &init_attr->cap, !!pd->uobject, !!init_attr->srq, qp);
 	if (err)
 		goto err;
 
@@ -762,11 +772,6 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 		optpar |= MLX4_QP_OPTPAR_PKEY_INDEX;
 	}
 
-	if (attr_mask & IB_QP_RNR_RETRY) {
-		context->params1 |= cpu_to_be32(attr->rnr_retry << 13);
-		optpar |= MLX4_QP_OPTPAR_RNR_RETRY;
-	}
-
 	if (attr_mask & IB_QP_AV) {
 		if (mlx4_set_path(dev, &attr->ah_attr, &context->pri_path,
 				  attr_mask & IB_QP_PORT ? attr->port_num : qp->port)) {
@@ -802,6 +807,12 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 
 	context->pd	    = cpu_to_be32(to_mpd(ibqp->pd)->pdn);
 	context->params1    = cpu_to_be32(MLX4_IB_ACK_REQ_FREQ << 28);
+
+	if (attr_mask & IB_QP_RNR_RETRY) {
+		context->params1 |= cpu_to_be32(attr->rnr_retry << 13);
+		optpar |= MLX4_QP_OPTPAR_RNR_RETRY;
+	}
+
 	if (attr_mask & IB_QP_RETRY_CNT) {
 		context->params1 |= cpu_to_be32(attr->retry_cnt << 16);
 		optpar |= MLX4_QP_OPTPAR_RETRY_COUNT;
diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c
index 3810252..f40558d 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -772,7 +772,7 @@ int mthca_QUERY_FW(struct mthca_dev *dev, u8 *status)
 
 	MTHCA_GET(dev->fw_ver,   outbox, QUERY_FW_VER_OFFSET);
 	/*
-	 * FW subminor version is at more signifant bits than minor
+	 * FW subminor version is at more significant bits than minor
 	 * version, so swap here.
 	 */
 	dev->fw_ver = (dev->fw_ver & 0xffff00000000ull) |
diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c
index 437d78a..39253d0 100644
--- a/drivers/net/mlx4/cq.c
+++ b/drivers/net/mlx4/cq.c
@@ -61,7 +61,7 @@ struct mlx4_cq_context {
 	__be32			solicit_producer_index;
 	__be32			consumer_index;
 	__be32			producer_index;
-	u8			reserved6[2];
+	u32			reserved6[2];
 	__be64			db_rec_addr;
 };
 
diff --git a/drivers/net/mlx4/eq.c b/drivers/net/mlx4/eq.c
index 0f11adb..27a82ce 100644
--- a/drivers/net/mlx4/eq.c
+++ b/drivers/net/mlx4/eq.c
@@ -490,9 +490,11 @@ static void mlx4_free_irqs(struct mlx4_dev *dev)
 
 	if (eq_table->have_irq)
 		free_irq(dev->pdev->irq, dev);
-	for (i = 0; i < MLX4_NUM_EQ; ++i)
+	for (i = 0; i < MLX4_EQ_CATAS; ++i)
 		if (eq_table->eq[i].have_irq)
 			free_irq(eq_table->eq[i].irq, eq_table->eq + i);
+	if (eq_table->eq[MLX4_EQ_CATAS].have_irq)
+		free_irq(eq_table->eq[MLX4_EQ_CATAS].irq, dev);
 }
 
 static int __devinit mlx4_map_clr_int(struct mlx4_dev *dev)
diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c
index cfa5cc0..e7ca118 100644
--- a/drivers/net/mlx4/fw.c
+++ b/drivers/net/mlx4/fw.c
@@ -37,6 +37,10 @@
 #include "fw.h"
 #include "icm.h"
 
+enum {
+	MLX4_COMMAND_INTERFACE_REV	= 1
+};
+
 extern void __buggy_use_of_MLX4_GET(void);
 extern void __buggy_use_of_MLX4_PUT(void);
 
@@ -452,10 +456,12 @@ int mlx4_QUERY_FW(struct mlx4_dev *dev)
 	u32 *outbox;
 	int err = 0;
 	u64 fw_ver;
+	u16 cmd_if_rev;
 	u8 lg;
 
 #define QUERY_FW_OUT_SIZE             0x100
 #define QUERY_FW_VER_OFFSET            0x00
+#define QUERY_FW_CMD_IF_REV_OFFSET     0x0a
 #define QUERY_FW_MAX_CMD_OFFSET        0x0f
 #define QUERY_FW_ERR_START_OFFSET      0x30
 #define QUERY_FW_ERR_SIZE_OFFSET       0x38
@@ -477,21 +483,36 @@ int mlx4_QUERY_FW(struct mlx4_dev *dev)
 
 	MLX4_GET(fw_ver, outbox, QUERY_FW_VER_OFFSET);
 	/*
-	 * FW subminor version is at more signifant bits than minor
+	 * FW subminor version is at more significant bits than minor
 	 * version, so swap here.
 	 */
 	dev->caps.fw_ver = (fw_ver & 0xffff00000000ull) |
 		((fw_ver & 0xffff0000ull) >> 16) |
 		((fw_ver & 0x0000ffffull) << 16);
 
+	MLX4_GET(cmd_if_rev, outbox, QUERY_FW_CMD_IF_REV_OFFSET);
+	if (cmd_if_rev != MLX4_COMMAND_INTERFACE_REV) {
+		mlx4_err(dev, "Installed FW has unsupported "
+			 "command interface revision %d.\n",
+			 cmd_if_rev);
+		mlx4_err(dev, "(Installed FW version is %d.%d.%03d)\n",
+			 (int) (dev->caps.fw_ver >> 32),
+			 (int) (dev->caps.fw_ver >> 16) & 0xffff,
+			 (int) dev->caps.fw_ver & 0xffff);
+		mlx4_err(dev, "This driver version supports only revision %d.\n",
+			 MLX4_COMMAND_INTERFACE_REV);
+		err = -ENODEV;
+		goto out;
+	}
+
 	MLX4_GET(lg, outbox, QUERY_FW_MAX_CMD_OFFSET);
 	cmd->max_cmds = 1 << lg;
 
-	mlx4_dbg(dev, "FW version %d.%d.%03d, max commands %d\n",
+	mlx4_dbg(dev, "FW version %d.%d.%03d (cmd intf rev %d), max commands %d\n",
 		 (int) (dev->caps.fw_ver >> 32),
 		 (int) (dev->caps.fw_ver >> 16) & 0xffff,
 		 (int) dev->caps.fw_ver & 0xffff,
-		 cmd->max_cmds);
+		 cmd_if_rev, cmd->max_cmds);
 
 	MLX4_GET(fw->catas_offset, outbox, QUERY_FW_ERR_START_OFFSET);
 	MLX4_GET(fw->catas_size,   outbox, QUERY_FW_ERR_SIZE_OFFSET);
diff --git a/drivers/net/mlx4/intf.c b/drivers/net/mlx4/intf.c
index 65854f9..9ae951b 100644
--- a/drivers/net/mlx4/intf.c
+++ b/drivers/net/mlx4/intf.c
@@ -135,9 +135,6 @@ int mlx4_register_device(struct mlx4_dev *dev)
 	struct mlx4_priv *priv = mlx4_priv(dev);
 	struct mlx4_interface *intf;
 
-	INIT_LIST_HEAD(&priv->ctx_list);
-	spin_lock_init(&priv->ctx_lock);
-
 	mutex_lock(&intf_mutex);
 
 	list_add_tail(&priv->dev_list, &dev_list);
diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index 20b8c0d..d417293 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -787,6 +787,8 @@ static int __devinit mlx4_init_one(struct pci_dev *pdev,
 
 	dev       = &priv->dev;
 	dev->pdev = pdev;
+	INIT_LIST_HEAD(&priv->ctx_list);
+	spin_lock_init(&priv->ctx_lock);
 
 	/*
 	 * Now reset the HCA before we touch the PCI capabilities or
diff --git a/drivers/net/mlx4/mr.c b/drivers/net/mlx4/mr.c
index b33864d..d0808fa 100644
--- a/drivers/net/mlx4/mr.c
+++ b/drivers/net/mlx4/mr.c
@@ -324,15 +324,17 @@ int mlx4_mr_enable(struct mlx4_dev *dev, struct mlx4_mr *mr)
 				       MLX4_MPT_FLAG_MIO	 |
 				       MLX4_MPT_FLAG_REGION	 |
 				       mr->access);
-	if (mr->mtt.order < 0)
-		mpt_entry->flags |= cpu_to_be32(MLX4_MPT_FLAG_PHYSICAL);
 
 	mpt_entry->key	       = cpu_to_be32(key_to_hw_index(mr->key));
 	mpt_entry->pd	       = cpu_to_be32(mr->pd);
 	mpt_entry->start       = cpu_to_be64(mr->iova);
 	mpt_entry->length      = cpu_to_be64(mr->size);
 	mpt_entry->entity_size = cpu_to_be32(mr->mtt.page_shift);
-	mpt_entry->mtt_seg     = cpu_to_be64(mlx4_mtt_addr(dev, &mr->mtt));
+	if (mr->mtt.order < 0) {
+		mpt_entry->flags |= cpu_to_be32(MLX4_MPT_FLAG_PHYSICAL);
+		mpt_entry->mtt_seg = 0;
+	} else
+		mpt_entry->mtt_seg = cpu_to_be64(mlx4_mtt_addr(dev, &mr->mtt));
 
 	err = mlx4_SW2HW_MPT(dev, mailbox,
 			     key_to_hw_index(mr->key) & (dev->caps.num_mpts - 1));
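
The next_port change in cma.c above guards against a negative remainder.
In C99 and later, % truncates toward zero, so a negative random value gives
a negative remainder and the result can land below the start of the range.
A minimal sketch, with hypothetical helper names pick_port_old() and
pick_port_new() and example range values chosen only for illustration:

```c
#include <assert.h>

/* Old form: rnd is a signed int filled with random bits, so it may be
 * negative, and then rnd % span is negative (or zero) as well. */
int pick_port_old(int rnd, int low, int high)
{
    return (rnd % (high - low)) + low;
}

/* New form: reinterpret the random bits as unsigned before taking the
 * remainder, so the offset is always in [0, high - low). */
int pick_port_new(int rnd, int low, int high)
{
    return (int) ((unsigned int) rnd % (unsigned int) (high - low)) + low;
}
```

For rnd = -7 and a range of 32768 to 61000, the old form returns 32761,
which is below the range; the new form always stays inside it.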


From halr at voltaire.com  Mon Jun 11 15:59:53 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 11 Jun 2007 18:59:53 -0400
Subject: [ofa-general] Re: [PATCH] osm: up/dn ranking - making code more
	intuitive
In-Reply-To: <466D58F2.1020402@dev.mellanox.co.il>
References: <46503064.7010107@dev.mellanox.co.il>
	<20070520161034.GY19271@sashak.voltaire.com>
	<4651557E.2080400@dev.mellanox.co.il>
	<20070524225428.GK837@sashak.voltaire.com>
	<466D58F2.1020402@dev.mellanox.co.il>
Message-ID: <1181602792.5681.1081.camel@hal.voltaire.com>

Hi Yevgeny,

On Mon, 2007-06-11 at 10:15, Yevgeny Kliteynik wrote:
> Hi Hal.
> 
> Following up our discussion with Sasha regarding the ranking 
> optimization in up/dn routing:
> 
> >> I do think that to make the code more "intuitive" we might  
> >> want to remove the __updn_update_rank() and do something like this:
> >>
> >>    if (remote_u->rank > u->rank + 1)
> >>    {
> >>        remote_u->rank = u->rank + 1;
> >>        max_rank = remote_u->rank; 
> >>        cl_qlist_insert_tail(&list, &remote_u->list);
> >>    }
>  
> Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

Thanks. Applied.

-- Hal
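
The rank update quoted above is the relaxation step of a breadth-first
traversal. A self-contained sketch on a small adjacency matrix, assuming a
single root; rank_from_root() and MAX_NODES are hypothetical names, and the
fixed-size array queue stands in for the cl_qlist used in opensm:

```c
#include <assert.h>
#include <limits.h>

#define MAX_NODES 8   /* hypothetical upper bound for this sketch */

/* Assign each node its hop distance from root and return the largest
 * rank assigned. adj[u][v] != 0 means u and v are linked. */
unsigned rank_from_root(int n, unsigned char adj[MAX_NODES][MAX_NODES],
                        int root, unsigned rank[MAX_NODES])
{
    int queue[MAX_NODES];
    int head = 0, tail = 0;
    unsigned max_rank = 0;
    int i, v;

    assert(n > 0 && n <= MAX_NODES && root >= 0 && root < n);

    for (i = 0; i < n; i++)
        rank[i] = UINT_MAX;          /* "not ranked yet" */
    rank[root] = 0;
    queue[tail++] = root;

    while (head < tail) {
        int u = queue[head++];

        for (v = 0; v < n; v++) {
            if (!adj[u][v])
                continue;
            /* Same test as in the mail: lower the neighbour's rank
             * only if going through u gives a smaller one. */
            if (rank[v] > rank[u] + 1) {
                rank[v] = rank[u] + 1;
                max_rank = rank[v];
                queue[tail++] = v;   /* each node is queued at most once */
            }
        }
    }
    return max_rank;
}
```

Because nodes leave the queue in non-decreasing rank order, the last rank
assigned is also the maximum, which is why a plain assignment to max_rank is
enough in the single-root case.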


From tziporet at dev.mellanox.co.il  Mon Jun 11 23:47:04 2007
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Tue, 12 Jun 2007 09:47:04 +0300
Subject: [ofa-general] [GIT PULL] OFED 1.2 uDAPL release notes
In-Reply-To: <466718AB.5050507@ichips.intel.com>
References: <466718AB.5050507@ichips.intel.com>
Message-ID: <466E4168.2030206@mellanox.co.il>

Arlin Davis wrote:
> Vlad,  please pull the latest OFED 1.2 release notes from uDAPL 
> project (ofed_1_2 branch)
>
>    dapl/doc/uDAPL_release_notes.txt
>
> Signed-off by: Arlin Davis ardavis at ichips.intel.com
done
Tziporet


From erezz at voltaire.com  Tue Jun 12 00:27:20 2007
From: erezz at voltaire.com (Erez Zilber)
Date: Tue, 12 Jun 2007 10:27:20 +0300
Subject: [ofa-general] OFED 1.x (Gen 2) based SRP target code released!
In-Reply-To: <466D9EBD.3090809@mellanox.com>
References: <9FA59C95FFCBB34EA5E42C1A8573784F6F91AB@mtiexch01.mti.com>
	<465AD2D1.2070100@voltaire.com> <466D9EBD.3090809@mellanox.com>
Message-ID: <466E4AD8.6090804@voltaire.com>


>>>     
>> I'm trying to build srpt according to the instructions, but it does
>> not get built at all. Here's what I did:
>>
>> tar xzf OFED-1.2-rc3.tgz
>> cd OFED-1.2-rc3/SRPMS
>> rpm2cpio ofa_kernel-1.2-rc3.src.rpm |cpio -i
>> tar xzf ofa_kernel-1.2.tgz
>> cd ofa_kernel-1.2
>> patch -p1 < ~/srpt_inc/add_srpt_01.patch
>> patch -p1 < ~/srpt_inc/add_srpt_03.patch
>>   
>
> You forget to
> patch -p1 < ~/srpt_inc/add_srpt_04.patch
>
> -vu
You may want to add it to the README file (it is not mentioned there).
Is it documented anywhere in the OpenFabrics wiki?

Erez


From tziporet at mellanox.co.il  Tue Jun 12 01:12:58 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Tue, 12 Jun 2007 11:12:58 +0300
Subject: [ofa-general] RE: [ewg] New OMPI / MPI_READ release notes patch
In-Reply-To: <F4B4CFF9-3A0C-4C75-90DD-0566F174599D@cisco.com>
References: <F4B4CFF9-3A0C-4C75-90DD-0566F174599D@cisco.com>
Message-ID: <6C2C79E72C305246B504CBA17B5500C9015635E9@mtlexch01.mtl.com>

Done
Tziporet 

-----Original Message-----
From: ewg-bounces at lists.openfabrics.org
[mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of Jeff Squyres
Sent: Monday, June 11, 2007 7:53 PM
To: OpenFabrics General
Cc: OpenFabrics EWG
Subject: [ewg] New OMPI / MPI_READ release notes patch

Tziporet --

Here's a new patch for the OMPI release notes based on your current  
git.  It includes updated information for Open MPI and text about mpi- 
selector.

Note that there are a few areas in MPI_README that I need OSU and  
Mellanox to proofread.  It would also be nice if someone else could  
eyeball the mpi-selector text and ensure it makes sense to a naive  
reader.

-- 
Jeff Squyres
Cisco Systems


From mst at dev.mellanox.co.il  Tue Jun 12 01:41:08 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 12 Jun 2007 11:41:08 +0300
Subject: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
Message-ID: <20070612084108.GK6470@mellanox.co.il>

To whom it may concern,
I have created an ofed git tree updated with kernel bits from 2.6.22-rc4,
and put that up at git://git.openfabrics.org/~mst/ofed_kernel.git

It may be useful to anyone interested in testing 2.6.22-rc4 technology
(such as mlx4) on older kernels, testing SDP with 2.6.22-rc4 bits, etc.
This tree also might (or might not) become a basis for kernel bits
for future ofed kernel releases.

This tree was test-built with the ofa cross-build script and builds on as wide a
range of kernels as OFED 1.2 did. No testing has been done yet.

Erez, and other iser maintainers, I had a problem with RHEL4 iscsi backports
(scsi_flush_work isn't exported). I decided that since it isn't
called on older kernels it's reasonably safe to just comment it out,
but I would be interested to hear your opinion.
See it in this sub-directory:
kernel_patches/backport/2.6.9_U2/libiscsi_no_flush_to_2_6_9.patch

I went over patches in kernel_patches/fixes/ and tried to remove
only those that were already applied, and update those that weren't.
But I'd like to ask all relevant parties to double-check nothing
that should be there is missing (and, hint hint, maybe think about
pushing the patches upstream).

In particular, there were a ton of ipath patches that it seems were
for the most part applied.
Qlogic maintainers, please help double check that I did not miss something
of value.


-- 
MST


From vlad at lists.openfabrics.org  Tue Jun 12 02:41:41 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Tue, 12 Jun 2007 02:41:41 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070612-0200 daily build status
Message-ID: <20070612094141.E899DE60882@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:   --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.15
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.15
Passed on x86_64 with linux-2.6.18
Passed on powerpc with linux-2.6.17
Passed on ia64 with linux-2.6.13
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.14
Passed on ppc64 with linux-2.6.12
Passed on ia64 with linux-2.6.17
Passed on powerpc with linux-2.6.13
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.20
Passed on powerpc with linux-2.6.16
Passed on ia64 with linux-2.6.19
Passed on x86_64 with linux-2.6.13
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.14
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.15
Passed on ppc64 with linux-2.6.15
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on ppc64 with linux-2.6.14
Passed on ppc64 with linux-2.6.13
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.16
Passed on ia64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:


From jsquyres at cisco.com  Tue Jun 12 04:57:03 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Tue, 12 Jun 2007 07:57:03 -0400
Subject: [ofa-general] Re: [ewg] New OMPI / MPI_READ release notes patch
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9015635E9@mtlexch01.mtl.com>
References: <F4B4CFF9-3A0C-4C75-90DD-0566F174599D@cisco.com>
	<6C2C79E72C305246B504CBA17B5500C9015635E9@mtlexch01.mtl.com>
Message-ID: <E4C1D7DA-761A-49FB-BFCF-C43044A35088@cisco.com>

Note that git still shows the following in the ofed_1_2 branch:

Example1: Running the OSU bandwidth:

!!! SOMEONE PLEASE CHECK THESE DIRECTORIES AND EXECUTABLE NAMES
     > cd /usr/mpi/gcc/openmpi-1.2.2-1/tests/osu_benchmarks-2.2
     > mpirun -np <N> -hostfile <HOSTFILE> osu_bw

Example2: Running the Intel MPI Benchmark benchmarks:

!!! SOMEONE PLEASE CHECK THESE DIRECTORIES AND EXECUTABLE NAMES
     > cd /usr/mpi/gcc/openmpi-1.2.2-1/tests/IMB-2.3
     > mpirun -np <N> -hostfile <HOSTFILE> IMB-MPI1

Example3: Running the Presta benchmarks:

!!! SOMEONE PLEASE CHECK THESE DIRECTORIES AND EXECUTABLE NAMES
     > cd /usr/mpi/gcc/openmpi-1.2.2-1/tests/presta-1.4.0
     > mpirun -np <N> -hostfile <HOSTFILE> com -o 100


On Jun 12, 2007, at 4:12 AM, Tziporet Koren wrote:

> Done
> Tziporet
>
> -----Original Message-----
> From: ewg-bounces at lists.openfabrics.org
> [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of Jeff Squyres
> Sent: Monday, June 11, 2007 7:53 PM
> To: OpenFabrics General
> Cc: OpenFabrics EWG
> Subject: [ewg] New OMPI / MPI_READ release notes patch
>
> Tziporet --
>
> Here's a new patch for the OMPI release notes based on your current
> git.  It includes updated information for Open MPI and text about mpi-
> selector.
>
> Note that there are a few areas in MPI_README that I need OSU and
> Mellanox to proofread.  It would also be nice if someone else could
> eyeball the mpi-selector text and ensure it makes sense to a naive
> reader.
>
> -- 
> Jeff Squyres
> Cisco Systems


-- 
Jeff Squyres
Cisco Systems


From arthur.jones at qlogic.com  Tue Jun 12 09:27:03 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 12 Jun 2007 09:27:03 -0700
Subject: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
In-Reply-To: <20070612084108.GK6470@mellanox.co.il>
References: <20070612084108.GK6470@mellanox.co.il>
Message-ID: <20070612162703.GA26197@bauxite.pathscale.com>

hi michael, ...

On Tue, Jun 12, 2007 at 11:41:08AM +0300, Michael S. Tsirkin wrote:
> [...]
> In particular, there were a ton of ipath patches that it seems were
> for the most part applied.
> Qlogic maintainers, please help double check that I did not miss something
> of value.

we've amassed a boatload of patches that are
due to go to roland soon.  it's prob best if
we have a look at your repo once these patches
are integrated...

arthur


From swise at opengridcomputing.com  Tue Jun 12 09:48:12 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 12 Jun 2007 11:48:12 -0500
Subject: [ofa-general] IB and iWarp HCA in same node
In-Reply-To: <46674722.6090302@ucla.edu>
References: <46674722.6090302@ucla.edu>
Message-ID: <466ECE4C.1080106@opengridcomputing.com>

Scott A. Friedman wrote:
> I have a working IB cluster where I have added a Chelsio iWarp card to 
> one node. Another node is connected to that with only an identical iWarp 
> card. I cannot seem to get the iWarp cards to come up. They work through 
> regular ethernet just fine and the IB stuff still works as well. But 
> when I modprobe iw_cxgb3 and iw_cm, utilities like ibstat show the 
> following, which explains why nothing is working.
> 
> Question is, why? Am I missing or forgetting something? I just want to 
> test the two iWarp cards back to back. Not trying to get some kind of 
> auto bridging or routing working.
> 
> # ibstat
> iWARP RNIC 'cxgb3_0'
>         iWARP RNIC type: cxgb3
>         Number of ports: 1
>         Firmware version: T 4.0.0
>         Hardware version: 1
>         Node GUID: 0x0007430506ea0000
>         System image GUID: 0x0007430506ea0000
>         Port 1:
>                 State: Active
>                 Physical state: No state change
>                 Rate: 20
>                 Base lid: 0
>                 LMC: 0
>                 SM lid: 0
>                 Capability mask: 0x009f0000
>                 Port GUID: 0x0000000000000000

This all looks normal.  What application are you trying to run over rdma 
on the chelsio interface?  rping?


From swise at opengridcomputing.com  Tue Jun 12 09:51:03 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 12 Jun 2007 11:51:03 -0500
Subject: [ofa-general] problem with mvapich2 over iwarp
In-Reply-To: <20070607180437.GD16228@osc.edu>
References: <466042AE.4000006@opengridcomputing.com>
	<20070607180437.GD16228@osc.edu>
Message-ID: <466ECEF7.3080504@opengridcomputing.com>

Pete Wyckoff wrote:
> swise at opengridcomputing.com wrote on Fri, 01 Jun 2007 11:00 -0500:
>> I'm helping a customer who is trying to run mvapich2 over chelsio's 
>> rnic.  They're running a simple program that does an mpi init, 1000 
>> barriers, then a finalize.  They're using ofed-1.2-rc3, mpiexec-0.82, 
>> and mvapich2-0.9.8-p2 (not the mvapich2 from the ofed kit).  Also they 
>> aren't using mpd to start up stuff.  They're using pmi I guess (I'm not 
>> sure what pmi is, but the mpiexec has -comm=pmi).  BTW: I can run the 
>> same program fine on my 8 node cluster using mpd and the ofa mvapich2 code.
> 
> Hey Steve.  The "customer" contacted me about helping with the
> mpiexec aspects of things, assuming we're talking about the same
> people.  It's just an alternative to the MPD startup program, but
> uses the same PMI mechanisms under the hood as does MPD.  And it's a
> much better way to launch parallel jobs, but I'm biased since I
> wrote it.  :)
> 
> The hang in rdma_destroy_id() that you describe, does it happen for
> both mpd and mpiexec startup?
> 
> I doubt that the mpiexec issue would matter, but frequently tell
> people to try it using straight mpirun just to make sure.  The PMI
> protocol under the hood is just a way for processes to exchange
> data---mpiexec doesn't know anything about MPI itself or iwarp, it
> just moves the information around.  So we generally don't see any
> problems with starting up mpich2 programs on all sorts of weird
> hardware.
> 
> Offering to help if you have any more information.  I've asked for
> them to send me debug logs of the mpd and mpiexec startups, but
> don't have an account on their machine yet.
> 
> 		-- Pete

Thanks Pete.

I've been out of town until today.  I think they have it working.  I 
believe the bug they saw was in an older version of mvapich2 that 
Sundeep fixed a while back.  After rebuilding and re-installing, they 
don't seem to hit it anymore.  The symptoms definitely seemed like the 
previous bug he fixed.

Anyway, thanks for helping and explaining mpiexec.  I'll holler if 
anything else comes up.

Steve.


From pradeeps at linux.vnet.ibm.com  Tue Jun 12 10:11:52 2007
From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana)
Date: Tue, 12 Jun 2007 10:11:52 -0700
Subject: [ofa-general] Re: IPOIB CM (NOSRQ) extension
In-Reply-To: <20070611181849.GF6470@mellanox.co.il>
References: <46687642.8040208@linux.vnet.ibm.com>
	<20070610044146.GA4959@mellanox.co.il>
	<466D8FAF.5090800@linux.vnet.ibm.com>
	<20070611181804.GE6470@mellanox.co.il>
	<20070611181849.GF6470@mellanox.co.il>
Message-ID: <466ED3D8.7000607@linux.vnet.ibm.com>

Michael S. Tsirkin wrote:
>> Quoting Michael S. Tsirkin <mst at dev.mellanox.co.il>:
>> Subject: Re: IPOIB CM (NOSRQ) extension
>>
>>> Quoting Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com>:
>>> Subject: Re: IPOIB CM (NOSRQ) extension
>>>
>>> Michael S. Tsirkin wrote:
>>>>> Quoting Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com>:
>>>>> Subject: IPOIB CM (NOSRQ) extension
>>>>>
>>>>> This patch handles the corner case of running out of RC QPs. In that
>>>>> case it switches to UD mode. This patch can be used both by NOSRQ and
>>>>> SRQ code.
>>>>>
>>>>> Signed-off-by: Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com>
>>>> You don't provide any way to retry going back to connected mode,
>>>> after a failure, which is really intermittent by nature. That's pretty bad.
>>> This node switched to datagram mode, because the passive side was
>>> under a resource crunch (no RC QPs). And, the user is indeed alerted
>>> about this condition. So, yes we do not attempt to go back to connected
>>> mode.
>> Need to retry switching to datagram mode after a while.
> 
> Sorry, that should have been "switching to connected mode".

So, you are suggesting that we ping-pong between datagram mode and
connected mode. In the first place I was opposed to just switching to
datagram mode when there are no RC QPs. This suggestion goes even
further.

We seem to have polar opposite viewpoints on this issue. Rather
than simply persisting with our viewpoints, we need to back them up with
more concrete reasoning.

I disagree with this approach for the following reasons:

1) This switch to datagram mode happens when we are in a resource crunch
kind of situation. The resource crunch should be flagged and corrective
action needs to be taken. Switching to datagram mode simply prolongs the
agony.

2) Ping-Ponging between connected mode and datagram mode makes the
situation even worse. In HPC environments cluster nodes simply do not
appear and disappear. They continue to stay on (in the cluster). So,
trying to switch to connected mode does not achieve any purpose.

Can you tell me why "switching to connected mode" is a must?

Pradeep


From swise at opengridcomputing.com  Tue Jun 12 10:21:45 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 12 Jun 2007 12:21:45 -0500
Subject: [ofa-general] problem with mvapich2 over iwarp
In-Reply-To: <466ECEF7.3080504@opengridcomputing.com>
References: <466042AE.4000006@opengridcomputing.com>	<20070607180437.GD16228@osc.edu>
	<466ECEF7.3080504@opengridcomputing.com>
Message-ID: <466ED629.20208@opengridcomputing.com>

Steve Wise wrote:
> Pete Wyckoff wrote:
>> swise at opengridcomputing.com wrote on Fri, 01 Jun 2007 11:00 -0500:
>>> I'm helping a customer who is trying to run mvapich2 over chelsio's 
>>> rnic.  They're running a simple program that does an mpi init, 1000 
>>> barriers, then a finalize.  They're using ofed-1.2-rc3, mpiexec-0.82, 
>>> and mvapich2-0.9.8-p2 (not the mvapich2 from the ofed kit).  Also 
>>> they aren't using mpd to start up stuff.  They're using pmi I guess 
>>> (I'm not sure what pmi is, but the mpiexec has -comm=pmi).  BTW: I can 
>>> run the same program fine on my 8 node cluster using mpd and the ofa 
>>> mvapich2 code.
>>
>> Hey Steve.  The "customer" contacted me about helping with the
>> mpiexec aspects of things, assuming we're talking about the same
>> people.  It's just an alternative to the MPD startup program, but
>> uses the same PMI mechanisms under the hood as does MPD.  And it's a
>> much better way to launch parallel jobs, but I'm biased since I
>> wrote it.  :)
>>
>> The hang in rdma_destroy_id() that you describe, does it happen for
>> both mpd and mpiexec startup?
>>
>> I doubt that the mpiexec issue would matter, but frequently tell
>> people to try it using straight mpirun just to make sure.  The PMI
>> protocol under the hood is just a way for processes to exchange
>> data---mpiexec doesn't know anything about MPI itself or iwarp, it
>> just moves the information around.  So we generally don't see any
>> problems with starting up mpich2 programs on all sorts of weird
>> hardware.
>>
>> Offering to help if you have any more information.  I've asked for
>> them to send me debug logs of the mpd and mpiexec startups, but
>> don't have an account on their machine yet.
>>
>>         -- Pete
> 
> Thanks Pete.
> 
> I've been out of town until today.  I think they have it working.  I 
> believe the bug they saw was in an older version of mvapich2 that 
> Sundeep fixed a while back.  After rebuilding and re-installing, they 
> don't seem to hit it anymore.  The symptoms definitely seemed like the 
> previous bug he fixed.
> 
> Anyway, thanks for helping and explaining mpiexec.  I'll holler if 
> anything else comes up.
> 
> Steve.

Ignore this last reply.  I hadn't caught up on my email for that issue 
and I think maybe there are still problems with all this.

Steve.


From sean.hefty at intel.com  Tue Jun 12 11:03:24 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 12 Jun 2007 11:03:24 -0700
Subject: [ofa-general] crash in ipoib
Message-ID: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com>

Copying ofa general list.

We've seen a crash similar to this now a total of 4 times. 

These are x64, 2.6.9-42.EL.  The crashes only seem to occur on a specific set of
systems in our cluster.

The latest crash has a stack trace similar to the one listed below.

badness in i8042_panic_blink drivers/input/serio/i8042.c : 992
i8042_panic_blink + 485
panic + 445
apic_timer_interrupt + 133
oops_end + 38
oops_end + 65
do_page_fault + 1204
ipoib_cm_send + 433
error_exit
ipoib_ib_completion + 0
ipoib_cm_handle_rx_wc + 239

(the trace goes on and on)

- Sean

>>No known issues with IPoIB. Can you send the command line and all
>>details on the machine you work.
>>Also - do you have the oops printout
>
>Woody will need to provide details on the machine.  Here's what's available
>from the oops printout:  (might not be related to ipoib or cm)
>
>(top portion is cut off)
>badness in i8042_panic_blink drivers/input/serio/i8042.c : 992
>i8042_panic_blink + 485
>panic + 445
>apic_timer_interrupt + 133
>oops_end + 38
>oops_end + 65
>do_page_fault + 1204
>error_exit
>ipoib_ib_completion
>ipoib_cm_handle_rx_wc + 378
>ipoib_ib_completion + 144
>usb_hcd_irq
>mthca_eq_int + 221
>ret_from_intr
>mthca_tavor_interrupt + 95
>handle_IRQ_event
>do_IRQ
>ret_from_intr
>csum_partial + 725
>skb_checksum + 308
>ip_conntrack:tcp_error + 312
>ip_conntrack_in + 163
>try_to_wake_up + 876
>nf_iterate + 82
>ip_rcv_finish
>ip_rcv + 1119
>netif_receive_skb + 791
>process_backlog + 136
>net_rx_action
>do_softirq
>do_IRQ
>ret_from_intr
>spin_unlock_irqrestore
>ib_send_cm_rep
>ib_ipoib_cm_rx_handler
>cm_alloc_msg
>ib_send_cm_rtu
>ipoib_cm_rx_event_handler
>ib_find_cached_pkey
>cm_process_work
>cm_req_handler
>cm_work_handler
>cm_work_handler
>worker_thread
>blah blah blah


From pradeeps at linux.vnet.ibm.com  Tue Jun 12 11:04:17 2007
From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana)
Date: Tue, 12 Jun 2007 11:04:17 -0700
Subject: [ofa-general] IPOB CM (NOSRQ) [PATCH V6] patch
Message-ID: <466EE021.30302@linux.vnet.ibm.com>

Here is the sixth version of the IPOIB_CM_NOSRQ patch. This patch will
benefit adapters that do not support shared receive queues.

Changes from V4:
1. Eliminated some redundant #defines and corrected printk
2. Introduced missing spinlock.

This patch has been tested with linux-2.6.22-rc4 derived from Roland's
for-2.6.23 git tree on 06/11 on ppc64 machines

Signed-off-by: Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com>
---
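
Reviewer note: in the NOSRQ path of the patch below, the 64-bit work-request id carries two values. The receive-ring slot sits in the upper 32 bits and the `rx_index_table` index in the low bits under `NOSRQ_INDEX_MASK`. A new connection is rejected once the receive buffers already committed reach the configured limit. The following is a minimal user-space sketch of that arithmetic. The helper names `nosrq_pack_id`, `nosrq_id_index`, `nosrq_id_slot` and `nosrq_over_budget` are hypothetical and do not appear in the patch; only the constants are taken from it.

```c
#include <assert.h>
#include <stdint.h>

/* Constants mirrored from the patch. */
#define NOSRQ_INDEX_TABLE_SIZE 128u
#define NOSRQ_INDEX_MASK       (NOSRQ_INDEX_TABLE_SIZE - 1u)
#define IPOIB_CM_OP_RECV       (UINT64_C(1) << 30)
#define CM_PACKET_SIZE         (UINT64_C(1) << 16) /* 64K per receive buffer */

/* Hypothetical helper: build an id from a ring slot and a table index. */
static inline uint64_t nosrq_pack_id(uint32_t slot, uint32_t index)
{
	assert(index < NOSRQ_INDEX_TABLE_SIZE);
	return ((uint64_t)slot << 32) | index;
}

/* Hypothetical helper: recover the rx_index_table index (low bits). */
static inline uint32_t nosrq_id_index(uint64_t id)
{
	return (uint32_t)(id & NOSRQ_INDEX_MASK);
}

/* Hypothetical helper: recover the receive-ring slot (upper 32 bits). */
static inline uint32_t nosrq_id_slot(uint64_t id)
{
	return (uint32_t)(id >> 32);
}

/*
 * Hypothetical helper: returns 1 when the receive memory committed to
 * active RC QPs has reached the limit given in megabytes, else 0.
 */
static inline int nosrq_over_budget(uint64_t recvq_size, uint64_t active_qps,
				    uint64_t max_recv_buf_mb)
{
	uint64_t used = recvq_size * active_qps * CM_PACKET_SIZE;

	return used >= (max_recv_buf_mb << 20);
}
```

Bit 30 (`IPOIB_CM_OP_RECV`) lies above the 7-bit index field and below the slot field, so OR-ing the flag into the id does not disturb either decoded value. With a receive queue of 128 entries, 128 QPs of 64K buffers add up to exactly 1024 MB, so in this sketch the two default limits are reached at the same point.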

--- a/linux-2.6.22-rc4/drivers/infiniband/ulp/ipoib/ipoib.h	2007-05-30 14:56:25.000000000 -0400
+++ b/linux-2.6.22-rc4/drivers/infiniband/ulp/ipoib/ipoib.h	2007-06-11 19:24:24.000000000 -0400
@@ -95,11 +95,16 @@ enum {
  	IPOIB_MCAST_FLAG_ATTACHED = 3,
  };

+#define CM_PACKET_SIZE (1ul << 16)
  #define	IPOIB_OP_RECV   (1ul << 31)
  #ifdef CONFIG_INFINIBAND_IPOIB_CM
-#define	IPOIB_CM_OP_SRQ (1ul << 30)
+#define	IPOIB_CM_OP_RECV (1ul << 30)
+
+#define NOSRQ_INDEX_TABLE_SIZE 128
+#define NOSRQ_INDEX_MASK      (NOSRQ_INDEX_TABLE_SIZE -1)
+
  #else
-#define	IPOIB_CM_OP_SRQ (0)
+#define	IPOIB_CM_OP_RECV (0)
  #endif

  /* structs */
@@ -166,11 +171,14 @@ enum ipoib_cm_state {
  };

  struct ipoib_cm_rx {
-	struct ib_cm_id     *id;
-	struct ib_qp        *qp;
-	struct list_head     list;
-	struct net_device   *dev;
-	unsigned long        jiffies;
+	struct ib_cm_id     	*id;
+	struct ib_qp        	*qp;
+	struct ipoib_cm_rx_buf  *rx_ring; /* Used by NOSRQ only */
+	struct list_head     	 list;
+	struct net_device   	*dev;
+	unsigned long        	 jiffies;
+	u32                      index; /* wr_ids are distinguished by index
+					 * to identify the QP -NOSRQ only */
  	enum ipoib_cm_state  state;
  };

@@ -215,6 +223,8 @@ struct ipoib_cm_dev_priv {
  	struct ib_wc            ibwc[IPOIB_NUM_WC];
  	struct ib_sge           rx_sge[IPOIB_CM_RX_SG];
  	struct ib_recv_wr       rx_wr;
+	struct ipoib_cm_rx	**rx_index_table; /* See ipoib_cm_dev_init()
+						   *for usage of this element */
  };

  /*
@@ -564,10 +574,9 @@ static inline void ipoib_cm_skb_too_long
  	dev_kfree_skb_any(skb);
  }

-static inline void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
+void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
  {
  }
-
  #endif

  #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG
--- a/linux-2.6.22-rc4/drivers/infiniband/ulp/ipoib/ipoib_cm.c	2007-05-30 14:56:25.000000000 -0400
+++ b/linux-2.6.22-rc4/drivers/infiniband/ulp/ipoib/ipoib_cm.c	2007-06-11 19:36:32.000000000 -0400
@@ -49,6 +49,20 @@ MODULE_PARM_DESC(cm_data_debug_level,

  #include "ipoib.h"

+int max_rc_qp = NOSRQ_INDEX_TABLE_SIZE;
+int max_recv_buf = 1024; /* Default is 1024 MB */
+
+module_param_named(nosrq_max_rc_qp, max_rc_qp, int, 0644);
+MODULE_PARM_DESC(nosrq_max_rc_qp, "Max number of NOSRQ RC QPs supported");
+
+module_param_named(max_recieve_buffer, max_recv_buf, int, 0644);
+MODULE_PARM_DESC(max_recieve_buffer, "Max Receive Buffer Size in MB");
+
+struct ipoib_cm_nosrq_count {
+	spinlock_t lock;
+	int current_rc_qp; /* Active number of RC QPs for NOSRQ */
+} nosrq_count;
+
  #define IPOIB_CM_IETF_ID 0x1000000000000000ULL

  #define IPOIB_CM_RX_UPDATE_TIME (256 * HZ)
@@ -88,20 +102,20 @@ static void ipoib_cm_dma_unmap_rx(struct
  		ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE);
  }

-static int ipoib_cm_post_receive(struct net_device *dev, int id)
+static int post_receive_srq(struct net_device *dev, u64 id)
  {
  	struct ipoib_dev_priv *priv = netdev_priv(dev);
  	struct ib_recv_wr *bad_wr;
  	int i, ret;

-	priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_SRQ;
+	priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_RECV;

  	for (i = 0; i < IPOIB_CM_RX_SG; ++i)
  		priv->cm.rx_sge[i].addr = priv->cm.srq_ring[id].mapping[i];

  	ret = ib_post_srq_recv(priv->cm.srq, &priv->cm.rx_wr, &bad_wr);
  	if (unlikely(ret)) {
-		ipoib_warn(priv, "post srq failed for buf %d (%d)\n", id, ret);
+		ipoib_warn(priv, "post srq failed for buf %ld (%d)\n", id, ret);
  		ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1,
  				      priv->cm.srq_ring[id].mapping);
  		dev_kfree_skb_any(priv->cm.srq_ring[id].skb);
@@ -111,12 +125,47 @@ static int ipoib_cm_post_receive(struct
  	return ret;
  }

-static struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev, int id, int frags,
+static int post_receive_nosrq(struct net_device *dev, u64 id)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ib_recv_wr *bad_wr;
+	int i, ret;
+	u32 index;
+	u32 wr_id;
+	struct ipoib_cm_rx *rx_ptr;
+
+	index = id  & NOSRQ_INDEX_MASK ;
+	wr_id = id >> 32;
+
+	rx_ptr = priv->cm.rx_index_table[index];
+
+	priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_RECV;
+
+	for (i = 0; i < IPOIB_CM_RX_SG; ++i)
+		priv->cm.rx_sge[i].addr = rx_ptr->rx_ring[wr_id].mapping[i];
+
+	ret = ib_post_recv(rx_ptr->qp, &priv->cm.rx_wr, &bad_wr);
+	if (unlikely(ret)) {
+		ipoib_warn(priv, "post recv failed for buf %d (%d)\n",
+		           wr_id, ret);
+		ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1,
+		                      rx_ptr->rx_ring[wr_id].mapping);
+		dev_kfree_skb_any(rx_ptr->rx_ring[wr_id].skb);
+		rx_ptr->rx_ring[wr_id].skb = NULL;
+	}
+
+	return ret;
+}
+
+static struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev, u64 id,
+					     int frags,
  					     u64 mapping[IPOIB_CM_RX_SG])
  {
  	struct ipoib_dev_priv *priv = netdev_priv(dev);
  	struct sk_buff *skb;
  	int i;
+	struct ipoib_cm_rx *rx_ptr;
+	u32 index, wr_id;

  	skb = dev_alloc_skb(IPOIB_CM_HEAD_SIZE + 12);
  	if (unlikely(!skb))
@@ -148,7 +197,14 @@ static struct sk_buff *ipoib_cm_alloc_rx
  			goto partial_error;
  	}

-	priv->cm.srq_ring[id].skb = skb;
+	if (priv->cm.srq)
+		priv->cm.srq_ring[id].skb = skb;
+	else {
+		index = id  & NOSRQ_INDEX_MASK ;
+		wr_id = id >> 32;
+		rx_ptr = priv->cm.rx_index_table[index];
+		rx_ptr->rx_ring[wr_id].skb = skb;
+	}
  	return skb;

  partial_error:
@@ -205,16 +261,21 @@ static struct ib_qp *ipoib_cm_create_rx_
  {
  	struct ipoib_dev_priv *priv = netdev_priv(dev);
  	struct ib_qp_init_attr attr = {
-		.event_handler = ipoib_cm_rx_event_handler,
  		.send_cq = priv->cq, /* For drain WR */
  		.recv_cq = priv->cq,
  		.srq = priv->cm.srq,
  		.cap.max_send_wr = 1, /* For drain WR */
+		.cap.max_recv_wr = ipoib_recvq_size + 1,
  		.cap.max_send_sge = 1, /* FIXME: 0 Seems not to work */
  		.sq_sig_type = IB_SIGNAL_ALL_WR,
  		.qp_type = IB_QPT_RC,
  		.qp_context = p,
  	};
+	if (!priv->cm.srq) {
+		attr.cap.max_recv_sge = IPOIB_CM_RX_SG;	
+		attr.event_handler = NULL;
+	} else
+		attr.event_handler = ipoib_cm_rx_event_handler;
  	return ib_create_qp(priv->pd, &attr);
  }

@@ -289,12 +350,120 @@ static int ipoib_cm_send_rep(struct net_
  	rep.flow_control = 0;
  	rep.rnr_retry_count = req->rnr_retry_count;
  	rep.target_ack_delay = 20; /* FIXME */
-	rep.srq = 1;
  	rep.qp_num = qp->qp_num;
  	rep.starting_psn = psn;
+	rep.srq	= !!priv->cm.srq;
  	return ib_send_cm_rep(cm_id, &rep);
  }

+static void init_context_and_add_list(struct ib_cm_id *cm_id,
+				    struct ipoib_cm_rx *p,
+				    struct ipoib_dev_priv *priv)
+{
+	cm_id->context = p;
+	p->jiffies = jiffies;
+	spin_lock_irq(&priv->lock);
+	if (list_empty(&priv->cm.passive_ids))
+		queue_delayed_work(ipoib_workqueue,
+				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
+	list_add(&p->list, &priv->cm.passive_ids);
+	spin_unlock_irq(&priv->lock);
+}
+
+static int allocate_and_post_rbuf_nosrq(struct ib_cm_id *cm_id,
+				        struct ipoib_cm_rx *p, unsigned psn)
+{
+	struct net_device *dev = cm_id->context;
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	int ret;
+	u32 qp_num, index;
+	u64 i, recv_mem_used;
+
+	qp_num = p->qp->qp_num;
+
+	/* In the SRQ case there is a common rx buffer called the srq_ring.
+	 * However, for the NOSRQ we create an rx_ring for every
+	 * struct ipoib_cm_rx.
+	 */
+	p->rx_ring = kzalloc(ipoib_recvq_size * sizeof *p->rx_ring, GFP_KERNEL);
+	if (!p->rx_ring) {
+		printk(KERN_WARNING "Failed to allocate rx_ring for 0x%x\n",
+		       qp_num);
+		return -ENOMEM;
+	}
+
+	init_context_and_add_list(cm_id, p, priv);
+	spin_lock_irq(&priv->lock);
+		
+	for (index = 0; index < max_rc_qp; index++)
+		if (priv->cm.rx_index_table[index] == NULL)
+			break;
+
+	spin_lock(&nosrq_count.lock);
+	recv_mem_used = (u64)ipoib_recvq_size * (u64)nosrq_count.current_rc_qp
+			* CM_PACKET_SIZE; /* packets are 64K */
+	spin_unlock(&nosrq_count.lock);
+	if ((index == max_rc_qp) ||
+	( recv_mem_used >= max_recv_buf * (1ul << 20))) {
+		spin_unlock_irq(&priv->lock);
+		ipoib_warn(priv, "NOSRQ has reached the configurable limit "
+		           "of either %d RC QPs or, max recv buf size of "
+			   "0x%lx MB\n", max_rc_qp, max_recv_buf);
+
+		/* We send a REJ to the remote side indicating that we
+		 * have no more free RC QPs and leave it to the remote side
+		 * to take appropriate action. This should leave the
+		 * current set of QPs unaffected and any subsequent REQs
+		 * will be able to use RC QPs if they are available.
+		 */
+		ib_send_cm_rej(cm_id, IB_CM_REJ_NO_QP, NULL, 0, NULL, 0);
+		ret = -EINVAL;
+		goto err_send_rej;
+	}
+
+	priv->cm.rx_index_table[index] = p;
+	spin_unlock_irq(&priv->lock);
+
+	/* We will subsequently use this stored pointer while freeing
+	 * resources in stale task */
+	p->index = index;
+
+	ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn);
+	if (ret) {
+		ipoib_warn(priv, "ipoib_cm_modify_rx_qp() failed %d\n", ret);
+		ipoib_cm_dev_cleanup(dev);
+		goto err_modify_nosrq;
+	}
+
+	for (i = 0; i < ipoib_recvq_size; ++i) {
+		if (!ipoib_cm_alloc_rx_skb(dev, i << 32 | index,
+					   IPOIB_CM_RX_SG - 1,
+					   p->rx_ring[i].mapping)) {
+			ipoib_warn(priv, "failed to allocate receive "
+			           "buffer %ld\n", i);
+			ipoib_cm_dev_cleanup(dev);
+			ret = -ENOMEM;
+			goto err_alloc_and_post;
+		}
+
+		if (post_receive_nosrq(dev, i << 32 | index)) {
+			ipoib_warn(priv, "post_receive_nosrq "
+			           "failed for  buf %ld\n", i);
+			ipoib_cm_dev_cleanup(dev);
+			ret = -EIO;
+			goto err_alloc_and_post;
+		}
+	}
+
+	return 0;
+
+err_send_rej:
+err_modify_nosrq:
+err_alloc_and_post:
+	kfree(p->rx_ring);
+	return ret;
+}
+
  static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
  {
  	struct net_device *dev = cm_id->context;
@@ -305,8 +474,11 @@ static int ipoib_cm_req_handler(struct i

  	ipoib_dbg(priv, "REQ arrived\n");
  	p = kzalloc(sizeof *p, GFP_KERNEL);
-	if (!p)
+	if (!p) {
+		printk(KERN_WARNING "Failed to allocate RX control block when "
+		       "REQ arrived\n");
  		return -ENOMEM;
+	}
  	p->dev = dev;
  	p->id = cm_id;
  	p->qp = ipoib_cm_create_rx_qp(dev, p);
@@ -316,9 +488,18 @@ static int ipoib_cm_req_handler(struct i
  	}

  	psn = random32() & 0xffffff;
-	ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn);
-	if (ret)
-		goto err_modify;
+	if (!priv->cm.srq) {
+		spin_lock(&nosrq_count.lock);
+		nosrq_count.current_rc_qp++;
+		spin_unlock(&nosrq_count.lock);
+		if (ret = allocate_and_post_rbuf_nosrq(cm_id, p, psn))
+			goto err_post_nosrq;
+	} else {
+		p->rx_ring = NULL;
+		ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn);
+		if (ret)
+			goto err_modify;
+	}

  	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
  	if (ret) {
@@ -326,18 +507,18 @@ static int ipoib_cm_req_handler(struct i
  		goto err_rep;
  	}

-	cm_id->context = p;
-	p->jiffies = jiffies;
-	p->state = IPOIB_CM_RX_LIVE;
-	spin_lock_irq(&priv->lock);
-	if (list_empty(&priv->cm.passive_ids))
-		queue_delayed_work(ipoib_workqueue,
-				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
-	list_add(&p->list, &priv->cm.passive_ids);
-	spin_unlock_irq(&priv->lock);
+	if (priv->cm.srq) {
+		init_context_and_add_list(cm_id, p, priv);
+		p->state = IPOIB_CM_RX_LIVE;
+	}
  	return 0;

  err_rep:
+err_post_nosrq:
+	list_del_init(&p->list);
+	spin_lock(&nosrq_count.lock);
+	nosrq_count.current_rc_qp--;
+	spin_unlock(&nosrq_count.lock);
  err_modify:
  	ib_destroy_qp(p->qp);
  err_qp:
@@ -401,21 +582,51 @@ static void skb_put_frags(struct sk_buff
  	}
  }

-void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
+static void timer_check_srq(struct ipoib_dev_priv *priv, struct ipoib_cm_rx *p)
+{
+	unsigned long flags;
+
+	if (p && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) {
+		spin_lock_irqsave(&priv->lock, flags);
+		p->jiffies = jiffies;
+		/* Move this entry to list head, but do
+		 * not re-add it if it has been removed. */
+		if (p->state == IPOIB_CM_RX_LIVE)
+			list_move(&p->list, &priv->cm.passive_ids);
+		spin_unlock_irqrestore(&priv->lock, flags);
+	}
+}
+
+static void timer_check_nosrq(struct ipoib_dev_priv *priv, struct ipoib_cm_rx *p)
+{
+	unsigned long flags;
+
+	if (p && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) {
+		spin_lock_irqsave(&priv->lock, flags);
+		p->jiffies = jiffies;
+		/* Move this entry to list head, but do
+		 * not re-add it if it has been removed. */
+		if (!list_empty(&p->list))	
+			list_move(&p->list, &priv->cm.passive_ids);
+		spin_unlock_irqrestore(&priv->lock, flags);
+	}
+}
+
+void handle_rx_wc_srq(struct net_device *dev, struct ib_wc *wc)
  {
  	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	unsigned int wr_id = wc->wr_id & ~IPOIB_CM_OP_SRQ;
+	u64 wr_id = wc->wr_id & ~IPOIB_CM_OP_RECV;
  	struct sk_buff *skb, *newskb;
  	struct ipoib_cm_rx *p;
  	unsigned long flags;
  	u64 mapping[IPOIB_CM_RX_SG];
-	int frags;
+	int frags, ret;

  	ipoib_dbg_data(priv, "cm recv completion: id %d, status: %d\n",
  		       wr_id, wc->status);

  	if (unlikely(wr_id >= ipoib_recvq_size)) {
-		if (wr_id == (IPOIB_CM_RX_DRAIN_WRID & ~IPOIB_CM_OP_SRQ)) {
+		if (wr_id == (IPOIB_CM_RX_DRAIN_WRID & ~IPOIB_CM_OP_RECV)) {
  			spin_lock_irqsave(&priv->lock, flags);
  			list_splice_init(&priv->cm.rx_drain_list, &priv->cm.rx_reap_list);
  			ipoib_cm_start_rx_drain(priv);
@@ -434,20 +645,12 @@ void ipoib_cm_handle_rx_wc(struct net_de
  			   "(status=%d, wrid=%d vend_err %x)\n",
  			   wc->status, wr_id, wc->vendor_err);
  		++priv->stats.rx_dropped;
-		goto repost;
+		goto repost_srq;
  	}

  	if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) {
  		p = wc->qp->qp_context;
-		if (p && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) {
-			spin_lock_irqsave(&priv->lock, flags);
-			p->jiffies = jiffies;
-			/* Move this entry to list head, but do not re-add it
-			 * if it has been moved out of list. */
-			if (p->state == IPOIB_CM_RX_LIVE)
-				list_move(&p->list, &priv->cm.passive_ids);
-			spin_unlock_irqrestore(&priv->lock, flags);
-		}
+		timer_check_srq(priv, p);
  	}

  	frags = PAGE_ALIGN(wc->byte_len - min(wc->byte_len,
@@ -459,13 +662,113 @@ void ipoib_cm_handle_rx_wc(struct net_de
  		 * If we can't allocate a new RX buffer, dump
  		 * this packet and reuse the old buffer.
  		 */
-		ipoib_dbg(priv, "failed to allocate receive buffer %d\n", wr_id);
+		ipoib_dbg(priv, "failed to allocate receive buffer %ld\n", wr_id);
+                ++priv->stats.rx_dropped;
+                goto repost_srq;
+        }
+
+	ipoib_cm_dma_unmap_rx(priv, frags,
+	                      priv->cm.srq_ring[wr_id].mapping);
+	memcpy(priv->cm.srq_ring[wr_id].mapping, mapping,
+	       (frags + 1) * sizeof *mapping);
+	ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n",
+		       wc->byte_len, wc->slid);
+
+	skb_put_frags(skb, IPOIB_CM_HEAD_SIZE, wc->byte_len, newskb);
+
+	skb->protocol = ((struct ipoib_header *) skb->data)->proto;
+	skb_reset_mac_header(skb);	
+	skb_pull(skb, IPOIB_ENCAP_LEN);
+
+	dev->last_rx = jiffies;
+	++priv->stats.rx_packets;
+	priv->stats.rx_bytes += skb->len;
+
+	skb->dev = dev;
+	/* XXX get correct PACKET_ type here */
+	skb->pkt_type = PACKET_HOST;
+	netif_rx_ni(skb);
+
+repost_srq:
+	ret = post_receive_srq(dev, wr_id);
+
+	if (unlikely(ret))
+		ipoib_warn(priv, "post_receive_srq failed for buf %ld\n",
+		           wr_id);
+
+}
+
+static void handle_rx_wc_nosrq(struct net_device *dev, struct ib_wc *wc)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct sk_buff *skb, *newskb;
+	u64 mapping[IPOIB_CM_RX_SG], wr_id = wc->wr_id >> 32;
+	u32 index;
+	struct ipoib_cm_rx *p, *rx_ptr;
+	int frags, ret;
+
+
+	ipoib_dbg_data(priv, "cm recv completion: id %d, status: %d\n",
+		       wr_id, wc->status);
+
+	if (unlikely(wr_id >= ipoib_recvq_size)) {
+		ipoib_warn(priv, "cm recv completion event with wrid %d (> %d)\n",
+				   wr_id, ipoib_recvq_size);
+		return;
+	}
+
+	index = (wc->wr_id & ~IPOIB_CM_OP_RECV) & NOSRQ_INDEX_MASK ;
+
+	/* This is the only place where rx_ptr could be a NULL - could
+	 * have just received a packet from a connection that has become
+	 * stale and so is going away. We will simply drop the packet and
+	 * let the hardware (it s IB_QPT_RC) handle the dropped packet.
+	 * In the timer_check() function below, p->jiffies is updated and
+	 * hence the connection will not be stale after that.
+	 */
+	rx_ptr = priv->cm.rx_index_table[index];
+	if (unlikely(!rx_ptr)) {
+		ipoib_warn(priv, "Received packet from a connection "
+		           "that is going away. Hardware will handle it.\n");
+		return;
+	}
+
+	skb = rx_ptr->rx_ring[wr_id].skb;
+
+	if (unlikely(wc->status != IB_WC_SUCCESS)) {
+		ipoib_dbg(priv, "cm recv error "
+			   "(status=%d, wrid=%ld vend_err %x)\n",
+			   wc->status, wr_id, wc->vendor_err);
+		++priv->stats.rx_dropped;
+		goto repost_nosrq;
+	}
+
+	if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) {
+		/* There are no guarantees that wc->qp is not NULL for HCAs
+	 	* that do not support SRQ. */
+		p = rx_ptr;
+		timer_check_nosrq(priv, p);
+	}
+
+	frags = PAGE_ALIGN(wc->byte_len - min(wc->byte_len,
+					      (unsigned)IPOIB_CM_HEAD_SIZE)) / PAGE_SIZE;
+
+	newskb = ipoib_cm_alloc_rx_skb(dev, wr_id << 32 | index, frags,
+				       mapping);
+	if (unlikely(!newskb)) {
+		/*
+		 * If we can't allocate a new RX buffer, dump
+		 * this packet and reuse the old buffer.
+		 */
+		ipoib_dbg(priv, "failed to allocate receive buffer %ld\n", wr_id);
  		++priv->stats.rx_dropped;
-		goto repost;
+		goto repost_nosrq;
  	}

-	ipoib_cm_dma_unmap_rx(priv, frags, priv->cm.srq_ring[wr_id].mapping);
-	memcpy(priv->cm.srq_ring[wr_id].mapping, mapping, (frags + 1) * sizeof *mapping);
+	ipoib_cm_dma_unmap_rx(priv, frags,
+	                      rx_ptr->rx_ring[wr_id].mapping);
+	memcpy(rx_ptr->rx_ring[wr_id].mapping, mapping,
+	       (frags + 1) * sizeof *mapping);

  	ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n",
  		       wc->byte_len, wc->slid);
@@ -485,10 +788,22 @@ void ipoib_cm_handle_rx_wc(struct net_de
  	skb->pkt_type = PACKET_HOST;
  	netif_receive_skb(skb);

-repost:
-	if (unlikely(ipoib_cm_post_receive(dev, wr_id)))
-		ipoib_warn(priv, "ipoib_cm_post_receive failed "
-			   "for buf %d\n", wr_id);
+repost_nosrq:
+	ret = post_receive_nosrq(dev, wr_id << 32 | index);
+
+	if (unlikely(ret))
+		ipoib_warn(priv, "post_receive_nosrq failed for buf %ld\n",
+		           wr_id);
+}
+
+void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+
+	if (priv->cm.srq)
+		handle_rx_wc_srq(dev, wc);
+	else
+		handle_rx_wc_nosrq(dev, wc);
  }

  static inline int post_send(struct ipoib_dev_priv *priv,
@@ -680,6 +995,44 @@ err_cm:
  	return ret;
  }

+static void free_resources_nosrq(struct ipoib_dev_priv *priv, struct ipoib_cm_rx *p)
+{
+	int i;
+
+	for(i = 0; i < ipoib_recvq_size; ++i)
+		if(p->rx_ring[i].skb) {
+			ipoib_cm_dma_unmap_rx(priv,
+				         IPOIB_CM_RX_SG - 1,
+					 p->rx_ring[i].mapping);
+			dev_kfree_skb_any(p->rx_ring[i].skb);
+			p->rx_ring[i].skb = NULL;
+		}
+	kfree(p->rx_ring);
+}
+
+void dev_stop_nosrq(struct ipoib_dev_priv *priv)
+{
+	struct ipoib_cm_rx *p;
+
+	spin_lock_irq(&priv->lock);
+	while (!list_empty(&priv->cm.passive_ids)) {
+		p = list_entry(priv->cm.passive_ids.next, typeof(*p), list);
+		free_resources_nosrq(priv, p);
+		list_del_init(&p->list);
+		spin_unlock_irq(&priv->lock);
+		ib_destroy_cm_id(p->id);
+		ib_destroy_qp(p->qp);
+		spin_lock(&nosrq_count.lock);
+		nosrq_count.current_rc_qp--;
+		spin_unlock(&nosrq_count.lock);
+		kfree(p);
+		spin_lock_irq(&priv->lock);
+	}
+	spin_unlock_irq(&priv->lock);
+
+	cancel_delayed_work(&priv->cm.stale_task);
+}
+
  void ipoib_cm_dev_stop(struct net_device *dev)
  {
  	struct ipoib_dev_priv *priv = netdev_priv(dev);
@@ -694,6 +1047,11 @@ void ipoib_cm_dev_stop(struct net_device
  	ib_destroy_cm_id(priv->cm.id);
  	priv->cm.id = NULL;

+	if (!priv->cm.srq) {
+		dev_stop_nosrq(priv);
+		return;
+	}
+
  	spin_lock_irq(&priv->lock);
  	while (!list_empty(&priv->cm.passive_ids)) {
  		p = list_entry(priv->cm.passive_ids.next, typeof(*p), list);
@@ -739,6 +1097,7 @@ void ipoib_cm_dev_stop(struct net_device
  		kfree(p);
  	}

+
  	cancel_delayed_work(&priv->cm.stale_task);
  }

@@ -817,7 +1176,9 @@ static struct ib_qp *ipoib_cm_create_tx_
  	attr.recv_cq = priv->cq;
  	attr.srq = priv->cm.srq;
  	attr.cap.max_send_wr = ipoib_sendq_size;
+	attr.cap.max_recv_wr = 1;
  	attr.cap.max_send_sge = 1;
+	attr.cap.max_recv_sge = 1;
  	attr.sq_sig_type = IB_SIGNAL_ALL_WR;
  	attr.qp_type = IB_QPT_RC;
  	attr.send_cq = cq;
@@ -857,7 +1218,7 @@ static int ipoib_cm_send_req(struct net_
  	req.retry_count 	      = 0; /* RFC draft warns against retries */
  	req.rnr_retry_count 	      = 0; /* RFC draft warns against retries */
  	req.max_cm_retries 	      = 15;
-	req.srq 	              = 1;
+	req.srq			      = !!priv->cm.srq;
  	return ib_send_cm_req(id, &req);
  }

@@ -1202,6 +1563,11 @@ static void ipoib_cm_rx_reap(struct work
  	list_for_each_entry_safe(p, n, &list, list) {
  		ib_destroy_cm_id(p->id);
  		ib_destroy_qp(p->qp);
+		if (!priv->cm.srq) {	
+			spin_lock(&nosrq_count.lock);
+			nosrq_count.current_rc_qp--;
+			spin_unlock(&nosrq_count.lock);
+		}
  		kfree(p);
  	}
  }
@@ -1220,12 +1586,19 @@ static void ipoib_cm_stale_task(struct w
  		p = list_entry(priv->cm.passive_ids.prev, typeof(*p), list);
  		if (time_before_eq(jiffies, p->jiffies + IPOIB_CM_RX_TIMEOUT))
  			break;
-		list_move(&p->list, &priv->cm.rx_error_list);
-		p->state = IPOIB_CM_RX_ERROR;
-		spin_unlock_irq(&priv->lock);
-		ret = ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE);
-		if (ret)
-			ipoib_warn(priv, "unable to move qp to error state: %d\n", ret);
+		if (!priv->cm.srq) {
+			free_resources_nosrq(priv, p);
+			list_del_init(&p->list);
+			priv->cm.rx_index_table[p->index] = NULL;
+			spin_unlock_irq(&priv->lock);
+		} else {
+			list_move(&p->list, &priv->cm.rx_error_list);
+			p->state = IPOIB_CM_RX_ERROR;
+			spin_unlock_irq(&priv->lock);
+			ret = ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE);
+			if (ret)
+				ipoib_warn(priv, "unable to move qp to error state: %d\n", ret);
+		}
  		spin_lock_irq(&priv->lock);
  	}

@@ -1279,16 +1652,40 @@ int ipoib_cm_add_mode_attr(struct net_de
  	return device_create_file(&dev->dev, &dev_attr_mode);
  }

+static int create_srq(struct net_device *dev, struct ipoib_dev_priv *priv)
+{
+	struct ib_srq_init_attr srq_init_attr;
+	int ret;
+
+	srq_init_attr.attr.max_wr = ipoib_recvq_size;
+	srq_init_attr.attr.max_sge = IPOIB_CM_RX_SG;
+
+	priv->cm.srq = ib_create_srq(priv->pd, &srq_init_attr);
+	if (IS_ERR(priv->cm.srq)) {
+		ret = PTR_ERR(priv->cm.srq);
+		priv->cm.srq = NULL;
+		return ret;
+	}
+
+	priv->cm.srq_ring = kzalloc(ipoib_recvq_size *
+		                    sizeof *priv->cm.srq_ring,
+			            GFP_KERNEL);
+	if (!priv->cm.srq_ring) {
+		printk(KERN_WARNING "%s: failed to allocate CM ring "
+		       "(%d entries)\n",
+	       	       priv->ca->name, ipoib_recvq_size);
+		ipoib_cm_dev_cleanup(dev);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
  int ipoib_cm_dev_init(struct net_device *dev)
  {
  	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	struct ib_srq_init_attr srq_init_attr = {
-		.attr = {
-			.max_wr  = ipoib_recvq_size,
-			.max_sge = IPOIB_CM_RX_SG
-		}
-	};
  	int ret, i;
+	struct ib_device_attr attr;

  	INIT_LIST_HEAD(&priv->cm.passive_ids);
  	INIT_LIST_HEAD(&priv->cm.reap_list);
@@ -1305,20 +1702,33 @@ int ipoib_cm_dev_init(struct net_device

  	skb_queue_head_init(&priv->cm.skb_queue);

-	priv->cm.srq = ib_create_srq(priv->pd, &srq_init_attr);
-	if (IS_ERR(priv->cm.srq)) {
-		ret = PTR_ERR(priv->cm.srq);
-		priv->cm.srq = NULL;
+	if (ret = ib_query_device(priv->ca, &attr))
  		return ret;
-	}

-	priv->cm.srq_ring = kzalloc(ipoib_recvq_size * sizeof *priv->cm.srq_ring,
-				    GFP_KERNEL);
-	if (!priv->cm.srq_ring) {
-		printk(KERN_WARNING "%s: failed to allocate CM ring (%d entries)\n",
-		       priv->ca->name, ipoib_recvq_size);
-		ipoib_cm_dev_cleanup(dev);
-		return -ENOMEM;
+	if (attr.max_srq) {
+		/* This device supports SRQ */
+		if (ret = create_srq(dev, priv))
+			return ret;
+		priv->cm.rx_index_table = NULL;
+	} else {
+		priv->cm.srq = NULL;
+		priv->cm.srq_ring = NULL;
+
+		/* Every new REQ that arrives creates a struct ipoib_cm_rx.
+		 * These structures form a link list starting with the
+		 * passive_ids. For quick and easy access we maintain a table
+		 * of pointers to struct ipoib_cm_rx called the rx_index_table
+		 */
+		priv->cm.rx_index_table = kzalloc(NOSRQ_INDEX_TABLE_SIZE *
+					 sizeof *priv->cm.rx_index_table,
+					 GFP_KERNEL);
+		if (!priv->cm.rx_index_table) {
+			printk(KERN_WARNING "Failed to allocate NOSRQ_INDEX_TABLE\n");
+			return -ENOMEM;
+		}
+
+		spin_lock_init(&nosrq_count.lock);
+		nosrq_count.current_rc_qp = 0;
  	}

  	for (i = 0; i < IPOIB_CM_RX_SG; ++i)
@@ -1331,17 +1741,23 @@ int ipoib_cm_dev_init(struct net_device
  	priv->cm.rx_wr.sg_list = priv->cm.rx_sge;
  	priv->cm.rx_wr.num_sge = IPOIB_CM_RX_SG;

-	for (i = 0; i < ipoib_recvq_size; ++i) {
-		if (!ipoib_cm_alloc_rx_skb(dev, i, IPOIB_CM_RX_SG - 1,
+	/* One can post receive buffers even before the RX QP is created
+	 * only in the SRQ case. Therefore for NOSRQ we skip the rest of init
+	 * and do that in ipoib_cm_req_handler() */
+
+	if (priv->cm.srq) {
+		for (i = 0; i < ipoib_recvq_size; ++i) {
+			if (!ipoib_cm_alloc_rx_skb(dev, i, IPOIB_CM_RX_SG - 1,
  					   priv->cm.srq_ring[i].mapping)) {
-			ipoib_warn(priv, "failed to allocate receive buffer %d\n", i);
-			ipoib_cm_dev_cleanup(dev);
-			return -ENOMEM;
-		}
-		if (ipoib_cm_post_receive(dev, i)) {
-			ipoib_warn(priv, "ipoib_ib_post_receive failed for buf %d\n", i);
-			ipoib_cm_dev_cleanup(dev);
-			return -EIO;
+				ipoib_warn(priv, "failed to allocate receive buffer %d\n", i);
+				ipoib_cm_dev_cleanup(dev);
+				return -ENOMEM;
+			}
+			if (post_receive_srq(dev, i)) {
+				ipoib_warn(priv, "post_receive_srq failed for buf %d\n", i);
+				ipoib_cm_dev_cleanup(dev);
+				return -EIO;
+			}
  		}
  	}

--- a/linux-2.6.22-rc4/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-05-30 14:56:25.000000000 -0400
+++ b/linux-2.6.22-rc4/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2007-06-11 16:36:59.000000000 -0400
@@ -299,7 +299,7 @@ int ipoib_poll(struct net_device *dev, i
  		for (i = 0; i < n; ++i) {
  			struct ib_wc *wc = priv->ibwc + i;

-			if (wc->wr_id & IPOIB_CM_OP_SRQ) {
+			if (wc->wr_id & IPOIB_CM_OP_RECV) {
  				++done;
  				--max;
  				ipoib_cm_handle_rx_wc(dev, wc);
@@ -557,7 +557,7 @@ void ipoib_drain_cq(struct net_device *d
  	do {
  		n = ib_poll_cq(priv->cq, IPOIB_NUM_WC, priv->ibwc);
  		for (i = 0; i < n; ++i) {
-			if (priv->ibwc[i].wr_id & IPOIB_CM_OP_SRQ)
+			if (priv->ibwc[i].wr_id & IPOIB_CM_OP_RECV)
  				ipoib_cm_handle_rx_wc(dev, priv->ibwc + i);
  			else if (priv->ibwc[i].wr_id & IPOIB_OP_RECV)
  				ipoib_ib_handle_rx_wc(dev, priv->ibwc + i);
--- a/linux-2.6.22-rc4/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2007-05-30 14:56:25.000000000 -0400
+++ b/linux-2.6.22-rc4/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2007-06-11 16:36:59.000000000 -0400
@@ -175,6 +175,15 @@ int ipoib_transport_dev_init(struct net_
  	if (!ret)
  		size += ipoib_recvq_size + 1 /* 1 extra for rx_drain_qp */;

+ 	/* We increase the size of the CQ in the NOSRQ case to prevent CQ
+ 	 * overflow. Every new REQ creates a new RX QP and each QP has an
+ 	 * RX ring associated with it. Therefore we could have
+ 	 * NOSRQ_INDEX_TABLE_SIZE*ipoib_recvq_size + ipoib_sendq_size CQEs
+ 	 * in a CQ.
+ 	 */
+ 	if(!priv->cm.srq)
+ 		size += (NOSRQ_INDEX_TABLE_SIZE -1)* ipoib_recvq_size;
+
  	priv->cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev, size, 0);
  	if (IS_ERR(priv->cq)) {
  		printk(KERN_WARNING "%s: failed to create CQ\n", ca->name);


From pradeeps at linux.vnet.ibm.com  Tue Jun 12 11:10:59 2007
From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana)
Date: Tue, 12 Jun 2007 11:10:59 -0700
Subject: [ofa-general] IPOIB CM (NOSRQ) extension [PATCH V2] patch
Message-ID: <466EE1B3.5040806@linux.vnet.ibm.com>

This patch handles the corner case of running out of RC QPs. In that
case it switches to UD mode. This patch can be used both by NOSRQ and
SRQ code.

Changes from V1:
1. The switch to datagram mode happens only when there are
no resources (QPs) available on the passive side.
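
To make the intended behaviour concrete, here is a rough user-space model
of the decision, not the driver code itself. The names neigh_model,
on_rej_received and pick_tx_path are made up for illustration; only the
oper_up field is meant to correspond to IPOIB_FLAG_OPER_UP in the patch,
and the model ignores what the driver does for other REJ reasons.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum rej_reason { REJ_NO_QP, REJ_OTHER };	/* hypothetical stand-ins */
enum tx_path { TX_PATH_CM, TX_PATH_UD };

struct neigh_model {
	bool has_cm;	/* models ipoib_cm_get(neigh) != NULL */
	bool cm_up;	/* models ipoib_cm_up(neigh) */
	bool oper_up;	/* models IPOIB_FLAG_OPER_UP */
};

/* A REJ with "no QP" clears the operational flag; other reasons do not. */
static void on_rej_received(struct neigh_model *n, enum rej_reason reason)
{
	if (n != NULL && reason == REJ_NO_QP)
		n->oper_up = false;
}

/* Connected mode is used only while all three conditions hold;
 * everything else is treated as the datagram path in this model. */
static enum tx_path pick_tx_path(const struct neigh_model *n)
{
	assert(n != NULL);
	if (n->has_cm && n->cm_up && n->oper_up)
		return TX_PATH_CM;
	return TX_PATH_UD;
}
```

Once the passive side has reported that it is out of QPs, later sends to
that neighbour take the datagram branch in this model.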

This patch has been tested with linux-2.6.22-rc4 derived from Roland's
for-2.6.23 git tree on 06/11 on ppc64 machines.


Signed-off-by: Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com>
---

--- c/linux-2.6.22-rc4/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-06-12 12:35:07.000000000 -0400
+++ b/linux-2.6.22-rc4/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-06-12 12:39:47.000000000 -0400
@@ -1378,8 +1378,18 @@ static int ipoib_cm_tx_handler(struct ib
  			ib_send_cm_rej(cm_id, IB_CM_REJ_CONSUMER_DEFINED,
  				       NULL, 0, NULL, 0);
  		break;
-	case IB_CM_REQ_ERROR:
  	case IB_CM_REJ_RECEIVED:
+		ipoib_warn(priv, "REJ received\n");
+		spin_lock(&priv->lock);
+		neigh = tx->neigh;
+		spin_unlock(&priv->lock);
+		
+		if ((neigh) && (event->param.rej_rcvd.reason ==
+		   IB_CM_REJ_NO_QP)) {
+			clear_bit(IPOIB_FLAG_OPER_UP, &neigh->cm->flags);
+			break;
+		}
+	case IB_CM_REQ_ERROR:
  	case IB_CM_TIMEWAIT_EXIT:
  		ipoib_dbg(priv, "CM error %d.\n", event->event);
  		spin_lock_irq(&priv->tx_lock);
--- c/linux-2.6.22-rc4/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-05-30 14:56:25.000000000 -0400
+++ b/linux-2.6.22-rc4/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-06-11 21:08:07.000000000 -0400
@@ -679,11 +679,10 @@ static int ipoib_start_xmit(struct sk_bu

  		neigh = *to_ipoib_neigh(skb->dst->neighbour);

-		if (ipoib_cm_get(neigh)) {
-			if (ipoib_cm_up(neigh)) {
+		if (ipoib_cm_get(neigh) &&  ipoib_cm_up(neigh) &&
+			test_bit(IPOIB_FLAG_OPER_UP, &neigh->cm->flags)) {
  				ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
  				goto out;
-			}
  		} else if (neigh->ah) {
  			if (unlikely(memcmp(&neigh->dgid.raw,
  					    skb->dst->neighbour->ha + 4,


From mst at dev.mellanox.co.il  Tue Jun 12 11:35:21 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 12 Jun 2007 21:35:21 +0300
Subject: [ofa-general] Re: crash in ipoib
In-Reply-To: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com>
References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com>
Message-ID: <20070612183521.GC10688@mellanox.co.il>


> Quoting Sean Hefty <sean.hefty at intel.com>:
> Subject: crash in ipoib
> 
> Copying ofa general list.
> 
> We've seen a crash similar to this now a total of 4 times. 
> 
> These are x64, 2.6.9-42.EL.  The crashes only seem to occur on a specific set of
> systems in our cluster.
> 
> The latest crash has a similar stack trace as the one listed below.
> 
> badness in i8042_panic_blink drivers/input/serio/i8042.c : 992
> i8042_panic_blink + 485
> panic + 445
> apic_timer_interrupt + 133
> oops_end + 38
> oops_end + 65
> do_page_fault + 1204
> ipoib_cm_send + 433
> error_exit
> ipoib_ib_completion + 0
> ipoib_cm_handle_rx_wc + 239
> 
> (the trace goes on and on)

Where in the source are

ipoib_cm_send + 433

and

ipoib_cm_handle_rx_wc + 239

on your systems?


-- 
MST


From friedman at ucla.edu  Tue Jun 12 12:10:48 2007
From: friedman at ucla.edu (Scott A. Friedman)
Date: Tue, 12 Jun 2007 12:10:48 -0700
Subject: [ofa-general] Re: IB and iWarp HCA in same node
In-Reply-To: <20070612171709.82A46E60849@openfabrics.org>
References: <20070612171709.82A46E60849@openfabrics.org>
Message-ID: <466EEFB8.8010208@ucla.edu>

> Scott A. Friedman wrote:
>> > I have a working IB cluster where I have added a Chelsio iWarp card to 
>> > one node. Another node is connected to that with only an identical iWarp 
>> > card. I cannot seem to get the iWarp cards to come up. They work through 
>> > regular ethernet just fine and the IB stuff still works as well. But, 
>> > when I modprobe iw_cxgb3 and iw_cm utilities like ibstat show the 
>> > following. Which explains why nothing is working.
>> > 
>> > Question is, why? Am I missing or forgetting something? I just want to 
>> > test the two iWarp cards back to back. Not trying to get some kind of 
>> > auto bridging or routing working.
>> > 
>> > # ibstat
>> > iWARP RNIC 'cxgb3_0'
>> >         iWARP RNIC type: cxgb3
>> >         Number of ports: 1
>> >         Firmware version: T 4.0.0
>> >         Hardware version: 1
>> >         Node GUID: 0x0007430506ea0000
>> >         System image GUID: 0x0007430506ea0000
>> >         Port 1:
>> >                 State: Active
>> >                 Physical state: No state change
>> >                 Rate: 20
>> >                 Base lid: 0
>> >                 LMC: 0
>> >                 SM lid: 0
>> >                 Capability mask: 0x009f0000
>> >                 Port GUID: 0x0000000000000000
> 
> This all looks normal.  What application are you trying to run over rdma 
> on the chelsio interface?  rping?
> 

Yes, rping, anything. It turns out that since I posted this the Chelsio 
people explained ibstat's funny output and suggested using their latest 
release of the cxgb3 driver - and that works (without TOE for now, 
separate issue). The main problem was that the driver that ships with 
OFED would give me 'connection rejected' errors when trying to do 
anything (rdma_cm based), my code, sample code, utilities. Replacing the 
driver made the problem go away. Currently, I am using their 1.0.094 
driver w/o TOE and the OFED-1.2-rc3 iWarp stuff (their suggestion) and 
it appears to work fine so far. Going to just wait for rc5 or final to 
test that with their driver as well as that is what we will want to use 
for the rest of our test cluster using IB.


From sean.hefty at intel.com  Tue Jun 12 12:13:37 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 12 Jun 2007 12:13:37 -0700
Subject: [ofa-general] IPOB CM (NOSRQ) [PATCH V6] patch
In-Reply-To: <466EE021.30302@linux.vnet.ibm.com>
Message-ID: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com>

>+module_param_named(max_recieve_buffer, max_recv_buf, int, 0644);
>+MODULE_PARM_DESC(max_recieve_buffer, "Max Recieve Buffer Size in MB");

nit: receive misspelled

>+static int allocate_and_post_rbuf_nosrq(struct ib_cm_id *cm_id,
>+				        struct ipoib_cm_rx *p, unsigned psn)
>+{
>+	struct net_device *dev = cm_id->context;
>+	struct ipoib_dev_priv *priv = netdev_priv(dev);
>+	int ret;
>+	u32 qp_num, index;
>+	u64 i, recv_mem_used;
>+
>+	qp_num = p->qp->qp_num;
>+
>+	/* In the SRQ case there is a common rx buffer called the srq_ring.
>+	 * However, for the NOSRQ we create an rx_ring for every
>+	 * struct ipoib_cm_rx.
>+	 */
>+	p->rx_ring = kzalloc(ipoib_recvq_size * sizeof *p->rx_ring, GFP_KERNEL);
>+	if (!p->rx_ring) {
>+		printk(KERN_WARNING "Failed to allocate rx_ring for 0x%x\n",
>+		       qp_num);
>+		return -ENOMEM;
>+	}
>+
>+	init_context_and_add_list(cm_id, p, priv);
>+	spin_lock_irq(&priv->lock);
>+
>+	for (index = 0; index < max_rc_qp; index++)
>+		if (priv->cm.rx_index_table[index] == NULL)
>+			break;
>+
>+	spin_lock(&nosrq_count.lock);
>+	recv_mem_used = (u64)ipoib_recvq_size * (u64)nosrq_count.current_rc_qp
>+			* CM_PACKET_SIZE; /* packets are 64K */
>+	spin_unlock(&nosrq_count.lock);

Is a spin lock needed here?  Could you make current_rc_qp an atomic?
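
Roughly what I have in mind, as a user-space sketch only: C11 atomics
stand in for the kernel's atomic_t here, and the struct and helper names
are made up for illustration, not taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for nosrq_count, with no lock member. */
struct nosrq_counter {
	atomic_int current_rc_qp;
};

static void nosrq_counter_init(struct nosrq_counter *c)
{
	atomic_init(&c->current_rc_qp, 0);
}

/* Take a QP slot; returns the count after the increment. */
static int nosrq_qp_get(struct nosrq_counter *c)
{
	return atomic_fetch_add(&c->current_rc_qp, 1) + 1;
}

/* Release a QP slot; returns the count after the decrement. */
static int nosrq_qp_put(struct nosrq_counter *c)
{
	int prev = atomic_fetch_sub(&c->current_rc_qp, 1);

	assert(prev > 0);	/* a put without a matching get is a bug */
	return prev - 1;
}

/* Receive memory in use: ring size * live QPs * packet size, in 64 bits. */
static unsigned long long recv_mem_used(struct nosrq_counter *c,
					unsigned int recvq_size,
					unsigned long long packet_size)
{
	return (unsigned long long)recvq_size *
	       (unsigned long long)atomic_load(&c->current_rc_qp) *
	       packet_size;
}
```

A single atomic read is enough for the memory estimate, so no lock has to
be nested inside priv->lock for it.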

>+err_send_rej:
>+err_modify_nosrq:
>+err_alloc_and_post:

Maybe just use a single label?
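
Since all three labels fall through to the same kfree, something shaped
like the sketch below would do. This is a stand-alone illustration with
made-up names (rx_ctx, setup_rx, fail_step) and malloc/free in place of
the kernel allocators, not a rewrite of the function in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct rx_ctx {
	int *rx_ring;
};

/* fail_step picks which later stage fails (0 = none); demo knob only. */
static int setup_rx(struct rx_ctx *p, size_t n, int fail_step)
{
	int ret;

	assert(p != NULL);

	p->rx_ring = calloc(n, sizeof *p->rx_ring);
	if (!p->rx_ring)
		return -ENOMEM;

	if (fail_step == 1) {		/* e.g. no free index: reject */
		ret = -EINVAL;
		goto err_free_ring;
	}
	if (fail_step == 2) {		/* e.g. buffer allocation failed */
		ret = -ENOMEM;
		goto err_free_ring;
	}
	if (fail_step == 3) {		/* e.g. posting the receive failed */
		ret = -EIO;
		goto err_free_ring;
	}
	return 0;

err_free_ring:				/* one label, one cleanup path */
	free(p->rx_ring);
	p->rx_ring = NULL;
	return ret;
}
```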

>@@ -316,9 +488,18 @@ static int ipoib_cm_req_handler(struct i
>  	}
>
>  	psn = random32() & 0xffffff;
>-	ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn);
>-	if (ret)
>-		goto err_modify;
>+	if (!priv->cm.srq) {
>+		spin_lock(&nosrq_count.lock);
>+		nosrq_count.current_rc_qp++;
>+		spin_unlock(&nosrq_count.lock);
>+		if (ret = allocate_and_post_rbuf_nosrq(cm_id, p, psn))

Use double parens around assignment: if ((ret = ..))
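
For example (a stand-alone sketch; post_buffers and handle_req are
hypothetical stand-ins, not functions from the patch):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the allocation step; fails on demand. */
static int post_buffers(int should_fail)
{
	assert(should_fail == 0 || should_fail == 1);
	return should_fail ? -ENOMEM : 0;
}

static int handle_req(int should_fail)
{
	int ret;

	/* The extra parentheses mark the assignment as intentional and
	 * keep gcc's -Wparentheses warning quiet. */
	if ((ret = post_buffers(should_fail)))
		return ret;

	return 0;
}
```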

>+			goto err_post_nosrq;
>+	} else {
>+		p->rx_ring = NULL;
>+		ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn);
>+		if (ret)
>+			goto err_modify;
>+	}
>
>  	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
>  	if (ret) {
>@@ -326,18 +507,18 @@ static int ipoib_cm_req_handler(struct i
>  		goto err_rep;
>  	}
>
>-	cm_id->context = p;
>-	p->jiffies = jiffies;
>-	p->state = IPOIB_CM_RX_LIVE;
>-	spin_lock_irq(&priv->lock);
>-	if (list_empty(&priv->cm.passive_ids))
>-		queue_delayed_work(ipoib_workqueue,
>-				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
>-	list_add(&p->list, &priv->cm.passive_ids);
>-	spin_unlock_irq(&priv->lock);
>+	if (priv->cm.srq) {
>+		init_context_and_add_list(cm_id, p, priv);
>+		p->state = IPOIB_CM_RX_LIVE;

The order between setting p->state and adding the item to the list changes here.
I don't know if this matters, but it's now possible for the work queue to
execute before p->state is set.

>+	}
>  	return 0;
>
>  err_rep:
>+err_post_nosrq:
>+	list_del_init(&p->list);

Is this correct?  Is p->list on any list at this point?

>+	spin_lock(&nosrq_count.lock);
>+	nosrq_count.current_rc_qp--;
>+	spin_unlock(&nosrq_count.lock);
>  err_modify:
>  	ib_destroy_qp(p->qp);
>  err_qp:
>@@ -401,21 +582,51 @@ static void skb_put_frags(struct sk_buff
>  	}
>  }
>
>-void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
>+static void timer_check_srq(struct ipoib_dev_priv *priv, struct ipoib_cm_rx *p)
>+{
>+	unsigned long flags;
>+
>+	if (p && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) {
>+		spin_lock_irqsave(&priv->lock, flags);
>+		p->jiffies = jiffies;
>+		/* Move this entry to list head, but do
>+		 * not re-add it if it has been removed. */

nit: There are several places in the patch where the commenting style needs
updating.

>+		if (p->state == IPOIB_CM_RX_LIVE)
>+			list_move(&p->list, &priv->cm.passive_ids);
>+		spin_unlock_irqrestore(&priv->lock, flags);
>+	}
>+}
>+
>+static void timer_check_nosrq(struct ipoib_dev_priv *priv, struct ipoib_cm_rx *p)
>+{
>+	unsigned long flags;
>+
>+	if (p && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) {
>+		spin_lock_irqsave(&priv->lock, flags);
>+		p->jiffies = jiffies;
>+		/* Move this entry to list head, but do
>+		 * not re-add it if it has been removed. */
>+		if (!list_empty(&p->list))

This line is the only difference between this function and the previous one.  Is
it possible to always use the state check?
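
If the NOSRQ path also kept p->state up to date, one helper could serve
both cases. A simplified user-space model of that idea follows; the types
and names are made up, locking and the list itself are left out, and the
return value just tells the caller whether to do the list_move.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

enum rx_state { RX_LIVE, RX_ERROR };	/* models IPOIB_CM_RX_LIVE/ERROR */

struct rx_conn {
	enum rx_state state;
	unsigned long last_seen;	/* models p->jiffies */
};

/* Refresh the timestamp once update_time ticks have elapsed.  Returns
 * true when the entry is still live and should move to the list head. */
static bool rx_timer_check(struct rx_conn *p, unsigned long now,
			   unsigned long update_time)
{
	/* The wrap-safe comparison only works for intervals below half
	 * the counter range. */
	assert(update_time <= (unsigned long)LONG_MAX);

	/* Same test as time_after_eq(now, last_seen + update_time). */
	if (!p || (long)(now - (p->last_seen + update_time)) < 0)
		return false;

	p->last_seen = now;
	return p->state == RX_LIVE;
}
```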

>+			list_move(&p->list, &priv->cm.passive_ids);
>+		spin_unlock_irqrestore(&priv->lock, flags);
>+	}
>+}


>+static void handle_rx_wc_nosrq(struct net_device *dev, struct ib_wc *wc)
>+{
>+	struct ipoib_dev_priv *priv = netdev_priv(dev);
>+	struct sk_buff *skb, *newskb;
>+	u64 mapping[IPOIB_CM_RX_SG], wr_id = wc->wr_id >> 32;
>+	u32 index;
>+	struct ipoib_cm_rx *p, *rx_ptr;
>+	int frags, ret;
>+
>+
>+	ipoib_dbg_data(priv, "cm recv completion: id %d, status: %d\n",
>+		       wr_id, wc->status);
>+
>+	if (unlikely(wr_id >= ipoib_recvq_size)) {
>+		ipoib_warn(priv, "cm recv completion event with wrid %d (> %d)\n",
>+				   wr_id, ipoib_recvq_size);
>+		return;
>+	}
>+
>+	index = (wc->wr_id & ~IPOIB_CM_OP_RECV) & NOSRQ_INDEX_MASK ;
>+
>+	/* This is the only place where rx_ptr could be a NULL - could
>+	 * have just received a packet from a connection that has become
>+	 * stale and so is going away. We will simply drop the packet and
>+	 * let the hardware (it s IB_QPT_RC) handle the dropped packet.
>+	 * In the timer_check() function below, p->jiffies is updated and
>+	 * hence the connection will not be stale after that.
>+	 */
>+	rx_ptr = priv->cm.rx_index_table[index];

Is synchronization needed here?

>+	if (unlikely(!rx_ptr)) {
>+		ipoib_warn(priv, "Received packet from a connection "
>+		           "that is going away. Hardware will handle it.\n");
>+		return;
>+	}
>+
>+	skb = rx_ptr->rx_ring[wr_id].skb;
>+
>+	if (unlikely(wc->status != IB_WC_SUCCESS)) {
>+		ipoib_dbg(priv, "cm recv error "
>+			   "(status=%d, wrid=%ld vend_err %x)\n",
>+			   wc->status, wr_id, wc->vendor_err);
>+		++priv->stats.rx_dropped;
>+		goto repost_nosrq;
>+	}
>+
>+	if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) {
>+		/* There are no guarantees that wc->qp is not NULL for HCAs
>+	 	* that do not support SRQ. */
>+		p = rx_ptr;
>+		timer_check_nosrq(priv, p);

This appears to be the only place 'p' is used in this call.  I think we can just
remove it.

>+	}
>+
>+	frags = PAGE_ALIGN(wc->byte_len - min(wc->byte_len,
>+					      (unsigned)IPOIB_CM_HEAD_SIZE)) / PAGE_SIZE;
>+
>+	newskb = ipoib_cm_alloc_rx_skb(dev, wr_id << 32 | index, frags,
>+				       mapping);
>+	if (unlikely(!newskb)) {
>+		/*
>+		 * If we can't allocate a new RX buffer, dump
>+		 * this packet and reuse the old buffer.
>+		 */
>+		ipoib_dbg(priv, "failed to allocate receive buffer %ld\n", wr_id);
>  		++priv->stats.rx_dropped;
>-		goto repost;
>+		goto repost_nosrq;
>  	}
>
>-	ipoib_cm_dma_unmap_rx(priv, frags, priv->cm.srq_ring[wr_id].mapping);
>-	memcpy(priv->cm.srq_ring[wr_id].mapping, mapping, (frags + 1) * sizeof *mapping);
>+	ipoib_cm_dma_unmap_rx(priv, frags,
>+	                      rx_ptr->rx_ring[wr_id].mapping);
>+	memcpy(rx_ptr->rx_ring[wr_id].mapping, mapping,
>+	       (frags + 1) * sizeof *mapping);
>
>  	ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n",
>  		       wc->byte_len, wc->slid);
>@@ -485,10 +788,22 @@ void ipoib_cm_handle_rx_wc(struct net_de
>  	skb->pkt_type = PACKET_HOST;
>  	netif_receive_skb(skb);
>
>-repost:
>-	if (unlikely(ipoib_cm_post_receive(dev, wr_id)))
>-		ipoib_warn(priv, "ipoib_cm_post_receive failed "
>-			   "for buf %d\n", wr_id);
>+repost_nosrq:
>+	ret = post_receive_nosrq(dev, wr_id << 32 | index);
>+
>+	if (unlikely(ret))
>+		ipoib_warn(priv, "post_receive_nosrq failed for buf %ld\n",
>+		           wr_id);
>+}
>+
>+void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
>+{
>+	struct ipoib_dev_priv *priv = netdev_priv(dev);
>+
>+	if (priv->cm.srq)
>+		handle_rx_wc_srq(dev, wc);
>+	else
>+		handle_rx_wc_nosrq(dev, wc);
>  }
>
>  static inline int post_send(struct ipoib_dev_priv *priv,
>@@ -680,6 +995,44 @@ err_cm:
>  	return ret;
>  }
>
>+static void free_resources_nosrq(struct ipoib_dev_priv *priv, struct ipoib_cm_rx *p)
>+{
>+	int i;
>+
>+	for(i = 0; i < ipoib_recvq_size; ++i)
>+		if(p->rx_ring[i].skb) {
>+			ipoib_cm_dma_unmap_rx(priv,
>+				         IPOIB_CM_RX_SG - 1,
>+					 p->rx_ring[i].mapping);
>+			dev_kfree_skb_any(p->rx_ring[i].skb);
>+			p->rx_ring[i].skb = NULL;
>+		}
>+	kfree(p->rx_ring);
>+}
>+
>+void dev_stop_nosrq(struct ipoib_dev_priv *priv)
>+{
>+	struct ipoib_cm_rx *p;
>+
>+	spin_lock_irq(&priv->lock);
>+	while (!list_empty(&priv->cm.passive_ids)) {
>+		p = list_entry(priv->cm.passive_ids.next, typeof(*p), list);
>+		free_resources_nosrq(priv, p);
>+		list_del_init(&p->list);

just list_del should work here

>+		spin_unlock_irq(&priv->lock);
>+		ib_destroy_cm_id(p->id);
>+		ib_destroy_qp(p->qp);
>+		spin_lock(&nosrq_count.lock);
>+		nosrq_count.current_rc_qp--;
>+		spin_unlock(&nosrq_count.lock);
>+		kfree(p);
>+		spin_lock_irq(&priv->lock);
>+	}
>+	spin_unlock_irq(&priv->lock);
>+
>+	cancel_delayed_work(&priv->cm.stale_task);
>+}
>+
>  void ipoib_cm_dev_stop(struct net_device *dev)
>  {
>  	struct ipoib_dev_priv *priv = netdev_priv(dev);
>@@ -694,6 +1047,11 @@ void ipoib_cm_dev_stop(struct net_device
>  	ib_destroy_cm_id(priv->cm.id);
>  	priv->cm.id = NULL;
>
>+	if (!priv->cm.srq) {
>+		dev_stop_nosrq(priv);
>+		return;
>+	}
>+
>  	spin_lock_irq(&priv->lock);
>  	while (!list_empty(&priv->cm.passive_ids)) {
>  		p = list_entry(priv->cm.passive_ids.next, typeof(*p), list);
>@@ -739,6 +1097,7 @@ void ipoib_cm_dev_stop(struct net_device
>  		kfree(p);
>  	}
>
>+
>  	cancel_delayed_work(&priv->cm.stale_task);
>  }
>
>@@ -817,7 +1176,9 @@ static struct ib_qp *ipoib_cm_create_tx_
>  	attr.recv_cq = priv->cq;
>  	attr.srq = priv->cm.srq;
>  	attr.cap.max_send_wr = ipoib_sendq_size;
>+	attr.cap.max_recv_wr = 1;
>  	attr.cap.max_send_sge = 1;
>+	attr.cap.max_recv_sge = 1;
>  	attr.sq_sig_type = IB_SIGNAL_ALL_WR;
>  	attr.qp_type = IB_QPT_RC;
>  	attr.send_cq = cq;
>@@ -857,7 +1218,7 @@ static int ipoib_cm_send_req(struct net_
>  	req.retry_count 	      = 0; /* RFC draft warns against retries */
>  	req.rnr_retry_count 	      = 0; /* RFC draft warns against retries */
>  	req.max_cm_retries 	      = 15;
>-	req.srq 	              = 1;
>+	req.srq			      = !!priv->cm.srq;
>  	return ib_send_cm_req(id, &req);
>  }
>
>@@ -1202,6 +1563,11 @@ static void ipoib_cm_rx_reap(struct work
>  	list_for_each_entry_safe(p, n, &list, list) {
>  		ib_destroy_cm_id(p->id);
>  		ib_destroy_qp(p->qp);
>+		if (!priv->cm.srq) {
>+			spin_lock(&nosrq_count.lock);
>+			nosrq_count.current_rc_qp--;
>+			spin_unlock(&nosrq_count.lock);
>+		}
>  		kfree(p);
>  	}
>  }
>@@ -1220,12 +1586,19 @@ static void ipoib_cm_stale_task(struct w
>  		p = list_entry(priv->cm.passive_ids.prev, typeof(*p), list);
>  		if (time_before_eq(jiffies, p->jiffies + IPOIB_CM_RX_TIMEOUT))
>  			break;
>-		list_move(&p->list, &priv->cm.rx_error_list);
>-		p->state = IPOIB_CM_RX_ERROR;
>-		spin_unlock_irq(&priv->lock);
>-		ret = ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE);
>-		if (ret)
>-			ipoib_warn(priv, "unable to move qp to error state: %d\n", ret);
>+		if (!priv->cm.srq) {
>+			free_resources_nosrq(priv, p);
>+			list_del_init(&p->list);
>+			priv->cm.rx_index_table[p->index] = NULL;
>+			spin_unlock_irq(&priv->lock);
>+		} else {
>+			list_move(&p->list, &priv->cm.rx_error_list);
>+			p->state = IPOIB_CM_RX_ERROR;
>+			spin_unlock_irq(&priv->lock);
>+			ret = ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE);
>+			if (ret)
>+				ipoib_warn(priv, "unable to move qp to error state: %d\n", ret);
>+		}
>  		spin_lock_irq(&priv->lock);
>  	}
>
>@@ -1279,16 +1652,40 @@ int ipoib_cm_add_mode_attr(struct net_de
>  	return device_create_file(&dev->dev, &dev_attr_mode);
>  }
>
>+static int create_srq(struct net_device *dev, struct ipoib_dev_priv *priv)
>+{
>+	struct ib_srq_init_attr srq_init_attr;
>+	int ret;
>+
>+	srq_init_attr.attr.max_wr = ipoib_recvq_size;
>+	srq_init_attr.attr.max_sge = IPOIB_CM_RX_SG;
>+
>+	priv->cm.srq = ib_create_srq(priv->pd, &srq_init_attr);
>+	if (IS_ERR(priv->cm.srq)) {
>+		ret = PTR_ERR(priv->cm.srq);
>+		priv->cm.srq = NULL;
>+		return ret;

nit: you can just return PTR_ERR here, and remove the ret stack variable

>+	}
>+
>+	priv->cm.srq_ring = kzalloc(ipoib_recvq_size *
>+		                    sizeof *priv->cm.srq_ring,
>+			            GFP_KERNEL);
>+	if (!priv->cm.srq_ring) {
>+		printk(KERN_WARNING "%s: failed to allocate CM ring "
>+		       "(%d entries)\n",
>+	       	       priv->ca->name, ipoib_recvq_size);
>+		ipoib_cm_dev_cleanup(dev);
>+		return -ENOMEM;
>+	}
>+
>+	return 0;
>+}
>+
>  int ipoib_cm_dev_init(struct net_device *dev)
>  {
>  	struct ipoib_dev_priv *priv = netdev_priv(dev);
>-	struct ib_srq_init_attr srq_init_attr = {
>-		.attr = {
>-			.max_wr  = ipoib_recvq_size,
>-			.max_sge = IPOIB_CM_RX_SG
>-		}
>-	};
>  	int ret, i;
>+	struct ib_device_attr attr;
>
>  	INIT_LIST_HEAD(&priv->cm.passive_ids);
>  	INIT_LIST_HEAD(&priv->cm.reap_list);
>@@ -1305,20 +1702,33 @@ int ipoib_cm_dev_init(struct net_device
>
>  	skb_queue_head_init(&priv->cm.skb_queue);
>
>-	priv->cm.srq = ib_create_srq(priv->pd, &srq_init_attr);
>-	if (IS_ERR(priv->cm.srq)) {
>-		ret = PTR_ERR(priv->cm.srq);
>-		priv->cm.srq = NULL;
>+	if (ret = ib_query_device(priv->ca, &attr))
>  		return ret;

double parens around assignment - also below

>-	}
>
>-	priv->cm.srq_ring = kzalloc(ipoib_recvq_size * sizeof *priv->cm.srq_ring,
>-				    GFP_KERNEL);
>-	if (!priv->cm.srq_ring) {
>-		printk(KERN_WARNING "%s: failed to allocate CM ring (%d entries)\n",
>-		       priv->ca->name, ipoib_recvq_size);
>-		ipoib_cm_dev_cleanup(dev);
>-		return -ENOMEM;
>+	if (attr.max_srq) {
>+		/* This device supports SRQ */
>+		if (ret = create_srq(dev, priv))
>+			return ret;
>+		priv->cm.rx_index_table = NULL;
>+	} else {
>+		priv->cm.srq = NULL;
>+		priv->cm.srq_ring = NULL;
>+
>+		/* Every new REQ that arrives creates a struct ipoib_cm_rx.
>+		 * These structures form a link list starting with the
>+		 * passive_ids. For quick and easy access we maintain a table
>+		 * of pointers to struct ipoib_cm_rx called the rx_index_table
>+		 */

Why store the structures in a linked list if they're stored in a table?

>+		priv->cm.rx_index_table = kzalloc(NOSRQ_INDEX_TABLE_SIZE *
>+					 sizeof *priv->cm.rx_index_table,
>+					 GFP_KERNEL);
>+		if (!priv->cm.rx_index_table) {
>+			printk(KERN_WARNING "Failed to allocate NOSRQ_INDEX_TABLE\n");
>+			return -ENOMEM;
>+		}
>+
>+		spin_lock_init(&nosrq_count.lock);
>+		nosrq_count.current_rc_qp = 0;
>  	}
>
>  	for (i = 0; i < IPOIB_CM_RX_SG; ++i)
>@@ -1331,17 +1741,23 @@ int ipoib_cm_dev_init(struct net_device
>  	priv->cm.rx_wr.sg_list = priv->cm.rx_sge;
>  	priv->cm.rx_wr.num_sge = IPOIB_CM_RX_SG;
>
>-	for (i = 0; i < ipoib_recvq_size; ++i) {
>-		if (!ipoib_cm_alloc_rx_skb(dev, i, IPOIB_CM_RX_SG - 1,
>+	/* One can post receive buffers even before the RX QP is created
>+	 * only in the SRQ case. Therefore for NOSRQ we skip the rest of init
>+	 * and do that in ipoib_cm_req_handler() */

This is separate from this patch, but why not wait to post receives to a SRQ
only after we've received a REQ?  Would this simplify the code any?

- Sean


From rowland at cse.ohio-state.edu  Tue Jun 12 14:52:43 2007
From: rowland at cse.ohio-state.edu (Shaun Rowland)
Date: Tue, 12 Jun 2007 17:52:43 -0400
Subject: [ofa-general] Re: [ewg] New OMPI / MPI_READ release notes patch
In-Reply-To: <E4C1D7DA-761A-49FB-BFCF-C43044A35088@cisco.com>
References: <F4B4CFF9-3A0C-4C75-90DD-0566F174599D@cisco.com>	<6C2C79E72C305246B504CBA17B5500C9015635E9@mtlexch01.mtl.com>
	<E4C1D7DA-761A-49FB-BFCF-C43044A35088@cisco.com>
Message-ID: <466F15AB.5060406@cse.ohio-state.edu>

Jeff Squyres wrote:
> Note that git still shows the following in the ofed_1_2 branch:
> 
> Example1: Running the OSU bandwidth:
> 
> !!! SOMEONE PLEASE CHECK THESE DIRECTORIES AND EXECUTABLE NAMES
>     > cd /usr/mpi/gcc/openmpi-1.2.2-1/tests/osu_benchmarks-2.2
>     > mpirun -np <N> -hostfile <HOSTFILE> osu_bw
> 
> Example2: Running the Intel MPI Benchmark benchmarks:
> 
> !!! SOMEONE PLEASE CHECK THESE DIRECTORIES AND EXECUTABLE NAMES
>     > cd /usr/mpi/gcc/openmpi-1.2.2-1/tests/IMB-2.3
>     > mpirun -np <N> -hostfile <HOSTFILE> IMB-MPI1
> 
> Example3: Running the Presta benchmarks:
> 
> !!! SOMEONE PLEASE CHECK THESE DIRECTORIES AND EXECUTABLE NAMES
>     > cd /usr/mpi/gcc/openmpi-1.2.2-1/tests/presta-1.4.0
>     > mpirun -np <N> -hostfile <HOSTFILE> com -o 100

The above information is correct for a standard gcc build. I didn't see
that this was answered, but I could have missed that.
-- 
Shaun Rowland	rowland at cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/


From rowland at cse.ohio-state.edu  Tue Jun 12 15:03:56 2007
From: rowland at cse.ohio-state.edu (Shaun Rowland)
Date: Tue, 12 Jun 2007 18:03:56 -0400
Subject: [ofa-general] New OMPI / MPI_READ release notes patch
In-Reply-To: <F4B4CFF9-3A0C-4C75-90DD-0566F174599D@cisco.com>
References: <F4B4CFF9-3A0C-4C75-90DD-0566F174599D@cisco.com>
Message-ID: <466F184C.70606@cse.ohio-state.edu>

Jeff Squyres wrote:
> Tziporet --
> 
> Here's a new patch for the OMPI release notes based on your current 
> git.  It includes updated information for Open MPI and text about 
> mpi-selector.
> 
> Note that there are a few areas in MPI_README that I need OSU and 
> Mellanox to proofread.  It would also be nice if someone else could 
> eyeball the mpi-selector text and ensure it makes sense to a naive reader.

I took a look at the documentation in your patch quickly. I think it
should be clear how this works. Also, I saw that the links to download
were in the current MPI_README.txt, so that should be good.
-- 
Shaun Rowland	rowland at cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/


From mshefty at ichips.intel.com  Tue Jun 12 15:41:30 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 12 Jun 2007 15:41:30 -0700
Subject: [ofa-general] Re: crash in ipoib
In-Reply-To: <20070612183521.GC10688@mellanox.co.il>
References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com>
	<20070612183521.GC10688@mellanox.co.il>
Message-ID: <466F211A.1000005@ichips.intel.com>

They're around

> ipoib_cm_send + 433

if (unlikely(post_send(priv, tx, tx->tx_head...

> ipoib_cm_handle_rx_wc + 239

skb = priv->cm.srq_ring[wr_id].skb
  or
if (unlikely(wc->status != IB_WC_SUCCESS)) {

(This one isn't matching up quite right, but appears to be in this area.)

In my earlier trace, I saw ipoib_cm_handle_rx_wc + 378,  which is around:

if (p->state == IPOIB_CM_RX_LIVE)
->	list_move(&p->list, &priv->cm.passive_ids);

- Sean


From pradeeps at linux.vnet.ibm.com  Tue Jun 12 17:14:00 2007
From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana)
Date: Tue, 12 Jun 2007 17:14:00 -0700
Subject: [ofa-general] IPOB CM (NOSRQ) [PATCH V6] patch
In-Reply-To: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com>
References: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com>
Message-ID: <466F36C8.5010507@linux.vnet.ibm.com>

Sean, Thanks for looking through this. My responses below.

Pradeep

Sean Hefty wrote:
>> +module_param_named(max_recieve_buffer, max_recv_buf, int, 0644);
>> +MODULE_PARM_DESC(max_recieve_buffer, "Max Recieve Buffer Size in MB");
> 
> nit: receive misspelled

you are correct.

> 
>> +static int allocate_and_post_rbuf_nosrq(struct ib_cm_id *cm_id,
>> +				        struct ipoib_cm_rx *p, unsigned psn)
>> +{
>> +	struct net_device *dev = cm_id->context;
>> +	struct ipoib_dev_priv *priv = netdev_priv(dev);
>> +	int ret;
>> +	u32 qp_num, index;
>> +	u64 i, recv_mem_used;
>> +
>> +	qp_num = p->qp->qp_num;
>> +
>> +	/* In the SRQ case there is a common rx buffer called the srq_ring.
>> +	 * However, for the NOSRQ we create an rx_ring for every
>> +	 * struct ipoib_cm_rx.
>> +	 */
>> +	p->rx_ring = kzalloc(ipoib_recvq_size * sizeof *p->rx_ring, GFP_KERNEL);
>> +	if (!p->rx_ring) {
>> +		printk(KERN_WARNING "Failed to allocate rx_ring for 0x%x\n",
>> +		       qp_num);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	init_context_and_add_list(cm_id, p, priv);
>> +	spin_lock_irq(&priv->lock);
>> +
>> +	for (index = 0; index < max_rc_qp; index++)
>> +		if (priv->cm.rx_index_table[index] == NULL)
>> +			break;
>> +
>> +	spin_lock(&nosrq_count.lock);
>> +	recv_mem_used = (u64)ipoib_recvq_size * (u64)nosrq_count.current_rc_qp
>> +			* CM_PACKET_SIZE; /* packets are 64K */
>> +	spin_unlock(&nosrq_count.lock);
> 
> Is a spin lock needed here?  Could you make current_rc_qp an atomic?

This function is called only when a REQ is received. Otherwise
current_rc_qp is only used in the error case, or when the connection
is being torn down. Hence I don't think it makes a significant
difference which one is used.

> 
>> +err_send_rej:
>> +err_modify_nosrq:
>> +err_alloc_and_post:
> 
> Maybe just use a single label?

Yes, that is doable

> 
>> @@ -316,9 +488,18 @@ static int ipoib_cm_req_handler(struct i
>>  	}
>>
>>  	psn = random32() & 0xffffff;
>> -	ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn);
>> -	if (ret)
>> -		goto err_modify;
>> +	if (!priv->cm.srq) {
>> +		spin_lock(&nosrq_count.lock);
>> +		nosrq_count.current_rc_qp++;
>> +		spin_unlock(&nosrq_count.lock);
>> +		if (ret = allocate_and_post_rbuf_nosrq(cm_id, p, psn))
> 
> Use double parens around assignment: if ((ret = ..))

okay
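
A minimal user-space sketch of the idiom, assuming a hypothetical helper try_post() that stands in for allocate_and_post_rbuf_nosrq() and returns 0 on success or a negative errno. The extra parentheses mark the assignment as deliberate, which also keeps gcc's -Wparentheses quiet:

```c
#include <errno.h>

/* Hypothetical stand-in for allocate_and_post_rbuf_nosrq():
 * returns 0 on success, a negative errno on failure. */
static int try_post(int should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

/* Returns the error reported by try_post(), or 0 if it succeeded. */
static int handle_req(int should_fail)
{
	int ret;

	/* Double parentheses: assign first, then test the result. */
	if ((ret = try_post(should_fail)))
		return ret;

	return 0;
}
```

Splitting it into "ret = try_post(...); if (ret)" is equivalent and avoids the question altogether.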

> 
>> +			goto err_post_nosrq;
>> +	} else {
>> +		p->rx_ring = NULL;
>> +		ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn);
>> +		if (ret)
>> +			goto err_modify;
>> +	}
>>
>>  	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
>>  	if (ret) {
>> @@ -326,18 +507,18 @@ static int ipoib_cm_req_handler(struct i
>>  		goto err_rep;
>>  	}
>>
>> -	cm_id->context = p;
>> -	p->jiffies = jiffies;
>> -	p->state = IPOIB_CM_RX_LIVE;
>> -	spin_lock_irq(&priv->lock);
>> -	if (list_empty(&priv->cm.passive_ids))
>> -		queue_delayed_work(ipoib_workqueue,
>> -				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
>> -	list_add(&p->list, &priv->cm.passive_ids);
>> -	spin_unlock_irq(&priv->lock);
>> +	if (priv->cm.srq) {
>> +		init_context_and_add_list(cm_id, p, priv);
>> +		p->state = IPOIB_CM_RX_LIVE;
> 
> The order between setting p->state and adding the item to the list changes here.
> I don't know if this matters, but it's now possible for the work queue to
> execute before p->state is set.

You are correct. I need to set p->state first and then call
init_context_and_add_list().
> 
>> +	}
>>  	return 0;
>>
>>  err_rep:
>> +err_post_nosrq:
>> +	list_del_init(&p->list);
> 
> Is this correct?  Is p->list on any list at this point?
> 
>> +	spin_lock(&nosrq_count.lock);
>> +	nosrq_count.current_rc_qp--;
>> +	spin_unlock(&nosrq_count.lock);
>>  err_modify:
>>  	ib_destroy_qp(p->qp);
>>  err_qp:
>> @@ -401,21 +582,51 @@ static void skb_put_frags(struct sk_buff
>>  	}
>>  }
>>
>> -void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
>> +static void timer_check_srq(struct ipoib_dev_priv *priv, struct ipoib_cm_rx *p)
>> +{
>> +	unsigned long flags;
>> +
>> +	if (p && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) {
>> +		spin_lock_irqsave(&priv->lock, flags);
>> +		p->jiffies = jiffies;
>> +		/* Move this entry to list head, but do
>> +		 * not re-add it if it has been removed. */
> 
> nit: There are several places in the patch where the commenting style needs
> updating.

Move the closing "*/" to the next line?

> 
>> +		if (p->state == IPOIB_CM_RX_LIVE)
>> +			list_move(&p->list, &priv->cm.passive_ids);
>> +		spin_unlock_irqrestore(&priv->lock, flags);
>> +	}
>> +}
>> +
>> +static void timer_check_nosrq(struct ipoib_dev_priv *priv, struct ipoib_cm_rx *p)
>> +{
>> +	unsigned long flags;
>> +
>> +	if (p && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) {
>> +		spin_lock_irqsave(&priv->lock, flags);
>> +		p->jiffies = jiffies;
>> +		/* Move this entry to list head, but do
>> +		 * not re-add it if it has been removed. */
>> +		if (!list_empty(&p->list))
> 
> This line is the only difference between this function and the previous one.  Is
> it possible to always use the state check?

The state check is only used in the SRQ case.

> 
>> +			list_move(&p->list, &priv->cm.passive_ids);
>> +		spin_unlock_irqrestore(&priv->lock, flags);
>> +	}
>> +}
> 
> 
>> +static void handle_rx_wc_nosrq(struct net_device *dev, struct ib_wc *wc)
>> +{
>> +	struct ipoib_dev_priv *priv = netdev_priv(dev);
>> +	struct sk_buff *skb, *newskb;
>> +	u64 mapping[IPOIB_CM_RX_SG], wr_id = wc->wr_id >> 32;
>> +	u32 index;
>> +	struct ipoib_cm_rx *p, *rx_ptr;
>> +	int frags, ret;
>> +
>> +
>> +	ipoib_dbg_data(priv, "cm recv completion: id %d, status: %d\n",
>> +		       wr_id, wc->status);
>> +
>> +	if (unlikely(wr_id >= ipoib_recvq_size)) {
>> +		ipoib_warn(priv, "cm recv completion event with wrid %d (> %d)\n",
>> +				   wr_id, ipoib_recvq_size);
>> +		return;
>> +	}
>> +
>> +	index = (wc->wr_id & ~IPOIB_CM_OP_RECV) & NOSRQ_INDEX_MASK ;
>> +
>> +	/* This is the only place where rx_ptr could be a NULL - could
>> +	 * have just received a packet from a connection that has become
>> +	 * stale and so is going away. We will simply drop the packet and
>> +	 * let the hardware (it's IB_QPT_RC) handle the dropped packet.
>> +	 * In the timer_check() function below, p->jiffies is updated and
>> +	 * hence the connection will not be stale after that.
>> +	 */
>> +	rx_ptr = priv->cm.rx_index_table[index];
> 
> Is synchronization needed here?

No locking required

> 
>> +	if (unlikely(!rx_ptr)) {
>> +		ipoib_warn(priv, "Received packet from a connection "
>> +		           "that is going away. Hardware will handle it.\n");
>> +		return;
>> +	}
>> +
>> +	skb = rx_ptr->rx_ring[wr_id].skb;
>> +
>> +	if (unlikely(wc->status != IB_WC_SUCCESS)) {
>> +		ipoib_dbg(priv, "cm recv error "
>> +			   "(status=%d, wrid=%ld vend_err %x)\n",
>> +			   wc->status, wr_id, wc->vendor_err);
>> +		++priv->stats.rx_dropped;
>> +		goto repost_nosrq;
>> +	}
>> +
>> +	if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) {
>> +		/* There are no guarantees that wc->qp is not NULL for HCAs
>> +	 	* that do not support SRQ. */
>> +		p = rx_ptr;
>> +		timer_check_nosrq(priv, p);
> 
> This appears to be the only place 'p' is used in this call.  I think we can just
> remove it.

correct.

> 
>> +	}
>> +
>> +	frags = PAGE_ALIGN(wc->byte_len - min(wc->byte_len,
>> +					      (unsigned)IPOIB_CM_HEAD_SIZE)) / PAGE_SIZE;
>> +
>> +	newskb = ipoib_cm_alloc_rx_skb(dev, wr_id << 32 | index, frags,
>> +				       mapping);
>> +	if (unlikely(!newskb)) {
>> +		/*
>> +		 * If we can't allocate a new RX buffer, dump
>> +		 * this packet and reuse the old buffer.
>> +		 */
>> +		ipoib_dbg(priv, "failed to allocate receive buffer %ld\n", wr_id);
>>  		++priv->stats.rx_dropped;
>> -		goto repost;
>> +		goto repost_nosrq;
>>  	}
>>
>> -	ipoib_cm_dma_unmap_rx(priv, frags, priv->cm.srq_ring[wr_id].mapping);
>> -	memcpy(priv->cm.srq_ring[wr_id].mapping, mapping, (frags + 1) * sizeof *mapping);
>> +	ipoib_cm_dma_unmap_rx(priv, frags,
>> +	                      rx_ptr->rx_ring[wr_id].mapping);
>> +	memcpy(rx_ptr->rx_ring[wr_id].mapping, mapping,
>> +	       (frags + 1) * sizeof *mapping);
>>
>>  	ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n",
>>  		       wc->byte_len, wc->slid);
>> @@ -485,10 +788,22 @@ void ipoib_cm_handle_rx_wc(struct net_de
>>  	skb->pkt_type = PACKET_HOST;
>>  	netif_receive_skb(skb);
>>
>> -repost:
>> -	if (unlikely(ipoib_cm_post_receive(dev, wr_id)))
>> -		ipoib_warn(priv, "ipoib_cm_post_receive failed "
>> -			   "for buf %d\n", wr_id);
>> +repost_nosrq:
>> +	ret = post_receive_nosrq(dev, wr_id << 32 | index);
>> +
>> +	if (unlikely(ret))
>> +		ipoib_warn(priv, "post_receive_nosrq failed for buf %ld\n",
>> +		           wr_id);
>> +}
>> +
>> +void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
>> +{
>> +	struct ipoib_dev_priv *priv = netdev_priv(dev);
>> +
>> +	if (priv->cm.srq)
>> +		handle_rx_wc_srq(dev, wc);
>> +	else
>> +		handle_rx_wc_nosrq(dev, wc);
>>  }
>>
>>  static inline int post_send(struct ipoib_dev_priv *priv,
>> @@ -680,6 +995,44 @@ err_cm:
>>  	return ret;
>>  }
>>
>> +static void free_resources_nosrq(struct ipoib_dev_priv *priv, struct ipoib_cm_rx *p)
>> +{
>> +	int i;
>> +
>> +	for(i = 0; i < ipoib_recvq_size; ++i)
>> +		if(p->rx_ring[i].skb) {
>> +			ipoib_cm_dma_unmap_rx(priv,
>> +				         IPOIB_CM_RX_SG - 1,
>> +					 p->rx_ring[i].mapping);
>> +			dev_kfree_skb_any(p->rx_ring[i].skb);
>> +			p->rx_ring[i].skb = NULL;
>> +		}
>> +	kfree(p->rx_ring);
>> +}
>> +
>> +void dev_stop_nosrq(struct ipoib_dev_priv *priv)
>> +{
>> +	struct ipoib_cm_rx *p;
>> +
>> +	spin_lock_irq(&priv->lock);
>> +	while (!list_empty(&priv->cm.passive_ids)) {
>> +		p = list_entry(priv->cm.passive_ids.next, typeof(*p), list);
>> +		free_resources_nosrq(priv, p);
>> +		list_del_init(&p->list);
> 
> just list_del should work here
> 
>> +		spin_unlock_irq(&priv->lock);
>> +		ib_destroy_cm_id(p->id);
>> +		ib_destroy_qp(p->qp);
>> +		spin_lock(&nosrq_count.lock);
>> +		nosrq_count.current_rc_qp--;
>> +		spin_unlock(&nosrq_count.lock);
>> +		kfree(p);
>> +		spin_lock_irq(&priv->lock);
>> +	}
>> +	spin_unlock_irq(&priv->lock);
>> +
>> +	cancel_delayed_work(&priv->cm.stale_task);
>> +}
>> +
>>  void ipoib_cm_dev_stop(struct net_device *dev)
>>  {
>>  	struct ipoib_dev_priv *priv = netdev_priv(dev);
>> @@ -694,6 +1047,11 @@ void ipoib_cm_dev_stop(struct net_device
>>  	ib_destroy_cm_id(priv->cm.id);
>>  	priv->cm.id = NULL;
>>
>> +	if (!priv->cm.srq) {
>> +		dev_stop_nosrq(priv);
>> +		return;
>> +	}
>> +
>>  	spin_lock_irq(&priv->lock);
>>  	while (!list_empty(&priv->cm.passive_ids)) {
>>  		p = list_entry(priv->cm.passive_ids.next, typeof(*p), list);
>> @@ -739,6 +1097,7 @@ void ipoib_cm_dev_stop(struct net_device
>>  		kfree(p);
>>  	}
>>
>> +
>>  	cancel_delayed_work(&priv->cm.stale_task);
>>  }
>>
>> @@ -817,7 +1176,9 @@ static struct ib_qp *ipoib_cm_create_tx_
>>  	attr.recv_cq = priv->cq;
>>  	attr.srq = priv->cm.srq;
>>  	attr.cap.max_send_wr = ipoib_sendq_size;
>> +	attr.cap.max_recv_wr = 1;
>>  	attr.cap.max_send_sge = 1;
>> +	attr.cap.max_recv_sge = 1;
>>  	attr.sq_sig_type = IB_SIGNAL_ALL_WR;
>>  	attr.qp_type = IB_QPT_RC;
>>  	attr.send_cq = cq;
>> @@ -857,7 +1218,7 @@ static int ipoib_cm_send_req(struct net_
>>  	req.retry_count 	      = 0; /* RFC draft warns against retries */
>>  	req.rnr_retry_count 	      = 0; /* RFC draft warns against retries */
>>  	req.max_cm_retries 	      = 15;
>> -	req.srq 	              = 1;
>> +	req.srq			      = !!priv->cm.srq;
>>  	return ib_send_cm_req(id, &req);
>>  }
>>
>> @@ -1202,6 +1563,11 @@ static void ipoib_cm_rx_reap(struct work
>>  	list_for_each_entry_safe(p, n, &list, list) {
>>  		ib_destroy_cm_id(p->id);
>>  		ib_destroy_qp(p->qp);
>> +		if (!priv->cm.srq) {
>> +			spin_lock(&nosrq_count.lock);
>> +			nosrq_count.current_rc_qp--;
>> +			spin_unlock(&nosrq_count.lock);
>> +		}
>>  		kfree(p);
>>  	}
>>  }
>> @@ -1220,12 +1586,19 @@ static void ipoib_cm_stale_task(struct w
>>  		p = list_entry(priv->cm.passive_ids.prev, typeof(*p), list);
>>  		if (time_before_eq(jiffies, p->jiffies + IPOIB_CM_RX_TIMEOUT))
>>  			break;
>> -		list_move(&p->list, &priv->cm.rx_error_list);
>> -		p->state = IPOIB_CM_RX_ERROR;
>> -		spin_unlock_irq(&priv->lock);
>> -		ret = ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE);
>> -		if (ret)
>> -			ipoib_warn(priv, "unable to move qp to error state: %d\n", ret);
>> +		if (!priv->cm.srq) {
>> +			free_resources_nosrq(priv, p);
>> +			list_del_init(&p->list);
>> +			priv->cm.rx_index_table[p->index] = NULL;
>> +			spin_unlock_irq(&priv->lock);
>> +		} else {
>> +			list_move(&p->list, &priv->cm.rx_error_list);
>> +			p->state = IPOIB_CM_RX_ERROR;
>> +			spin_unlock_irq(&priv->lock);
>> +			ret = ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE);
>> +			if (ret)
>> +				ipoib_warn(priv, "unable to move qp to error state: %d\n", ret);
>> +		}
>>  		spin_lock_irq(&priv->lock);
>>  	}
>>
>> @@ -1279,16 +1652,40 @@ int ipoib_cm_add_mode_attr(struct net_de
>>  	return device_create_file(&dev->dev, &dev_attr_mode);
>>  }
>>
>> +static int create_srq(struct net_device *dev, struct ipoib_dev_priv *priv)
>> +{
>> +	struct ib_srq_init_attr srq_init_attr;
>> +	int ret;
>> +
>> +	srq_init_attr.attr.max_wr = ipoib_recvq_size;
>> +	srq_init_attr.attr.max_sge = IPOIB_CM_RX_SG;
>> +
>> +	priv->cm.srq = ib_create_srq(priv->pd, &srq_init_attr);
>> +	if (IS_ERR(priv->cm.srq)) {
>> +		ret = PTR_ERR(priv->cm.srq);
>> +		priv->cm.srq = NULL;
>> +		return ret;
> 
> nit: you can just return PTR_ERR here, and remove the ret stack variable

okay
> 
>> +	}
>> +
>> +	priv->cm.srq_ring = kzalloc(ipoib_recvq_size *
>> +		                    sizeof *priv->cm.srq_ring,
>> +			            GFP_KERNEL);
>> +	if (!priv->cm.srq_ring) {
>> +		printk(KERN_WARNING "%s: failed to allocate CM ring "
>> +		       "(%d entries)\n",
>> +	       	       priv->ca->name, ipoib_recvq_size);
>> +		ipoib_cm_dev_cleanup(dev);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>  int ipoib_cm_dev_init(struct net_device *dev)
>>  {
>>  	struct ipoib_dev_priv *priv = netdev_priv(dev);
>> -	struct ib_srq_init_attr srq_init_attr = {
>> -		.attr = {
>> -			.max_wr  = ipoib_recvq_size,
>> -			.max_sge = IPOIB_CM_RX_SG
>> -		}
>> -	};
>>  	int ret, i;
>> +	struct ib_device_attr attr;
>>
>>  	INIT_LIST_HEAD(&priv->cm.passive_ids);
>>  	INIT_LIST_HEAD(&priv->cm.reap_list);
>> @@ -1305,20 +1702,33 @@ int ipoib_cm_dev_init(struct net_device
>>
>>  	skb_queue_head_init(&priv->cm.skb_queue);
>>
>> -	priv->cm.srq = ib_create_srq(priv->pd, &srq_init_attr);
>> -	if (IS_ERR(priv->cm.srq)) {
>> -		ret = PTR_ERR(priv->cm.srq);
>> -		priv->cm.srq = NULL;
>> +	if (ret = ib_query_device(priv->ca, &attr))
>>  		return ret;
> 
> double parens around assignment - also below

okay
> 
>> -	}
>>
>> -	priv->cm.srq_ring = kzalloc(ipoib_recvq_size * sizeof *priv->cm.srq_ring,
>> -				    GFP_KERNEL);
>> -	if (!priv->cm.srq_ring) {
>> -		printk(KERN_WARNING "%s: failed to allocate CM ring (%d entries)\n",
>> -		       priv->ca->name, ipoib_recvq_size);
>> -		ipoib_cm_dev_cleanup(dev);
>> -		return -ENOMEM;
>> +	if (attr.max_srq) {
>> +		/* This device supports SRQ */
>> +		if (ret = create_srq(dev, priv))
>> +			return ret;
>> +		priv->cm.rx_index_table = NULL;
>> +	} else {
>> +		priv->cm.srq = NULL;
>> +		priv->cm.srq_ring = NULL;
>> +
>> +		/* Every new REQ that arrives creates a struct ipoib_cm_rx.
>> +		 * These structures form a link list starting with the
>> +		 * passive_ids. For quick and easy access we maintain a table
>> +		 * of pointers to struct ipoib_cm_rx called the rx_index_table
>> +		 */
> 
> Why store the structures in a linked list if they're stored in a table?

This linked list is common to both SRQ and NOSRQ. Only the NOSRQ code
uses the table.
> 
>> +		priv->cm.rx_index_table = kzalloc(NOSRQ_INDEX_TABLE_SIZE *
>> +					 sizeof *priv->cm.rx_index_table,
>> +					 GFP_KERNEL);
>> +		if (!priv->cm.rx_index_table) {
>> +			printk(KERN_WARNING "Failed to allocate NOSRQ_INDEX_TABLE\n");
>> +			return -ENOMEM;
>> +		}
>> +
>> +		spin_lock_init(&nosrq_count.lock);
>> +		nosrq_count.current_rc_qp = 0;
>>  	}
>>
>>  	for (i = 0; i < IPOIB_CM_RX_SG; ++i)
>> @@ -1331,17 +1741,23 @@ int ipoib_cm_dev_init(struct net_device
>>  	priv->cm.rx_wr.sg_list = priv->cm.rx_sge;
>>  	priv->cm.rx_wr.num_sge = IPOIB_CM_RX_SG;
>>
>> -	for (i = 0; i < ipoib_recvq_size; ++i) {
>> -		if (!ipoib_cm_alloc_rx_skb(dev, i, IPOIB_CM_RX_SG - 1,
>> +	/* One can post receive buffers even before the RX QP is created
>> +	 * only in the SRQ case. Therefore for NOSRQ we skip the rest of init
>> +	 * and do that in ipoib_cm_req_handler() */
> 
> This is separate from this patch, but why not wait to post receives to a SRQ
> only after we've received a REQ?  Would this simplify the code any?

Good point. We could think of that in the future.
> 
> - Sean
> 


From sean.hefty at intel.com  Tue Jun 12 18:24:49 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 12 Jun 2007 18:24:49 -0700
Subject: [ofa-general] IPOB CM (NOSRQ) [PATCH V6] patch
In-Reply-To: <466F36C8.5010507@linux.vnet.ibm.com>
Message-ID: <000001c7ad59$a2a93040$8ec8180a@amr.corp.intel.com>

>This function is called only when a REQ is received. Otherwise
>current_rc_qp is only used in the error case, or when the connection
>is being torn down. Hence I don't think it makes a significant
>difference which one is used.

I'm not hung up on this, but it appears that current_rc_qp is being used as an
atomic (read, inc, dec).  Converting it to an atomic seems cleaner.
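
A rough user-space sketch of that conversion, assuming C11 <stdatomic.h> as a stand-in for the kernel's atomic_t. The struct and function names are invented for illustration and are not taken from the patch:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical replacement for the spinlock-protected nosrq_count. */
struct nosrq_counter {
	atomic_int current_rc_qp;
};

/* Called when a REQ creates a new RC QP. */
static void nosrq_qp_get(struct nosrq_counter *c)
{
	atomic_fetch_add(&c->current_rc_qp, 1);
}

/* Called on the error path or when a connection is torn down. */
static void nosrq_qp_put(struct nosrq_counter *c)
{
	atomic_fetch_sub(&c->current_rc_qp, 1);
}

/* Receive memory currently committed: ring size * QP count * packet size. */
static uint64_t nosrq_recv_mem_used(struct nosrq_counter *c,
				    uint64_t recvq_size, uint64_t packet_size)
{
	return recvq_size * (uint64_t)atomic_load(&c->current_rc_qp) * packet_size;
}
```

In the driver the corresponding calls would be atomic_inc(), atomic_dec() and atomic_read(), and the separate lock in nosrq_count could go away.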

>Move the closing "*/" to the next line?

The preferred format for multi-line comments is:

/*
 * first line
 * second line
 * etc.
 */

I don't know how well the existing code follows this format...

>>> +	if (p && time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) {
>>> +		spin_lock_irqsave(&priv->lock, flags);
>>> +		p->jiffies = jiffies;
>>> +		/* Move this entry to list head, but do
>>> +		 * not re-add it if it has been removed. */
>>> +		if (!list_empty(&p->list))
>>
>> This line is the only difference between this function and the previous one.  Is
>> it possible to always use the state check?
>
>The state check is only used in the SRQ case.

I guess I was just asking whether the non-SRQ case could be made to make use of
state as well.  (I'll leave that to you, since I'm not as familiar with the
code.  I was just looking for ways to make the SRQ/no-SRQ code common, but only
if it simplifies the code in the end.)
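
For context, the list_empty(&p->list) test only works because the entry was removed with list_del_init(), which leaves the node pointing at itself. With plain list_del() the node's pointers are poisoned rather than reset, so the same check would not be safe. A user-space sketch modelled on the kernel's list.h (the helpers are reimplemented here for illustration, and touch_if_linked() is a hypothetical name):

```c
#include <stdbool.h>

struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* True for an empty head, and also for a node that points at itself. */
static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Insert n right after head. */
static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink n and reset it so list_empty(n) is true afterwards. */
static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	INIT_LIST_HEAD(n);
}

/* Move entry to the front of head, but do not re-add a removed entry. */
static bool touch_if_linked(struct list_head *entry, struct list_head *head)
{
	if (list_empty(entry))
		return false;	/* already unlinked, e.g. by the stale task */
	list_del_init(entry);
	list_add(entry, head);
	return true;
}
```

A state field, as in the SRQ path, expresses the same thing explicitly and does not depend on how the entry was unlinked.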

- Sean


From swise at opengridcomputing.com  Tue Jun 12 18:48:52 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 12 Jun 2007 20:48:52 -0500
Subject: [ofa-general] Re: copyright warning/problem within ofed-1.2
In-Reply-To: <8A71B368A89016469F72CD08050AD33401505886@maui.asicdesigners.com>
References: <466ED3CF.456F.00C7.0@novell.com>
	<8A71B368A89016469F72CD08050AD33401505886@maui.asicdesigners.com>
Message-ID: <466F4D04.70102@opengridcomputing.com>

Done.

Vlad/Tziporet:  Please pull from

git://git.openfabrics.org/~swise/libcxgb3

The changes are only copyright headers/comments.

Thanks,

Steve.


Felix Marti wrote:
> Steve,
> 
> Can you change the offending file to use an appropriate copyright
> statement?
> 
> Thanks,
> felix
> 
>> -----Original Message-----
>> From: Patrick Mullaney [mailto:pmullaney at novell.com]
>> Sent: Tuesday, June 12, 2007 2:12 PM
>> To: tziporet at mellanox.co.il
>> Cc: Felix Marti; Matthias Nagorni; Moiz Kohari
>> Subject: copyright warning/problem within ofed-1.2
>>
>> Hi Tziporet,
>>
>> We just ran across a copyright in libcxgb3 library (firmware_exports.h).
>> We may not be able to ship this in its current state - can we get this
>> changed? I looked around and it seems like there was a patch to remove it
>> that was submitted but it doesn't seem to have made it to the release.
>>
>> Thanks.
>> Patrick
>>
> 


From jackm at dev.mellanox.co.il  Tue Jun 12 22:35:13 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Wed, 13 Jun 2007 08:35:13 +0300
Subject: [ofa-general] [PATCH 1 of 2] libmlx4: deal with ownership bit
	wraparound when cleaning cq
Message-ID: <200706130835.13642.jackm@dev.mellanox.co.il>

1. ntohl should apply only to cqe->my_qpn.
2. when compacting the cqe's, need to preserve the
   proper ownership value of the cqe in case of wraparound.
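
To illustrate item 1, a small sketch with hypothetical helper names: masking the big-endian field before converting it keeps the wrong bytes on a little-endian host, while converting first and masking afterwards yields the low 24 bits of the QP number on any host.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Old form: mask applied to the big-endian value, then byte-swapped. */
static uint32_t qpn_mask_then_swap(uint32_t be_my_qpn)
{
	return ntohl(be_my_qpn & 0xffffff);
}

/* Fixed form: convert to host order first, then keep the low 24 bits. */
static uint32_t qpn_swap_then_mask(uint32_t be_my_qpn)
{
	return ntohl(be_my_qpn) & 0xffffff;
}
```

On a big-endian host both forms agree, which is one way a bug like this can go unnoticed.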

Found by Ronni Zimmerman of Mellanox.

Signed-off-by: Jack Morgenstein <jackm at dev.mellanox.co.il>

diff --git a/src/cq.c b/src/cq.c
index a1831ff..ead1004 100644
--- a/src/cq.c
+++ b/src/cq.c
@@ -404,14 +404,24 @@ void mlx4_cq_clean(struct mlx4_cq *cq, uint32_t qpn, struct mlx4_srq *srq)
 	 * that match our QP by copying older entries on top of them.
 	 */
 	while ((int) --prod_index - (int) cq->cons_index >= 0) {
+		struct mlx4_cqe *q;
+		uint8_t own;
 		cqe = get_cqe(cq, prod_index & cq->ibv_cq.cqe);
-		if (ntohl((cqe->my_qpn) & 0xffffff) == qpn) {
+		if ((ntohl(cqe->my_qpn) & 0xffffff) == qpn) {
 			if (srq && !(cqe->owner_sr_opcode & MLX4_CQE_IS_SEND_MASK))
 				mlx4_free_srq_wqe(srq, ntohs(cqe->wqe_index));
 			++nfreed;
-		} else if (nfreed)
-			memcpy(get_cqe(cq, (prod_index + nfreed) & cq->ibv_cq.cqe),
-			       cqe, sizeof *cqe);
+		} else if (nfreed) {
+			/*
+			 * preserve proper ownership bit value in case of
+			 * wraparound.
+			 */
+			q = get_cqe(cq, (prod_index + nfreed) & cq->ibv_cq.cqe);
+			own = q->owner_sr_opcode & MLX4_CQE_OWNER_MASK;
+			memcpy(q, cqe, sizeof *cqe);
+			q->owner_sr_opcode =
+				(q->owner_sr_opcode & ~MLX4_CQE_OWNER_MASK) | own;
+		}
 	}
 
 	if (nfreed) {


From jackm at dev.mellanox.co.il  Tue Jun 12 22:36:24 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Wed, 13 Jun 2007 08:36:24 +0300
Subject: [ofa-general] [PATCH 2 of 2] mlx4: deal with ownership bit
	wraparound when cleaning cq
Message-ID: <200706130836.25074.jackm@dev.mellanox.co.il>

When compacting the cqe's, need to preserve the
proper ownership value of the cqe in case of wraparound.

Found by Ronni Zimmerman of Mellanox.

Signed-off-by: Jack Morgenstein <jackm at dev.mellanox.co.il>

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 048c527..ced854d 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -496,14 +496,24 @@ void __mlx4_ib_cq_clean(struct mlx4_ib_cq *cq, u32 qpn, struct mlx4_ib_srq *srq)
 	 * that match our QP by copying older entries on top of them.
 	 */
 	while ((int) --prod_index - (int) cq->mcq.cons_index >= 0) {
+		struct mlx4_cqe *q;
+		u8 own;
 		cqe = get_cqe(cq, prod_index & cq->ibcq.cqe);
 		if ((be32_to_cpu(cqe->my_qpn) & 0xffffff) == qpn) {
 			if (srq && !(cqe->owner_sr_opcode & MLX4_CQE_IS_SEND_MASK))
 				mlx4_ib_free_srq_wqe(srq, be16_to_cpu(cqe->wqe_index));
 			++nfreed;
-		} else if (nfreed)
-			memcpy(get_cqe(cq, (prod_index + nfreed) & cq->ibcq.cqe),
-			       cqe, sizeof *cqe);
+		} else if (nfreed) {
+			/*
+			 * preserve proper ownership bit value in case of
+			 * wraparound.
+			 */
+			q = get_cqe(cq, (prod_index + nfreed) & cq->ibcq.cqe);
+			own = q->owner_sr_opcode & MLX4_CQE_OWNER_MASK;
+			memcpy(q, cqe, sizeof *cqe);
+			q->owner_sr_opcode =
+				(q->owner_sr_opcode & ~MLX4_CQE_OWNER_MASK) | own;
+		}
 	}
 
 	if (nfreed) {


From vuhuong at mellanox.com  Wed Jun 13 01:07:33 2007
From: vuhuong at mellanox.com (Vu Pham)
Date: Wed, 13 Jun 2007 01:07:33 -0700
Subject: [ofa-general] OFED 1.x (Gen 2) based SRP target code released!
In-Reply-To: <466E4AD8.6090804@voltaire.com>
References: <9FA59C95FFCBB34EA5E42C1A8573784F6F91AB@mtiexch01.mti.com>
	<465AD2D1.2070100@voltaire.com> <466D9EBD.3090809@mellanox.com>
	<466E4AD8.6090804@voltaire.com>
Message-ID: <466FA5C5.5020006@mellanox.com>

Erez Zilber wrote:
>>>>     
>>> I'm trying to build srpt according to the instructions, but it does
>>> not get built at all. Here's what I did:
>>>
>>> tar xzf OFED-1.2-rc3.tgz
>>> cd OFED-1.2-rc3/SRPMS
>>> rpm2cpio ofa_kernel-1.2-rc3.src.rpm |cpio -i
>>> tar xzf ofa_kernel-1.2.tgz
>>> cd ofa_kernel-1.2
>>> patch -p1 < ~/srpt_inc/add_srpt_01.patch
>>> patch -p1 < ~/srpt_inc/add_srpt_03.patch
>>>   
>> You forgot to
>> patch -p1 < ~/srpt_inc/add_srpt_04.patch
>>
>> -vu
> You may want to add it to the README file (it is not mentioned there).


It was not in the original README; however, it is in the current
README in srpt_inc.git.


> Is it documented anywhere in openfabrics wiki?
> 


No, it is not in the OpenFabrics wiki.

-vu


From vlad at dev.mellanox.co.il  Wed Jun 13 01:33:07 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Wed, 13 Jun 2007 11:33:07 +0300
Subject: [ofa-general] Re: [ewg] Re: copyright warning/problem within
	ofed-1.2
In-Reply-To: <466F4D04.70102@opengridcomputing.com>
References: <466ED3CF.456F.00C7.0@novell.com>	<8A71B368A89016469F72CD08050AD33401505886@maui.asicdesigners.com>
	<466F4D04.70102@opengridcomputing.com>
Message-ID: <466FABC3.6050101@dev.mellanox.co.il>

Done,

Regards,
Vladimir

Steve Wise wrote:
> Done.
> 
> Vlad/Tziporet:  Please pull from
> 
> git://git.openfabrics.org/~swise/libcxgb3
> 
> The changes are only copyright headers/comments.
> 
> Thanks,
> 
> Steve.
> 
> 
> 
> Felix Marti wrote:
>> Steve,
>>
>> Can you change the offending file to use an appropriate copyright
>> statement?
>>
>> Thanks,
>> felix
>>
>>> -----Original Message-----
>>> From: Patrick Mullaney [mailto:pmullaney at novell.com]
>>> Sent: Tuesday, June 12, 2007 2:12 PM
>>> To: tziporet at mellanox.co.il
>>> Cc: Felix Marti; Matthias Nagorni; Moiz Kohari
>>> Subject: copyright warning/problem within ofed-1.2
>>>
>>> Hi Tziporet,
>>>
>>> We just ran across a copyright in the libcxgb3 library (firmware_exports.h).
>>> We may not be able to ship this in its current state - can we get this
>>> changed? I looked around and it seems like there was a patch to remove it
>>> that was submitted but it doesn't seem to have made it to the release.
>>>
>>> Thanks.
>>> Patrick
>>>
>>
> 
> _______________________________________________
> ewg mailing list
> ewg at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
> 


From mst at dev.mellanox.co.il  Wed Jun 13 01:45:31 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 13 Jun 2007 11:45:31 +0300
Subject: [ofa-general] Re: crash in ipoib
In-Reply-To: <466F211A.1000005@ichips.intel.com>
References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com>
	<20070612183521.GC10688@mellanox.co.il>
	<466F211A.1000005@ichips.intel.com>
Message-ID: <20070613084531.GG1975@mellanox.co.il>

> Quoting Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [ofa-general] Re: crash in ipoib
> 
> They're around
> 
> >ipoib_cm_send + 433
> 
> if (unlikely(post_send(priv, tx, tx->tx_head...
> 
> >ipoib_cm_handle_rx_wc + 239
> 
> skb = priv->cm.srq_ring[wr_id].skb
>  or
> if (unlikely(wc->status != IB_WC_SUCCESS)) {
> 
> (This one isn't matching up quite right, but appears to be in this area.)
> 
> In my earlier trace, I saw ipoib_cm_handle_rx_wc + 378,  which is around:
> 
> if (p->state == IPOIB_CM_RX_LIVE)
> ->	list_move(&p->list, &priv->cm.passive_ids);
> 
> - Sean

This looks strange. Can you supply some more data please?
Which HCA are you running on?
What test are you running?
What should I do to reproduce this?
Further, could you supply the full oops?

-- 
MST


From vlad at lists.openfabrics.org  Wed Jun 13 02:45:35 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Wed, 13 Jun 2007 02:45:35 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070613-0200 daily build status
Message-ID: <20070613094535.E691CE6089D@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:   --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.12
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.19
Passed on x86_64 with linux-2.6.16
Passed on ia64 with linux-2.6.13
Passed on ppc64 with linux-2.6.12
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.12
Passed on ppc64 with linux-2.6.18
Passed on powerpc with linux-2.6.14
Passed on powerpc with linux-2.6.17
Passed on powerpc with linux-2.6.13
Passed on x86_64 with linux-2.6.18
Passed on ppc64 with linux-2.6.15
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.17
Passed on ppc64 with linux-2.6.16
Passed on powerpc with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.14
Passed on powerpc with linux-2.6.15
Passed on powerpc with linux-2.6.12
Passed on ppc64 with linux-2.6.17
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.14
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on ppc64 with linux-2.6.19
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.15
Passed on ppc64 with linux-2.6.13
Passed on ia64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:


From tziporet at dev.mellanox.co.il  Wed Jun 13 05:55:42 2007
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Wed, 13 Jun 2007 15:55:42 +0300
Subject: [ofa-general] Re: [ewg] Re: copyright warning/problem within
	ofed-1.2
In-Reply-To: <466FABC3.6050101@dev.mellanox.co.il>
References: <466ED3CF.456F.00C7.0@novell.com>	<8A71B368A89016469F72CD08050AD33401505886@maui.asicdesigners.com>	<466F4D04.70102@opengridcomputing.com>
	<466FABC3.6050101@dev.mellanox.co.il>
Message-ID: <466FE94E.5010301@mellanox.co.il>

Vladimir Sokolovsky wrote:
> Done,
>
> Regards,
> Vladimir
>
> Steve Wise wrote:
>> Done.
>>
>> Vlad/Tziporet:  Please pull from
>>
>> git://git.openfabrics.org/~swise/libcxgb3
>>
>> The changes are only copyright headers/comments.
>>
>> Thanks,
>>
>> Steve.

This will not be in RC5 - only in the final release.

Tziporet


From tziporet at mellanox.co.il  Wed Jun 13 07:25:53 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Wed, 13 Jun 2007 17:25:53 +0300
Subject: [ofa-general] OFED 1.2 rc5 release
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9015634B7@mtlexch01.mtl.com>
References: <43AA3CB3C1BF5A499F5AAD31CA5023AC06624A26@mtlexch01.mtl.com>
	<6C2C79E72C305246B504CBA17B5500C9015634B7@mtlexch01.mtl.com>
Message-ID: <6C2C79E72C305246B504CBA17B5500C90156362A@mtlexch01.mtl.com>

Hi, 

OFED 1.2-RC5 is available on
http://www.openfabrics.org/builds/ofed-1.2/ 
File: OFED-1.2-rc5.tgz 
To get BUILD_ID run ofed_info 

Please report any issues in bugzilla https://bugs.openfabrics.org/

The GA release is expected next Wed (June 20) based on RC5 tests

Tziporet & Vlad 

========================================================================

Release information: 

OS support: 
Novell: 
    - SLES 9.0 SP3 
    - SLES10 
    - SLES10 SP1 RC5
Redhat: 
    - Redhat EL4 up3, up4 and up5 
    - Redhat EL5 
kernel.org: 
    - 2.6.20 
    - 2.6.19 

Note: Fedora C6 and SuSE Pro 10 are not part of the official list. 
We keep the backport patches for these OSes and make sure OFED compiles
and loads properly, but will not do a full QA cycle.

Systems: 
    * x86_64 
    * x86 
    * ia64 
    * ppc64 

Main changes from OFED-1.2-rc4:
===============================
1. Fixed 8 bugs (see attached for fixed issues)
2. Added support for SLES10 SP1 RC5 (tvflash is disabled for now)
3. Added support for iSER on RHEL 4
4. Updated documents - all owners, please review to make sure the docs for
your component are updated.

See bugzilla for all open issues. 

Tasks that should be completed for the GA release: 
1. Complete all documentation (release notes, README, etc.) 
2. Run all QA tests on all platforms
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rc5_fixed_bugs.csv
Type: application/octet-stream
Size: 719 bytes
Desc: rc5_fixed_bugs.csv
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070613/8afab4e5/attachment.obj>

From jsquyres at cisco.com  Wed Jun 13 07:46:27 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Wed, 13 Jun 2007 10:46:27 -0400
Subject: [ofa-general] OFED 1.2 rc5 release
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C90156362A@mtlexch01.mtl.com>
References: <43AA3CB3C1BF5A499F5AAD31CA5023AC06624A26@mtlexch01.mtl.com>
	<6C2C79E72C305246B504CBA17B5500C9015634B7@mtlexch01.mtl.com>
	<6C2C79E72C305246B504CBA17B5500C90156362A@mtlexch01.mtl.com>
Message-ID: <D85B478A-AB52-41B3-BBDC-40B8CD98AADD@cisco.com>

Here is a minor spacing/nits patch for MPI_README.txt.

Additionally, I think that all three "Setup for * MPI..." sections  
should be modified so that they are consistent with each other.   
Specifically, I notice that the MVAPICH and MVAPICH2 sections make  
reference to sourcing shell setup files.  This is obsolete; there is  
a whole section on the mpi-selector that should address setting up  
for Open MPI and MVAPICH*.  See Section 3.1 for what I thought we  
were going to talk about in the "Setup for * MPI ..." sections.   
Regardless of what we decide to discuss, the 3 sections should be  
consistent.

Also, it seems a little odd that the ordering is MVAPICH, OMPI,  
MVAPICH2.  Shouldn't MVAPICH and MVAPICH2 go together?  If we want to  
go alphabetically, we should go in order: MVAPICH, MVAPICH2, OMPI.   
It just seems odd that the 2 MVAPICH sections are not next to each  
other.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: MPI_README.patch
Type: application/octet-stream
Size: 2861 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070613/de28ccaf/attachment.obj>
-------------- next part --------------


On Jun 13, 2007, at 10:25 AM, Tziporet Koren wrote:

> Hi,
>
> OFED 1.2-RC5 is available on
> http://www.openfabrics.org/builds/ofed-1.2/
> File: OFED-1.2-rc5.tgz
> To get BUILD_ID run ofed_info
>
> Please report any issues in bugzilla https://bugs.openfabrics.org/
>
> The GA release is expected next Wed (June 20) based on RC5 tests
>
> Tziporet & Vlad
>
> ========================================================================
>
> Release information:
>
> OS support:
> Novell:
>     - SLES 9.0 SP3
>     - SLES10
>     - SLES10 SP1 RC5
> Redhat:
>     - Redhat EL4 up3, up4 and up5
>     - Redhat EL5
> kernel.org:
>     - 2.6.20
>     - 2.6.19
>
> Note: Fedora C6 and SuSE Pro 10 are not part of the official list.
> We keep the backport patches for these OSes and make sure OFED compiles
> and loads properly, but will not do a full QA cycle.
>
> Systems:
>     * x86_64
>     * x86
>     * ia64
>     * ppc64
>
> Main changes from OFED-1.2-rc4:
> ===============================
> 1. Fixed 8 bugs (see attached for fixed issues)
> 2. Added support for SLES10 SP1 RC5 (tvflash is disabled for now)
> 3. Added support for iSER on RHEL 4
> 4. Updated documents - all owners, please review to make sure the docs for
> your component are updated.
>
> See bugzilla for all open issues.
>
> Tasks that should be completed for the GA release:
> 1. Complete all documentation (release notes, README, etc.)
> 2. Run all QA tests on all platforms
> <rc5_fixed_bugs.csv>
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general


-- 
Jeff Squyres
Cisco Systems


From erezz at voltaire.com  Wed Jun 13 07:49:14 2007
From: erezz at voltaire.com (Erez Zilber)
Date: Wed, 13 Jun 2007 17:49:14 +0300
Subject: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
In-Reply-To: <20070612084108.GK6470@mellanox.co.il>
References: <20070612084108.GK6470@mellanox.co.il>
Message-ID: <467003EA.7070901@voltaire.com>


> Erez, and other iser maintainers, I had a problem with RHEL4 iscsi backports
> (scsi_flush_work isn't exported) I decided that since it isn't
> called on older kernels it's reasonably safe to just comment it out,
> but would be interested to hear your opinion.
> See it in this sub-directory:
> kernel_patches/backport/2.6.9_U2/libiscsi_no_flush_to_2_6_9.patch
>   

This leads me to something that I thought about in the past. Old kernels
(i.e. the RH4 kernels) don't have the SCSI work queue. Therefore, I used
schedule_work instead of scsi_queue_work. Now, I cannot replace
scsi_flush_work with flush_workqueue because I'm using a workqueue which
does not belong to me (and, therefore, I cannot flush it).

I'm thinking about adding a backport that will create a workqueue for
each session in open-iscsi. With this, I can queue & flush. Mike - what
do you think about that? I think that creating a workqueue in open-iscsi
per session will be the closer thing to the SCSI workqueue that we have
in new kernels.

Erez


From andrey.slepuhin at t-platforms.ru  Wed Jun 13 07:56:57 2007
From: andrey.slepuhin at t-platforms.ru (Andrey Slepuhin)
Date: Wed, 13 Jun 2007 18:56:57 +0400
Subject: [ofa-general] Problems with mlx4
Message-ID: <467005B9.8070708@t-platforms.ru>

Dear folks,

I just set up a test cluster using ConnectX cards, but I cannot get the link
up. I downloaded the kernel from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git

After inserting the modules I see that the card was initialized:

Jun 13 22:17:23 testnode1 kernel: mlx4_core: Mellanox ConnectX core 
driver v0.01 (May 1, 2007)
Jun 13 22:17:23 testnode1 kernel: mlx4_core: Initializing 0000:07:00.0
Jun 13 22:17:23 testnode1 kernel: ACPI: PCI Interrupt 0000:07:00.0[A] -> 
GSI 16 (level, low) -> IRQ 16
Jun 13 22:17:23 testnode1 kernel: PCI: Setting latency timer of device 
0000:07:00.0 to 64

But the link remains in "DOWN" state:

testnode1:~ # /opt/ofed/bin/ibstatus
Infiniband device 'mlx4_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0002:c903:0000:07a1
        base lid:        0x0
        sm lid:          0x0
        state:           1: DOWN
        phys state:      2: Polling
        rate:            20 Gb/sec (4X DDR)

Infiniband device 'mlx4_0' port 2 status:
        default gid:     fe80:0000:0000:0000:0002:c903:0000:07a2
        base lid:        0x0
        sm lid:          0x0
        state:           1: DOWN
        phys state:      2: Polling
        rate:            20 Gb/sec (4X DDR)

I tried different ports and cables but without success. Do you have any 
idea what's going wrong?
The nodes configuration is:
Intel S5000PSL motherboard, 2xXeon 5345, 8GB RAM
All the nodes are connected to Flextronics (Mellanox) 24-port DDR switch.
I'm running SLES10 with the kernel from Roland's tree:
testnode1:~ # uname -a
Linux testnode1 2.6.22-rc3 #1 SMP Wed Jun 6 23:56:36 MSD 2007 x86_64 
x86_64 x86_64 GNU/Linux

Any help will be much appreciated.

Thanks in advance,
Andrey


From minich at ornl.gov  Wed Jun 13 08:05:00 2007
From: minich at ornl.gov (Makia Minich)
Date: Wed, 13 Jun 2007 11:05:00 -0400
Subject: [ofa-general] Problems with mlx4
In-Reply-To: <467005B9.8070708@t-platforms.ru>
References: <467005B9.8070708@t-platforms.ru>
Message-ID: <200706131105.00563.minich@ornl.gov>

Are you running an SM anywhere?  If I remember correctly, the Flextronics 
switch does not have an embedded SM.

On Wednesday 13 June 2007 10:56:57 am Andrey Slepuhin wrote:
> Dear folks,
>
> I just setup a test cluster using ConnectX cards, but I can not get link
> up. I downloaded the kernel from
>
> git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git
>
> After inserting the modules I see that the card was initialized:
>
> Jun 13 22:17:23 testnode1 kernel: mlx4_core: Mellanox ConnectX core
> driver v0.01 (May 1, 2007)
> Jun 13 22:17:23 testnode1 kernel: mlx4_core: Initializing 0000:07:00.0
> Jun 13 22:17:23 testnode1 kernel: ACPI: PCI Interrupt 0000:07:00.0[A] ->
> GSI 16 (level, low) -> IRQ 16
> Jun 13 22:17:23 testnode1 kernel: PCI: Setting latency timer of device
> 0000:07:00.0 to 64
>
> But the link remains in "DOWN" state:
>
> testnode1:~ # /opt/ofed/bin/ibstatus
> Infiniband device 'mlx4_0' port 1 status:
>         default gid:     fe80:0000:0000:0000:0002:c903:0000:07a1
>         base lid:        0x0
>         sm lid:          0x0
>         state:           1: DOWN
>         phys state:      2: Polling
>         rate:            20 Gb/sec (4X DDR)
>
> Infiniband device 'mlx4_0' port 2 status:
>         default gid:     fe80:0000:0000:0000:0002:c903:0000:07a2
>         base lid:        0x0
>         sm lid:          0x0
>         state:           1: DOWN
>         phys state:      2: Polling
>         rate:            20 Gb/sec (4X DDR)
>
> I tried different ports and cables but without success. Do you have any
> idea what's going wrong?
> The nodes configuration is:
> Intel S5000PSL motherboard, 2xXeon 5345, 8GB RAM
> All the nodes are connected to Flextronics (Mellanox) 24-port DDR switch.
> I'm running SLES10 with the kernel from Roland's tree:
> testnode1:~ # uname -a
> Linux testnode1 2.6.22-rc3 #1 SMP Wed Jun 6 23:56:36 MSD 2007 x86_64
> x86_64 x86_64 GNU/Linux
>
> Any help will be much appreciated.
>
> Thanks in advance,
> Andrey

-- 
Makia Minich <minich at ornl.gov>
National Center for Computation Science
Oak Ridge National Laboratory
--*--
Imagine no possessions
I wonder if you can
- John Lennon


From rdreier at cisco.com  Wed Jun 13 08:05:55 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 13 Jun 2007 08:05:55 -0700
Subject: [ofa-general] Problems with mlx4
In-Reply-To: <467005B9.8070708@t-platforms.ru> (Andrey Slepuhin's message of
	"Wed, 13 Jun 2007 18:56:57 +0400")
References: <467005B9.8070708@t-platforms.ru>
Message-ID: <adaejkf7v3g.fsf@cisco.com>

 > I just setup a test cluster using ConnectX cards, but I can not get
 > link up.

Most likely you need to update your switch FW.  You need Anafa2 FW
version 1.0 to negotiate a DDR link with ConnectX.

BTW what firmware version do you have on your HCAs?  You probably want
to update to 2.0.156 (the mlx4 driver won't work with 2.0.158 for a
day or two still) so that you don't have to monkey around with
hard-coding your switch ports to DDR only.

 - R.


From rdreier at cisco.com  Wed Jun 13 08:06:31 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 13 Jun 2007 08:06:31 -0700
Subject: [ofa-general] Problems with mlx4
In-Reply-To: <200706131105.00563.minich@ornl.gov> (Makia Minich's message of
	"Wed, 13 Jun 2007 11:05:00 -0400")
References: <467005B9.8070708@t-platforms.ru>
	<200706131105.00563.minich@ornl.gov>
Message-ID: <adaabv37v2g.fsf@cisco.com>

 > Are you running an SM anywhere?  If I remember correctly, the Flextronics 
 > switch does not have an embedded SM.

Even without an SM the ports will go to INIT (and if the ports are
DOWN then an SM can't do anything to help).

 - R.


From andrey.slepuhin at t-platforms.ru  Wed Jun 13 08:09:10 2007
From: andrey.slepuhin at t-platforms.ru (Andrey Slepuhin)
Date: Wed, 13 Jun 2007 19:09:10 +0400
Subject: [ofa-general] Problems with mlx4
In-Reply-To: <200706131105.00563.minich@ornl.gov>
References: <467005B9.8070708@t-platforms.ru>
	<200706131105.00563.minich@ornl.gov>
Message-ID: <46700896.7090807@t-platforms.ru>

No, I cannot start OpenSM, because after loading the driver the port is
in the "DOWN" state, not "INIT".

Best regards,
Andrey

Makia Minich wrote:
> Are you running an SM anywhere?  If I remember correctly, the Flextronics 
> switch does not have an embedded SM.
>
> On Wednesday 13 June 2007 10:56:57 am Andrey Slepuhin wrote:
>   
>> Dear folks,
>>
>> I just setup a test cluster using ConnectX cards, but I can not get link
>> up. I downloaded the kernel from
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git
>>
>> After inserting the modules I see that the card was initialized:
>>
>> Jun 13 22:17:23 testnode1 kernel: mlx4_core: Mellanox ConnectX core
>> driver v0.01 (May 1, 2007)
>> Jun 13 22:17:23 testnode1 kernel: mlx4_core: Initializing 0000:07:00.0
>> Jun 13 22:17:23 testnode1 kernel: ACPI: PCI Interrupt 0000:07:00.0[A] ->
>> GSI 16 (level, low) -> IRQ 16
>> Jun 13 22:17:23 testnode1 kernel: PCI: Setting latency timer of device
>> 0000:07:00.0 to 64
>>
>> But the link remains in "DOWN" state:
>>
>> testnode1:~ # /opt/ofed/bin/ibstatus
>> Infiniband device 'mlx4_0' port 1 status:
>>         default gid:     fe80:0000:0000:0000:0002:c903:0000:07a1
>>         base lid:        0x0
>>         sm lid:          0x0
>>         state:           1: DOWN
>>         phys state:      2: Polling
>>         rate:            20 Gb/sec (4X DDR)
>>
>> Infiniband device 'mlx4_0' port 2 status:
>>         default gid:     fe80:0000:0000:0000:0002:c903:0000:07a2
>>         base lid:        0x0
>>         sm lid:          0x0
>>         state:           1: DOWN
>>         phys state:      2: Polling
>>         rate:            20 Gb/sec (4X DDR)
>>
>> I tried different ports and cables but without success. Do you have any
>> idea what's going wrong?
>> The nodes configuration is:
>> Intel S5000PSL motherboard, 2xXeon 5345, 8GB RAM
>> All the nodes are connected to Flextronics (Mellanox) 24-port DDR switch.
>> I'm running SLES10 with the kernel from Roland's tree:
>> testnode1:~ # uname -a
>> Linux testnode1 2.6.22-rc3 #1 SMP Wed Jun 6 23:56:36 MSD 2007 x86_64
>> x86_64 x86_64 GNU/Linux
>>
>> Any help will be much appreciated.
>>
>> Thanks in advance,
>> Andrey
>>     
>
>   


From andrey.slepuhin at t-platforms.ru  Wed Jun 13 08:14:37 2007
From: andrey.slepuhin at t-platforms.ru (Andrey Slepuhin)
Date: Wed, 13 Jun 2007 19:14:37 +0400
Subject: [ofa-general] Problems with mlx4
In-Reply-To: <adaejkf7v3g.fsf@cisco.com>
References: <467005B9.8070708@t-platforms.ru> <adaejkf7v3g.fsf@cisco.com>
Message-ID: <467009DD.804@t-platforms.ru>

That's what I was afraid of... OK, I will try to update the switch firmware,
but do you have a link to the ConnectX firmware? It is not present on the
public Mellanox site...

Thanks,
Andrey

Roland Dreier wrote:
>  > I just setup a test cluster using ConnectX cards, but I can not get
>  > link up.
>
> Most likely you need to update your switch FW.  You need Anafa2 FW
> version 1.0 to negotiate a DDR link with ConnectX.
>
> BTW what firmware version do you have on your HCAs?  You probably want
> to update to 2.0.156 (the mlx4 driver won't work with 2.0.158 for a
> day or two still) so that you don't have to monkey around with
> hard-coding your switch ports to DDR only.
>
>  - R.
>   


From landman at scalableinformatics.com  Wed Jun 13 08:15:08 2007
From: landman at scalableinformatics.com (Joe Landman)
Date: Wed, 13 Jun 2007 11:15:08 -0400
Subject: [ofa-general] quick IPoIB config question
Message-ID: <467009FC.3070402@scalableinformatics.com>

Hi folks:

   I built OFED-1.2-rc4 on OpenSuSE 10.2; it works fine as long as I turn off
the 32-bit build and update to a 2.6.20 kernel.  I installed the RPMs after the
build, and the system appears to be fine/well behaved.  Is there an
OFED-specific technique to have the ib0 interface configured at boot
time, after the drivers load?  This might be distribution specific.

I created a file named /etc/sysconfig/network/ifcfg-ib0 which contained

BOOTPROTO='static'
MTU=''
REMOTE_IPADDR=''
STARTMODE='onboot'
USERCONTROL='no'
NETMASK='255.255.0.0'
IPADDR='10.1.32.2'
DEVICE='ib0'

Bringing the interface up with an 'ifconfig ib0 up' doesn't seem to 
assign the IP address and netmask to it.

Hence my question.  Is there an OFED specific method of configuring this 
(e.g. a config file I need to edit/create), or is it distribution 
dependent?

If I force the issue with an ifconfig, it looks like it works fine. 
This is ok as a work around, and I can create an /etc/init.d/ib or 
similar to force the issue.  I would prefer to do this "the right way", 
and if there is someone with guidance/pointers as to what that is, I 
would prefer to follow that.

Thanks.

Joe

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


From rdreier at cisco.com  Wed Jun 13 08:23:35 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 13 Jun 2007 08:23:35 -0700
Subject: [ofa-general] Problems with mlx4
In-Reply-To: <467009DD.804@t-platforms.ru> (Andrey Slepuhin's message of "Wed,
	13 Jun 2007 19:14:37 +0400")
References: <467005B9.8070708@t-platforms.ru> <adaejkf7v3g.fsf@cisco.com>
	<467009DD.804@t-platforms.ru>
Message-ID: <ada645r7ua0.fsf@cisco.com>

 > That's what I was afraid of... Ok, I will try to update the switch
 > firmware, but do you have a link to ConnectX firmware? It is not
 > present at public Mellanox site...

I don't have a link.  I would suggest contacting whoever supplied your
HCAs to you.

 - R.


From minich at ornl.gov  Wed Jun 13 08:23:15 2007
From: minich at ornl.gov (Makia Minich)
Date: Wed, 13 Jun 2007 11:23:15 -0400
Subject: [ofa-general] Problems with mlx4
In-Reply-To: <adaabv37v2g.fsf@cisco.com>
References: <467005B9.8070708@t-platforms.ru>
	<200706131105.00563.minich@ornl.gov> <adaabv37v2g.fsf@cisco.com>
Message-ID: <200706131123.15262.minich@ornl.gov>

You're right ... I was only half paying attention.  I had the same problem
with these cards, and I needed to upgrade the firmware.  After that, they came
up and worked.

On Wednesday 13 June 2007 11:06:31 am Roland Dreier wrote:
>  > Are you running an SM anywhere?  If I remember correctly, the
>  > Flextronics switch does not have an embedded SM.
>
> Even without an SM the ports will go to INIT (and if the ports are
> DOWN then an SM can't do anything to help).
>
>  - R.

-- 
Makia Minich <minich at ornl.gov>
National Center for Computation Science
Oak Ridge National Laboratory
--*--
Imagine no possessions
I wonder if you can
- John Lennon


From michaelc at cs.wisc.edu  Wed Jun 13 08:37:11 2007
From: michaelc at cs.wisc.edu (Mike Christie)
Date: Wed, 13 Jun 2007 10:37:11 -0500
Subject: [ofa-general] ANNOUNCE ofed backports for 2.6.22 kernel bits
In-Reply-To: <467003EA.7070901@voltaire.com>
References: <20070612084108.GK6470@mellanox.co.il>
	<467003EA.7070901@voltaire.com>
Message-ID: <46700F27.1000000@cs.wisc.edu>

Erez Zilber wrote:
>> Erez, and other iser maintainers, I had a problem with RHEL4 iscsi backports
>> (scsi_flush_work isn't exported) I decided that since it isn't
>> called on older kernels it's reasonably safe to just comment it out,
>> but would be interested to hear your opinion.
>> See it in this sub-directory:
>> kernel_patches/backport/2.6.9_U2/libiscsi_no_flush_to_2_6_9.patch
>>   
> 
> This leads me to something that I thought about in the past. Old kernels
> (i.e. the RH4 kernels) don't have the SCSI work queue. Therefore, I used
> schedule_work instead of scsi_queue_work. Now, I cannot replace
> scsi_flush_work with flush_workqueue because I'm using a workqueue which
> does not belong to me (and, therefore, I cannot flush it).
> 
> I'm thinking about adding a backport that will create a workqueue for
> each session in open-iscsi. With this, I can queue & flush. Mike - what
> do you think about that? I think that creating a workqueue in open-iscsi
> per session will be the closest thing to the SCSI workqueue that we have
> in new kernels.
> 

Yeah, that sounds fine. Just to be clear, you would want to create the
single threaded work queue (create_singlethread_workqueue) instead of
the normal thread per cpu work queue for each session.


From rdreier at cisco.com  Wed Jun 13 09:24:15 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 13 Jun 2007 09:24:15 -0700
Subject: [ofa-general] [GIT PULL] please pull tvflash.git
Message-ID: <ada1wgf7rgw.fsf@cisco.com>

Vlad, please pull from

    git://staging.openfabrics.org/~rdreier/tvflash.git

to get tvflash updates that will fix problems building on SLES 10 SP1
and Fedora 7 due to linking with libgz or libz
(https://bugs.openfabrics.org/show_bug.cgi?id=558).

Thanks,
  Roland


From mst at dev.mellanox.co.il  Wed Jun 13 09:38:21 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 13 Jun 2007 19:38:21 +0300
Subject: [ofa-general] [PATCH draft,
	untested] ehca srq emulation (for IPoIB CM)
In-Reply-To: <466F36C8.5010507@linux.vnet.ibm.com>
References: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com>
	<466F36C8.5010507@linux.vnet.ibm.com>
Message-ID: <20070613163821.GB12277@mellanox.co.il>

Here's how I would go about emulating SRQ in ehca in software.  I knocked this
out in several hours, so this is completely untested (not even compiled,
that's why there are no Makefile bits), but it seemed the easiest way
to get across what I consider the right way to do it.
Note how this both has no overhead for HCAs with hardware srq
support and is smaller than nosrq patches.

The idea here is that you can emulate enough of the SRQ
interface in ehca to make IPoIB CM work without changes:
keep QPs on a list, and distribute posted WRs between them evenly.
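
To illustrate the distribution idea only, here is a minimal userspace model.
All names (emu_qp, emu_srq, emu_srq_attach, emu_srq_post) are hypothetical and
do not appear in the patch. It uses plain round-robin over an array, which is
simpler than the per-QP slot lists the patch keeps, and it ignores locking.

```c
#include <stddef.h>

#define EMU_MAX_QPS 8	/* arbitrary limit for this sketch */

struct emu_qp {
	int posted;		/* receive buffers currently posted on this QP */
};

struct emu_srq {
	struct emu_qp *qps[EMU_MAX_QPS];	/* QPs attached to the emulated SRQ */
	size_t nqps;
	size_t next;		/* round-robin cursor */
};

/* Attach a QP to the emulated SRQ; returns 0 on success, -1 if full. */
static int emu_srq_attach(struct emu_srq *srq, struct emu_qp *qp)
{
	if (srq->nqps == EMU_MAX_QPS)
		return -1;
	srq->qps[srq->nqps++] = qp;
	return 0;
}

/*
 * Spread nwr receive buffers over the attached QPs, one at a time.
 * Returns the number of buffers actually handed to a QP.
 */
static int emu_srq_post(struct emu_srq *srq, int nwr)
{
	int done = 0;

	if (srq->nqps == 0)
		return 0;

	while (done < nwr) {
		srq->qps[srq->next]->posted++;
		srq->next = (srq->next + 1) % srq->nqps;
		++done;
	}
	return done;
}
```

With three QPs attached and seven buffers posted, the first QP ends up with
three buffers and the other two with two each.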

This naturally does not solve the scalability problems
that IPoIB CM without SRQ would have, but at least it contains
them within ehca.

Another advantage of this approach: noSRQ issues are separated out, so we'll be
able to continue working on IPoIB CM without maintaining two code paths.

There are obvious optimizations that can be done (e.g. each wr is copied
twice on the data path, we only need a singly linked list of cqes ...);
hopefully someone at IBM will look into this: I wanted to avoid touching
low-level code I don't understand and can't test, as much as possible.

Known bugs:
Last wqe reached event is missing in this implementation:
I've run out of time, and it's pretty trivial to add anyway,
by adding a per-QP counter of outstanding WRs.
We'll need a tasklet or a thread for the callback though:
is there a tasklet/thread that can be reused for this?
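
A rough userspace sketch of the per-QP counter, assuming the event should fire
once when a QP is in the error state and has no outstanding receive WRs left.
The struct and function names are hypothetical, and the deferral to a
tasklet or thread is left out: the functions just return true when the event
would have to be delivered.

```c
#include <stdbool.h>

struct emu_qp_state {
	int outstanding;	/* receive WRs posted on the QP, not yet completed */
	bool in_error;		/* QP has moved to the error state */
	bool last_wqe_sent;	/* event was already reported for this QP */
};

/* Account for one WR that the emulation posted on this QP. */
static void emu_on_post(struct emu_qp_state *qp)
{
	qp->outstanding++;
}

/* True exactly once: QP is in error and fully drained. */
static bool emu_check_last_wqe(struct emu_qp_state *qp)
{
	if (qp->in_error && qp->outstanding == 0 && !qp->last_wqe_sent) {
		qp->last_wqe_sent = true;
		return true;
	}
	return false;
}

/* Account for one receive completion; returns true if the event is due. */
static bool emu_on_completion(struct emu_qp_state *qp)
{
	if (qp->outstanding > 0)
		qp->outstanding--;
	return emu_check_last_wqe(qp);
}

/* Mark the QP as failed; returns true if it was already drained. */
static bool emu_on_error(struct emu_qp_state *qp)
{
	qp->in_error = true;
	return emu_check_last_wqe(qp);
}
```

In the driver the check would sit wherever receive completions for SRQ QPs are
handled, with the actual callback deferred out of the polling path.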

Caveats:
As an optimization, I used a bit in qp_token to signal SRQ presence.
No idea whether this works in practice in your hardware. If not,
another way to detect SRQ WC will have to be found.
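
For reference, the token trick boils down to the following userspace sketch.
The helper names are made up for illustration (the patch uses the
EHCA_QP_TOKEN_SRQ and EHCA_QP_TOKEN macros), and it assumes a 32-bit token
whose top bit is otherwise unused.

```c
#include <stdbool.h>
#include <stdint.h>

/* Top bit of the 32-bit token flags "this QP is attached to an SRQ". */
#define QP_TOKEN_SRQ_FLAG (UINT32_C(1) << 31)

/* Tag a token at QP creation time. */
static uint32_t qp_token_mark_srq(uint32_t token)
{
	return token | QP_TOKEN_SRQ_FLAG;
}

/* Test the flag on a token read back from a completion. */
static bool qp_token_has_srq(uint32_t token)
{
	return (token & QP_TOKEN_SRQ_FLAG) != 0;
}

/* Strip the flag before using the token as a lookup key. */
static uint32_t qp_token_id(uint32_t token)
{
	return token & ~QP_TOKEN_SRQ_FLAG;
}
```

Every lookup keyed by the token has to go through the stripping step, which is
why the patch touches each idr_find/idr_remove call site.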

Again, hopefully someone at IBM will look into this.

Signed-off-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>

---

ehca_classes.h |    6 +
ehca_irq.c     |    2
ehca_iverbs.h  |    6 +
ehca_main.c    |    3
ehca_qp.c      |   14 ++-
ehca_reqs.c    |    3
ehca_srq.c     |  237 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
ehca_uverbs.c  |    2
8 files changed, 269 insertions(+), 4 deletions(-)


diff --git a/drivers/infiniband/hw/ehca/ehca_classes.h b/drivers/infiniband/hw/ehca/ehca_classes.h
index 1d286d3..e54bb82 100644
--- a/drivers/infiniband/hw/ehca/ehca_classes.h
+++ b/drivers/infiniband/hw/ehca/ehca_classes.h
@@ -281,6 +281,9 @@ extern spinlock_t hcall_lock;
 extern struct idr ehca_qp_idr;
 extern struct idr ehca_cq_idr;
 
+#define EHCA_QP_TOKEN_SRQ (1U << 31)
+#define EHCA_QP_TOKEN(token) ((token) & ~EHCA_QP_TOKEN_SRQ)
+
 extern int ehca_static_rate;
 extern int ehca_port_act_time;
 extern int ehca_use_hp_mr;
@@ -344,4 +347,7 @@ int ehca_cq_assign_qp(struct ehca_cq *cq, struct ehca_qp *qp);
 int ehca_cq_unassign_qp(struct ehca_cq *cq, unsigned int qp_num);
 struct ehca_qp* ehca_cq_get_qp(struct ehca_cq *cq, int qp_num);
 
+int ehca_srq_handle_wc(struct ib_wc *wc, unsigned token);
+int ehca_srq_attach(struct ib_srq *srq, struct ib_qp *qp);
+
 #endif
diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c
index 100329b..f3b078c 100644
--- a/drivers/infiniband/hw/ehca/ehca_irq.c
+++ b/drivers/infiniband/hw/ehca/ehca_irq.c
@@ -182,7 +182,7 @@ static void qp_event_callback(struct ehca_shca *shca,
 	u32 token = EHCA_BMASK_GET(EQE_QP_TOKEN, eqe);
 
 	spin_lock_irqsave(&ehca_qp_idr_lock, flags);
-	qp = idr_find(&ehca_qp_idr, token);
+	qp = idr_find(&ehca_qp_idr, EHCA_QP_TOKEN(token));
 	spin_unlock_irqrestore(&ehca_qp_idr_lock, flags);
 
 
diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h b/drivers/infiniband/hw/ehca/ehca_iverbs.h
index 37e7fe0..0f530cc 100644
--- a/drivers/infiniband/hw/ehca/ehca_iverbs.h
+++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h
@@ -178,4 +178,10 @@ void ehca_free_fw_ctrlblock(void *ptr);
 #define ehca_free_fw_ctrlblock(ptr) free_page((unsigned long)(ptr))
 #endif
 
+struct ib_srq *ehca_create_srq(struct ib_pd *pd,
+			     struct ib_srq_init_attr *srq_init_attr);
+int ehca_destroy_srq(struct ib_srq *srq);
+int ehca_post_srq_recv(struct ib_srq *ib_srq, struct ib_recv_wr *recv_wr,
+		       struct ib_recv_wr **bad_recv_wr);
+
 #endif
diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index c3f99f3..bfab202 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -330,6 +330,9 @@ int ehca_init_device(struct ehca_shca *shca)
 	/* shca->ib_device.modify_ah	    = ehca_modify_ah;	    */
 	shca->ib_device.query_ah	    = ehca_query_ah;
 	shca->ib_device.destroy_ah	    = ehca_destroy_ah;
+	shca->ib_device.create_srq	    = ehca_create_srq;
+	shca->ib_device.destroy_srq	    = ehca_destroy_srq;
+	shca->ib_device.post_srq_recv	    = ehca_post_srq_recv;
 	shca->ib_device.create_qp	    = ehca_create_qp;
 	shca->ib_device.modify_qp	    = ehca_modify_qp;
 	shca->ib_device.query_qp	    = ehca_query_qp;
diff --git a/drivers/infiniband/hw/ehca/ehca_qp.c b/drivers/infiniband/hw/ehca/ehca_qp.c
index b5bc787..9a14e90 100644
--- a/drivers/infiniband/hw/ehca/ehca_qp.c
+++ b/drivers/infiniband/hw/ehca/ehca_qp.c
@@ -486,6 +486,9 @@ struct ib_qp *ehca_create_qp(struct ib_pd *pd,
 		goto create_qp_exit0;
 	}
 
+	if (init_attr->srq)
+		my_qp->token |= EHCA_QP_TOKEN_SRQ;
+
 	parms.servicetype = ibqptype2servicetype(init_attr->qp_type);
 	if (parms.servicetype < 0) {
 		ret = -EINVAL;
@@ -663,6 +666,13 @@ struct ib_qp *ehca_create_qp(struct ib_pd *pd,
 		}
 	}
 
+	if (my_qp->ib_qp.srq) {
+		ret = ehca_srq_attach(my_qp->ib_qp.srq, &my_qp->ib_qp);
+		if (ret)
+			goto create_qp_exit3;
+	}
+
+
 	return &my_qp->ib_qp;
 
 create_qp_exit3:
@@ -674,7 +684,7 @@ create_qp_exit2:
 
 create_qp_exit1:
 	spin_lock_irqsave(&ehca_qp_idr_lock, flags);
-	idr_remove(&ehca_qp_idr, my_qp->token);
+	idr_remove(&ehca_qp_idr, EHCA_QP_TOKEN(my_qp->token));
 	spin_unlock_irqrestore(&ehca_qp_idr_lock, flags);
 
 create_qp_exit0:
@@ -1408,7 +1418,7 @@ int ehca_destroy_qp(struct ib_qp *ibqp)
 	}
 
 	spin_lock_irqsave(&ehca_qp_idr_lock, flags);
-	idr_remove(&ehca_qp_idr, my_qp->token);
+	idr_remove(&ehca_qp_idr, EHCA_QP_TOKEN(my_qp->token));
 	spin_unlock_irqrestore(&ehca_qp_idr_lock, flags);
 
 	h_ret = hipz_h_destroy_qp(shca->ipz_hca_handle, my_qp);
diff --git a/drivers/infiniband/hw/ehca/ehca_reqs.c b/drivers/infiniband/hw/ehca/ehca_reqs.c
index caec9de..b151c67 100644
--- a/drivers/infiniband/hw/ehca/ehca_reqs.c
+++ b/drivers/infiniband/hw/ehca/ehca_reqs.c
@@ -601,6 +601,9 @@ poll_cq_one_exit0:
 	if (cqe_count > 0)
 		hipz_update_feca(my_cq, cqe_count);
 
+	if ((wc->opcode & IB_WC_RECV) && (cqe->qp_token & EHCA_QP_TOKEN_SRQ))
+		ret = ehca_srq_handle_wc(wc, cqe->qp_token);
+
 	return ret;
 }
 
diff --git a/drivers/infiniband/hw/ehca/ehca_srq.c b/drivers/infiniband/hw/ehca/ehca_srq.c
new file mode 100644
index 0000000..1e1574a
--- /dev/null
+++ b/drivers/infiniband/hw/ehca/ehca_srq.c
@@ -0,0 +1,237 @@
+/*
+ *  SRQ emulation for ehca.
+ *
+ *  Author: Michael S. Tsirkin <mst at mellanox.co.il>
+ *
+ *  Copyright (c) 2007 Mellanox Technologies. All rights reserved.
+ *
+ *  This source code is distributed under a dual license of GPL v2.0 and OpenIB
+ *  BSD.
+ *
+ * OpenIB BSD License
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * Redistributions of source code must retain the above copyright notice, this
+ * list of conditions and the following disclaimer.
+ *
+ * Redistributions in binary form must reproduce the above copyright notice,
+ * this list of conditions and the following disclaimer in the documentation
+ * and/or other materials
+ * provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
+ * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+
+#include <linux/spinlock.h>
+#include <linux/list.h>
+#include <rdma/ib_verbs.h>
+#include "ehca_classes.h"
+
+#define EHCA_QPS_PER_SRQ 16
+
+struct ehca_srq_cqe {
+	struct list_head list;
+	struct ib_qp *qp;
+};
+
+struct ehca_srq {
+	struct ib_srq ib_srq;
+	struct ib_srq_attr attr;
+	struct spinlock lock;
+
+	struct ib_recv_wr *wrs;
+	struct ehca_srq_cqe *cqes;
+
+	struct ib_recv_wr *first_polled; /* Polled or unused */
+	struct ib_recv_wr *first_posted; /* Posted on SRQ but not on QP */
+
+	struct list_head polled_cqes; /* Polled */
+	struct list_head free_cqes; /* Posted or unused */
+};
+
+static int ehca_srq_repost(struct ehca_srq *srq)
+{
+	struct ib_recv_wr wr, *wrp, *bad_recv_wr;
+	struct ehca_srq_cqe *c, *n;
+	unsigned long flags;
+	int rc = 0;
+
+	spin_lock_irqsave(&srq->lock, flags);
+
+	list_for_each_entry_safe(c, n, &srq->polled_cqes, list) {
+		wrp = srq->first_posted;
+		if (!wrp)
+			break;
+		memcpy(&wr, wrp, sizeof wr);
+		wr.next = NULL;
+		wr.wr_id = (u64)wrp;
+		rc = ib_post_recv(c->qp, &wr, &bad_recv_wr);
+		if (rc)
+			break;
+
+		srq->first_posted = wrp->next;
+		wrp->next = NULL;
+		list_del(&c->list);
+	}
+
+	spin_unlock_irqrestore(&srq->lock, flags);
+	return rc;
+}
+
+int ehca_srq_handle_wc(struct ib_wc *wc, unsigned token)
+{
+	struct ehca_qp *qp;
+	struct ehca_srq *srq;
+	struct ehca_srq_cqe *cqe;
+	struct ib_recv_wr *wr;
+
+	spin_lock_irqsave(&ehca_qp_idr_lock, flags);
+	qp = idr_find(&ehca_qp_idr, EHCA_QP_TOKEN(token));
+	spin_unlock_irqrestore(&ehca_qp_idr_lock, flags);
+
+	if (!qp)
+		return -EINVAL;
+
+	wc->qp = &qp->ib_qp;
+	srq = container_of(qp->ib_qp.srq, struct ehca_srq, ib_srq);
+	spin_lock_irqsave(&srq->lock, flags);
+	BUG_ON(list_empty(&srq->free_cqes));
+	cqe = container_of(srq->free_cqes.next, struct ehca_srq_cqe, list);
+	cqe->qp = &qp->ib_qp;
+	list_move(&cqe->list, &srq->polled_cqes);
+	wr = (void *)wc->wr_id;
+	wc->wr_id = wr->wr_id;
+	wr->next = srq->first_polled;
+	srq->first_polled = wr;
+	spin_unlock_irqrestore(&srq->lock, flags);
+	return 0;
+}
+
+int ehca_post_srq_recv(struct ib_srq *ib_srq, struct ib_recv_wr *recv_wr,
+		       struct ib_recv_wr **bad_recv_wr)
+{
+	struct ib_recv_wr *wr, *copy;
+	struct ehca_srq *srq;
+
+	srq = container_of(ib_srq, struct ehca_srq, ib_srq);
+	for (wr = recv_wr; wr; wr = wr->next) {
+		copy = srq->first_polled;
+		if (!copy) {
+			*bad_recv_wr = wr;
+			return -ENOMEM;
+		}
+		srq->first_polled = copy->next;
+
+		memcpy(copy, wr, sizeof *copy);
+		if (wr->num_sge)
+			memcpy(copy->sg_list, wr->sg_list,
+			       wr->num_sge * sizeof *copy->sg_list);
+
+		copy->next = srq->first_posted;
+		srq->first_posted = copy;
+	}
+
+	ehca_srq_repost(srq);
+	return 0;
+}
+
+int ehca_srq_attach(struct ib_srq *ib_srq, struct ib_qp *qp)
+{
+	int i;
+	struct ehca_srq_cqe *cqe;
+	struct ehca_srq *srq;
+
+	srq = container_of(ib_srq, struct ehca_srq, ib_srq);
+
+	spin_lock_irq(&srq->lock);
+	for (i = 0; i < srq->attr.max_wrs / EHCA_QPS_PER_SRQ; ++i) {
+		if (list_empty(&srq->free_cqes))
+			break;
+		cqe = list_entry(srq->free_cqes.next, struct ehca_srq_cqe, list);
+		cqe->qp = qp;
+		list_move_tail(&cqe->list, &srq->polled_cqes);
+	}
+	spin_unlock_irq(&srq->lock);
+	if (!i)
+		return -ENOMEM;
+
+	return ehca_srq_repost(srq);
+}
+
+struct ib_srq *ehca_create_srq(struct ib_pd *pd,
+			       struct ib_srq_init_attr *srq_init_attr)
+{
+	struct ehca_srq *srq;
+	int i = 0;
+
+	srq = kmalloc(sizeof *srq, GFP_KERNEL);
+	if (!srq)
+		return ERR_PTR(-ENOMEM);
+
+	memcpy(&srq->attr, srq_init_attr, sizeof srq->attr);
+	spin_lock_init(&srq->lock);
+	INIT_LIST_HEAD(&srq->polled_cqes);
+	INIT_LIST_HEAD(&srq->free_cqes);
+	srq->first_posted = NULL;
+	srq->first_polled = NULL;
+
+	srq->wrs = kmalloc(sizeof *srq->wrs * srq->attr.max_wrs, GFP_KERNEL);
+	srq->cqes = kmalloc(sizeof *srq->cqes * srq->attr.max_wrs, GFP_KERNEL);
+	if (!srq->wrs || !srq->cqes)
+		goto err_arrays;
+
+	for(i = 0; i < srq->attr.max_wrs; ++i) {
+		srq->wrs[i] = kmalloc(sizeof srq->wrs[i], GFP_KERNEL);
+		if (!srq->wrs[i])
+			goto err_wr;
+		srq->wrs[i]->sg_list = kmalloc(sizeof srq->wrs[i]->sg_list *
+					       srq->attr.max_sge, GFP_KERNEL);
+		if (!srq->wrs[i]->sg_list) {
+			kfree(srq->wrs[i]);
+			goto err_wr;
+		}
+		list_add(&srq->cqes[i].list, &srq->free_cqes);
+		srq->wrs[i]->next = srq->first_polled;
+		srq->first_polled = srq->wrs[i];
+	}
+
+	return &srq->ib_srq;
+
+err_wr:
+	while(--i >= 0) {
+		kfree(srq->wrs[i]->sg_list);
+		kfree(srq->wrs[i]);
+	}
+
+err_arrays:
+	kfree(srq->wrs);
+	kfree(srq->cqes);
+	return ERR_PTR(-ENOMEM);
+}
+
+int ehca_destroy_srq(struct ib_srq *ib_srq)
+{
+	struct ehca_srq *srq;
+	int i;
+
+	srq = container_of(ib_srq, struct ehca_srq, ib_srq);
+	for (i = 0; i < srq->attr.max_wrs; ++i) {
+		kfree(srq->wrs[i]->sg_list);
+		kfree(srq->wrs[i]);
+	}
+	kfree(srq->wrs);
+	kfree(srq->cqes);
+}
diff --git a/drivers/infiniband/hw/ehca/ehca_uverbs.c b/drivers/infiniband/hw/ehca/ehca_uverbs.c
index 73db920..a44354c 100644
--- a/drivers/infiniband/hw/ehca/ehca_uverbs.c
+++ b/drivers/infiniband/hw/ehca/ehca_uverbs.c
@@ -289,7 +289,7 @@ int ehca_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
 
 	case 2: /* QP */
 		spin_lock_irqsave(&ehca_qp_idr_lock, flags);
-		qp = idr_find(&ehca_qp_idr, idr_handle);
+		qp = idr_find(&ehca_qp_idr, EHCA_QP_TOKEN(idr_handle));
 		spin_unlock_irqrestore(&ehca_qp_idr_lock, flags);
 
 		/* make sure this mmap really belongs to the authorized user */
-- 
MST


From rdreier at cisco.com  Wed Jun 13 10:29:07 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 13 Jun 2007 10:29:07 -0700
Subject: [ofa-general] [PATCH/RFC] IB/mlx4: Handle new FW requirement for
	send request prefetching
In-Reply-To: <200706051602.14182.jackm@dev.mellanox.co.il> (Jack Morgenstein's
	message of "Tue, 5 Jun 2007 16:02:14 +0300")
References: <200706051602.14182.jackm@dev.mellanox.co.il>
Message-ID: <adaodjj69wc.fsf@cisco.com>

I just queued up this patch to handle new FW.  Please let me know if
it looks OK to you, and I will ask Linus to pull it.

Thanks.

commit f22332295cb218ad12db2b521a34553ff5790c34
Author: Roland Dreier <rolandd at cisco.com>
Date:   Wed Jun 13 10:26:43 2007 -0700

    IB/mlx4: Handle new FW requirement for send request prefetching
    
    New ConnectX firmware introduces FW command interface revision 2,
    which requires that for each QP, a chunk of send queue entries (the
    "headroom") is kept marked as invalid, so that the HCA doesn't get
    confused if it prefetches entries that haven't been posted yet.  Add
    code to the driver to do this, and also update the user ABI so that
    userspace can request that the prefetcher be turned off for userspace
    QPs (we just leave the prefetcher on for all kernel QPs).
    
    Marking send queue entries this way is OK for older firmware too, so
    we change the driver to allow FW command interface revisions 1 and 2.
    
    Based on a patch from Jack Morgenstein <jackm at dev.mellanox.co.il>.
    
    Signed-off-by: Roland Dreier <rolandd at cisco.com>
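
As a worked example of the headroom arithmetic in set_kernel_sq_size below,
here is a small userspace sketch. The struct and function names are
hypothetical; the three formulas are the ones the patch uses (2 KB plus one
WQE of spare entries, a power-of-two ring, and a posting limit that excludes
the spares).

```c
#include <stdint.h>

struct sq_sizes {
	uint32_t spare_wqes;	/* headroom entries kept invalid for the prefetcher */
	uint32_t wqe_cnt;	/* total ring size, a power of two */
	uint32_t max_post;	/* entries the consumer may actually post */
};

/* Smallest power of two >= v (v >= 1, and small enough not to overflow). */
static uint32_t roundup_pow2(uint32_t v)
{
	uint32_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* wqe_shift is log2 of the WQE size in bytes. */
static struct sq_sizes calc_sq_sizes(uint32_t max_send_wr, unsigned wqe_shift)
{
	struct sq_sizes s;

	s.spare_wqes = (2048u >> wqe_shift) + 1;
	s.wqe_cnt    = roundup_pow2(max_send_wr + s.spare_wqes);
	s.max_post   = s.wqe_cnt - s.spare_wqes;
	return s;
}
```

For 64-byte WQEs (wqe_shift 6) and a request for 100 send WRs this gives
33 spare WQEs, a 256-entry ring, and 223 postable entries, so the reported
max_send_wr can end up larger than what was asked for.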

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 048c527..e940521 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -355,7 +355,7 @@ static int mlx4_ib_poll_one(struct mlx4_ib_cq *cq,
 		wq = &(*cur_qp)->sq;
 		wqe_ctr = be16_to_cpu(cqe->wqe_index);
 		wq->tail += (u16) (wqe_ctr - (u16) wq->tail);
-		wc->wr_id = wq->wrid[wq->tail & (wq->max - 1)];
+		wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
 		++wq->tail;
 	} else if ((*cur_qp)->ibqp.srq) {
 		srq = to_msrq((*cur_qp)->ibqp.srq);
@@ -364,7 +364,7 @@ static int mlx4_ib_poll_one(struct mlx4_ib_cq *cq,
 		mlx4_ib_free_srq_wqe(srq, wqe_ctr);
 	} else {
 		wq	  = &(*cur_qp)->rq;
-		wc->wr_id = wq->wrid[wq->tail & (wq->max - 1)];
+		wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
 		++wq->tail;
 	}
 
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 93dac71..24ccadd 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -95,7 +95,8 @@ struct mlx4_ib_mr {
 struct mlx4_ib_wq {
 	u64		       *wrid;
 	spinlock_t		lock;
-	int			max;
+	int			wqe_cnt;
+	int			max_post;
 	int			max_gs;
 	int			offset;
 	int			wqe_shift;
@@ -113,6 +114,7 @@ struct mlx4_ib_qp {
 
 	u32			doorbell_qpn;
 	__be32			sq_signal_bits;
+	int			sq_spare_wqes;
 	struct mlx4_ib_wq	sq;
 
 	struct ib_umem	       *umem;
@@ -123,6 +125,7 @@ struct mlx4_ib_qp {
 	u8			alt_port;
 	u8			atomic_rd_en;
 	u8			resp_depth;
+	u8			sq_no_prefetch;
 	u8			state;
 };
 
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 4c15fa3..8fabe0d 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -109,6 +109,20 @@ static void *get_send_wqe(struct mlx4_ib_qp *qp, int n)
 	return get_wqe(qp, qp->sq.offset + (n << qp->sq.wqe_shift));
 }
 
+/*
+ * Stamp a SQ WQE so that it is invalid if prefetched by marking the
+ * first four bytes of every 64 byte chunk with 0xffffffff, except for
+ * the very first chunk of the WQE.
+ */
+static void stamp_send_wqe(struct mlx4_ib_qp *qp, int n)
+{
+	u32 *wqe = get_send_wqe(qp, n);
+	int i;
+
+	for (i = 16; i < 1 << (qp->sq.wqe_shift - 2); i += 16)
+		wqe[i] = 0xffffffff;
+}
+
 static void mlx4_ib_qp_event(struct mlx4_qp *qp, enum mlx4_event type)
 {
 	struct ib_event event;
@@ -201,18 +215,18 @@ static int set_rq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
 		if (cap->max_recv_wr)
 			return -EINVAL;
 
-		qp->rq.max = qp->rq.max_gs = 0;
+		qp->rq.wqe_cnt = qp->rq.max_gs = 0;
 	} else {
 		/* HW requires >= 1 RQ entry with >= 1 gather entry */
 		if (is_user && (!cap->max_recv_wr || !cap->max_recv_sge))
 			return -EINVAL;
 
-		qp->rq.max	 = roundup_pow_of_two(max(1U, cap->max_recv_wr));
+		qp->rq.wqe_cnt	 = roundup_pow_of_two(max(1U, cap->max_recv_wr));
 		qp->rq.max_gs	 = roundup_pow_of_two(max(1U, cap->max_recv_sge));
 		qp->rq.wqe_shift = ilog2(qp->rq.max_gs * sizeof (struct mlx4_wqe_data_seg));
 	}
 
-	cap->max_recv_wr  = qp->rq.max;
+	cap->max_recv_wr  = qp->rq.max_post = qp->rq.wqe_cnt;
 	cap->max_recv_sge = qp->rq.max_gs;
 
 	return 0;
@@ -236,8 +250,6 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
 	    cap->max_send_sge + 2 > dev->dev->caps.max_sq_sg)
 		return -EINVAL;
 
-	qp->sq.max = cap->max_send_wr ? roundup_pow_of_two(cap->max_send_wr) : 1;
-
 	qp->sq.wqe_shift = ilog2(roundup_pow_of_two(max(cap->max_send_sge *
 							sizeof (struct mlx4_wqe_data_seg),
 							cap->max_inline_data +
@@ -246,18 +258,25 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
 	qp->sq.max_gs    = ((1 << qp->sq.wqe_shift) - send_wqe_overhead(type)) /
 		sizeof (struct mlx4_wqe_data_seg);
 
-	qp->buf_size = (qp->rq.max << qp->rq.wqe_shift) +
-		(qp->sq.max << qp->sq.wqe_shift);
+	/*
+	 * We need to leave 2 KB + 1 WQE of headroom in the SQ to
+	 * allow HW to prefetch.
+	 */
+	qp->sq_spare_wqes = (2048 >> qp->sq.wqe_shift) + 1;
+	qp->sq.wqe_cnt = roundup_pow_of_two(cap->max_send_wr + qp->sq_spare_wqes);
+
+	qp->buf_size = (qp->rq.wqe_cnt << qp->rq.wqe_shift) +
+		(qp->sq.wqe_cnt << qp->sq.wqe_shift);
 	if (qp->rq.wqe_shift > qp->sq.wqe_shift) {
 		qp->rq.offset = 0;
-		qp->sq.offset = qp->rq.max << qp->rq.wqe_shift;
+		qp->sq.offset = qp->rq.wqe_cnt << qp->rq.wqe_shift;
 	} else {
-		qp->rq.offset = qp->sq.max << qp->sq.wqe_shift;
+		qp->rq.offset = qp->sq.wqe_cnt << qp->sq.wqe_shift;
 		qp->sq.offset = 0;
 	}
 
-	cap->max_send_wr     = qp->sq.max;
-	cap->max_send_sge    = qp->sq.max_gs;
+	cap->max_send_wr  = qp->sq.max_post = qp->sq.wqe_cnt - qp->sq_spare_wqes;
+	cap->max_send_sge = qp->sq.max_gs;
 	cap->max_inline_data = (1 << qp->sq.wqe_shift) - send_wqe_overhead(type) -
 		sizeof (struct mlx4_wqe_inline_seg);
 
@@ -267,11 +286,11 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
 static int set_user_sq_size(struct mlx4_ib_qp *qp,
 			    struct mlx4_ib_create_qp *ucmd)
 {
-	qp->sq.max       = 1 << ucmd->log_sq_bb_count;
+	qp->sq.wqe_cnt   = 1 << ucmd->log_sq_bb_count;
 	qp->sq.wqe_shift = ucmd->log_sq_stride;
 
-	qp->buf_size = (qp->rq.max << qp->rq.wqe_shift) +
-		(qp->sq.max << qp->sq.wqe_shift);
+	qp->buf_size = (qp->rq.wqe_cnt << qp->rq.wqe_shift) +
+		(qp->sq.wqe_cnt << qp->sq.wqe_shift);
 
 	return 0;
 }
@@ -307,6 +326,8 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
 			goto err;
 		}
 
+		qp->sq_no_prefetch = ucmd.sq_no_prefetch;
+
 		err = set_user_sq_size(qp, &ucmd);
 		if (err)
 			goto err;
@@ -334,6 +355,8 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
 				goto err_mtt;
 		}
 	} else {
+		qp->sq_no_prefetch = 0;
+
 		err = set_kernel_sq_size(dev, &init_attr->cap, init_attr->qp_type, qp);
 		if (err)
 			goto err;
@@ -360,8 +383,8 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
 		if (err)
 			goto err_mtt;
 
-		qp->sq.wrid  = kmalloc(qp->sq.max * sizeof (u64), GFP_KERNEL);
-		qp->rq.wrid  = kmalloc(qp->rq.max * sizeof (u64), GFP_KERNEL);
+		qp->sq.wrid  = kmalloc(qp->sq.wqe_cnt * sizeof (u64), GFP_KERNEL);
+		qp->rq.wrid  = kmalloc(qp->rq.wqe_cnt * sizeof (u64), GFP_KERNEL);
 
 		if (!qp->sq.wrid || !qp->rq.wrid) {
 			err = -ENOMEM;
@@ -743,14 +766,17 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 		context->mtu_msgmax = (attr->path_mtu << 5) | 31;
 	}
 
-	if (qp->rq.max)
-		context->rq_size_stride = ilog2(qp->rq.max) << 3;
+	if (qp->rq.wqe_cnt)
+		context->rq_size_stride = ilog2(qp->rq.wqe_cnt) << 3;
 	context->rq_size_stride |= qp->rq.wqe_shift - 4;
 
-	if (qp->sq.max)
-		context->sq_size_stride = ilog2(qp->sq.max) << 3;
+	if (qp->sq.wqe_cnt)
+		context->sq_size_stride = ilog2(qp->sq.wqe_cnt) << 3;
 	context->sq_size_stride |= qp->sq.wqe_shift - 4;
 
+	if (cur_state == IB_QPS_RESET && new_state == IB_QPS_INIT)
+		context->sq_size_stride |= !!qp->sq_no_prefetch << 7;
+
 	if (qp->ibqp.uobject)
 		context->usr_page = cpu_to_be32(to_mucontext(ibqp->uobject->context)->uar.index);
 	else
@@ -884,16 +910,19 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 
 	/*
 	 * Before passing a kernel QP to the HW, make sure that the
-	 * ownership bits of the send queue are set so that the
-	 * hardware doesn't start processing stale work requests.
+	 * ownership bits of the send queue are set and the SQ
+	 * headroom is stamped so that the hardware doesn't start
+	 * processing stale work requests.
 	 */
 	if (!ibqp->uobject && cur_state == IB_QPS_RESET && new_state == IB_QPS_INIT) {
 		struct mlx4_wqe_ctrl_seg *ctrl;
 		int i;
 
-		for (i = 0; i < qp->sq.max; ++i) {
+		for (i = 0; i < qp->sq.wqe_cnt; ++i) {
 			ctrl = get_send_wqe(qp, i);
 			ctrl->owner_opcode = cpu_to_be32(1 << 31);
+
+			stamp_send_wqe(qp, i);
 		}
 	}
 
@@ -1124,7 +1153,7 @@ static int mlx4_wq_overflow(struct mlx4_ib_wq *wq, int nreq, struct ib_cq *ib_cq
 	struct mlx4_ib_cq *cq;
 
 	cur = wq->head - wq->tail;
-	if (likely(cur + nreq < wq->max))
+	if (likely(cur + nreq < wq->max_post))
 		return 0;
 
 	cq = to_mcq(ib_cq);
@@ -1132,7 +1161,7 @@ static int mlx4_wq_overflow(struct mlx4_ib_wq *wq, int nreq, struct ib_cq *ib_cq
 	cur = wq->head - wq->tail;
 	spin_unlock(&cq->lock);
 
-	return cur + nreq >= wq->max;
+	return cur + nreq >= wq->max_post;
 }
 
 int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
@@ -1165,8 +1194,8 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 			goto out;
 		}
 
-		ctrl = wqe = get_send_wqe(qp, ind & (qp->sq.max - 1));
-		qp->sq.wrid[ind & (qp->sq.max - 1)] = wr->wr_id;
+		ctrl = wqe = get_send_wqe(qp, ind & (qp->sq.wqe_cnt - 1));
+		qp->sq.wrid[ind & (qp->sq.wqe_cnt - 1)] = wr->wr_id;
 
 		ctrl->srcrb_flags =
 			(wr->send_flags & IB_SEND_SIGNALED ?
@@ -1301,7 +1330,16 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 		}
 
 		ctrl->owner_opcode = mlx4_ib_opcode[wr->opcode] |
-			(ind & qp->sq.max ? cpu_to_be32(1 << 31) : 0);
+			(ind & qp->sq.wqe_cnt ? cpu_to_be32(1 << 31) : 0);
+
+		/*
+		 * We can improve latency by not stamping the last
+		 * send queue WQE until after ringing the doorbell, so
+		 * only stamp here if there are still more WQEs to post.
+		 */
+		if (wr->next)
+			stamp_send_wqe(qp, (ind + qp->sq_spare_wqes) &
+				       (qp->sq.wqe_cnt - 1));
 
 		++ind;
 	}
@@ -1324,6 +1362,9 @@ out:
 		 * and reach the HCA out of order.
 		 */
 		mmiowb();
+
+		stamp_send_wqe(qp, (ind + qp->sq_spare_wqes - 1) &
+			       (qp->sq.wqe_cnt - 1));
 	}
 
 	spin_unlock_irqrestore(&qp->rq.lock, flags);
@@ -1344,7 +1385,7 @@ int mlx4_ib_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr,
 
 	spin_lock_irqsave(&qp->rq.lock, flags);
 
-	ind = qp->rq.head & (qp->rq.max - 1);
+	ind = qp->rq.head & (qp->rq.wqe_cnt - 1);
 
 	for (nreq = 0; wr; ++nreq, wr = wr->next) {
 		if (mlx4_wq_overflow(&qp->rq, nreq, qp->ibqp.send_cq)) {
@@ -1375,7 +1416,7 @@ int mlx4_ib_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr,
 
 		qp->rq.wrid[ind] = wr->wr_id;
 
-		ind = (ind + 1) & (qp->rq.max - 1);
+		ind = (ind + 1) & (qp->rq.wqe_cnt - 1);
 	}
 
 out:
diff --git a/drivers/infiniband/hw/mlx4/user.h b/drivers/infiniband/hw/mlx4/user.h
index 88c72d5..e2d11be 100644
--- a/drivers/infiniband/hw/mlx4/user.h
+++ b/drivers/infiniband/hw/mlx4/user.h
@@ -39,7 +39,7 @@
  * Increment this value if any changes that break userspace ABI
  * compatibility are made.
  */
-#define MLX4_IB_UVERBS_ABI_VERSION	2
+#define MLX4_IB_UVERBS_ABI_VERSION	3
 
 /*
  * Make sure that all structs defined in this file remain laid out so
@@ -87,9 +87,10 @@ struct mlx4_ib_create_srq_resp {
 struct mlx4_ib_create_qp {
 	__u64	buf_addr;
 	__u64	db_addr;
-        __u8	log_sq_bb_count;
-        __u8	log_sq_stride;
-        __u8	reserved[6];
+	__u8	log_sq_bb_count;
+	__u8	log_sq_stride;
+	__u8	sq_no_prefetch;
+	__u8	reserved[5];
 };
 
 #endif /* MLX4_IB_USER_H */
diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c
index e7ca118..1a7e52d 100644
--- a/drivers/net/mlx4/fw.c
+++ b/drivers/net/mlx4/fw.c
@@ -38,7 +38,8 @@
 #include "icm.h"
 
 enum {
-	MLX4_COMMAND_INTERFACE_REV	= 1
+	MLX4_COMMAND_INTERFACE_MIN_REV		= 1,
+	MLX4_COMMAND_INTERFACE_MAX_REV		= 2,
 };
 
 extern void __buggy_use_of_MLX4_GET(void);
@@ -491,7 +492,8 @@ int mlx4_QUERY_FW(struct mlx4_dev *dev)
 		((fw_ver & 0x0000ffffull) << 16);
 
 	MLX4_GET(cmd_if_rev, outbox, QUERY_FW_CMD_IF_REV_OFFSET);
-	if (cmd_if_rev != MLX4_COMMAND_INTERFACE_REV) {
+	if (cmd_if_rev < MLX4_COMMAND_INTERFACE_MIN_REV ||
+	    cmd_if_rev > MLX4_COMMAND_INTERFACE_MAX_REV) {
 		mlx4_err(dev, "Installed FW has unsupported "
 			 "command interface revision %d.\n",
 			 cmd_if_rev);
@@ -499,8 +501,8 @@ int mlx4_QUERY_FW(struct mlx4_dev *dev)
 			 (int) (dev->caps.fw_ver >> 32),
 			 (int) (dev->caps.fw_ver >> 16) & 0xffff,
 			 (int) dev->caps.fw_ver & 0xffff);
-		mlx4_err(dev, "This driver version supports only revision %d.\n",
-			 MLX4_COMMAND_INTERFACE_REV);
+		mlx4_err(dev, "This driver version supports only revisions %d to %d.\n",
+			 MLX4_COMMAND_INTERFACE_MIN_REV, MLX4_COMMAND_INTERFACE_MAX_REV);
 		err = -ENODEV;
 		goto out;
 	}


From rdreier at cisco.com  Wed Jun 13 10:34:39 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 13 Jun 2007 10:34:39 -0700
Subject: [ofa-general] [PATCH/RFC] libmlx4: Handle new FW requirement for
	send request prefetching
In-Reply-To: <adaodjj69wc.fsf@cisco.com> (Roland Dreier's message of "Wed,
	13 Jun 2007 10:29:07 -0700")
References: <200706051602.14182.jackm@dev.mellanox.co.il>
	<adaodjj69wc.fsf@cisco.com>
Message-ID: <adak5u769n4.fsf_-_@cisco.com>

Similarly I just added this to libmlx4.  The change to handle alignment
for inline send segments will be a separate patch, and I'm still
cleaning it up.  Anyway, let me know if you see any problems with
this.

BTW, with FW 2.0.158, I am seeing the HCA FW crash after running
ibv_srq_pingpong with default parameters.  Not sure if this is a
driver bug (I am using my latest kernel driver and libmlx4) or a
firmware problem.

commit 561da8d10e419ffb333fe6faf05004d9a3670e7a
Author: Roland Dreier <rolandd at cisco.com>
Date:   Wed Jun 13 10:31:16 2007 -0700

    Handle new FW requirement for send request prefetching
    
    New ConnectX firmware introduces FW command interface revision 2,
    which requires that for each QP, a chunk of send queue entries (the
    "headroom") is kept marked as invalid, so that the HCA doesn't get
    confused if it prefetches entries that haven't been posted yet.  Add
    code to libmlx4 to do this.
    
    Also, handle the new kernel ABI that adds the sq_no_prefetch parameter
    to the create QP operation.  We just hard-code sq_no_prefetch to 0 and
    always provide the full SQ headroom for now.
    
    Based on a patch from Jack Morgenstein <jackm at dev.mellanox.co.il>.
    
    Signed-off-by: Roland Dreier <rolandd at cisco.com>
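
To make the stamping pattern concrete, here is a standalone sketch of what
stamp_send_wqe in the diff below does to a single WQE. It takes the WQE buffer
directly instead of a QP (the stamp_wqe name and signature are made up for
this example) and assumes wqe_shift is at least 2.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Write 0xffffffff into the first 32-bit word of every 64-byte chunk
 * of a WQE, skipping the first chunk.  wqe_shift is log2 of the WQE
 * size in bytes, so the WQE holds 1 << (wqe_shift - 2) words.
 */
static void stamp_wqe(uint32_t *wqe, unsigned wqe_shift)
{
	size_t words = (size_t)1 << (wqe_shift - 2);
	size_t i;

	/* 16 words == 64 bytes; start at the second chunk. */
	for (i = 16; i < words; i += 16)
		wqe[i] = 0xffffffffu;
}
```

A 64-byte WQE has only one chunk, so nothing is written; a 256-byte WQE gets
words 16, 32 and 48 stamped.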

diff --git a/src/cq.c b/src/cq.c
index a1831ff..f3e3e3c 100644
--- a/src/cq.c
+++ b/src/cq.c
@@ -239,7 +239,7 @@ static int mlx4_poll_one(struct mlx4_cq *cq,
 		wq = &(*cur_qp)->sq;
 		wqe_index = ntohs(cqe->wqe_index);
 		wq->tail += (uint16_t) (wqe_index - (uint16_t) wq->tail);
-		wc->wr_id = wq->wrid[wq->tail & (wq->max - 1)];
+		wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
 		++wq->tail;
 	} else if ((*cur_qp)->ibv_qp.srq) {
 		srq = to_msrq((*cur_qp)->ibv_qp.srq);
@@ -248,7 +248,7 @@ static int mlx4_poll_one(struct mlx4_cq *cq,
 		mlx4_free_srq_wqe(srq, wqe_index);
 	} else {
 		wq = &(*cur_qp)->rq;
-		wc->wr_id = wq->wrid[wq->tail & (wq->max - 1)];
+		wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
 		++wq->tail;
 	}
 
diff --git a/src/mlx4-abi.h b/src/mlx4-abi.h
index 97f5dcd..20a40c9 100644
--- a/src/mlx4-abi.h
+++ b/src/mlx4-abi.h
@@ -36,7 +36,7 @@
 #include <infiniband/kern-abi.h>
 
 #define MLX4_UVERBS_MIN_ABI_VERSION	2
-#define MLX4_UVERBS_MAX_ABI_VERSION	2
+#define MLX4_UVERBS_MAX_ABI_VERSION	3
 
 struct mlx4_alloc_ucontext_resp {
 	struct ibv_get_context_resp	ibv_resp;
@@ -86,7 +86,8 @@ struct mlx4_create_qp {
 	__u64				db_addr;
 	__u8				log_sq_bb_count;
 	__u8				log_sq_stride;
-	__u8				reserved[6];
+	__u8				sq_no_prefetch;	/* was reserved in ABI 2 */
+	__u8				reserved[5];
 };
 
 #endif /* MLX4_ABI_H */
diff --git a/src/mlx4.h b/src/mlx4.h
index e29f456..3710a17 100644
--- a/src/mlx4.h
+++ b/src/mlx4.h
@@ -200,7 +200,8 @@ struct mlx4_srq {
 struct mlx4_wq {
 	uint64_t		       *wrid;
 	pthread_spinlock_t		lock;
-	int				max;
+	int				wqe_cnt;
+	int				max_post;
 	unsigned			head;
 	unsigned			tail;
 	int				max_gs;
@@ -216,6 +217,7 @@ struct mlx4_qp {
 
 	uint32_t			doorbell_qpn;
 	uint32_t			sq_signal_bits;
+	int				sq_spare_wqes;
 	struct mlx4_wq			sq;
 
 	uint32_t		       *db;
@@ -342,6 +344,8 @@ int mlx4_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
 			  struct ibv_send_wr **bad_wr);
 int mlx4_post_recv(struct ibv_qp *ibqp, struct ibv_recv_wr *wr,
 			  struct ibv_recv_wr **bad_wr);
+void mlx4_calc_sq_wqe_size(struct ibv_qp_cap *cap, enum ibv_qp_type type,
+			   struct mlx4_qp *qp);
 int mlx4_alloc_qp_buf(struct ibv_pd *pd, struct ibv_qp_cap *cap,
 		       enum ibv_qp_type type, struct mlx4_qp *qp);
 void mlx4_set_sq_sizes(struct mlx4_qp *qp, struct ibv_qp_cap *cap,
diff --git a/src/qp.c b/src/qp.c
index 7df3311..301f7cb 100644
--- a/src/qp.c
+++ b/src/qp.c
@@ -65,6 +65,20 @@ static void *get_send_wqe(struct mlx4_qp *qp, int n)
 	return qp->buf.buf + qp->sq.offset + (n << qp->sq.wqe_shift);
 }
 
+/*
+ * Stamp a SQ WQE so that it is invalid if prefetched by marking the
+ * first four bytes of every 64 byte chunk with 0xffffffff, except for
+ * the very first chunk of the WQE.
+ */
+static void stamp_send_wqe(struct mlx4_qp *qp, int n)
+{
+	uint32_t *wqe = get_send_wqe(qp, n);
+	int i;
+
+	for (i = 16; i < 1 << (qp->sq.wqe_shift - 2); i += 16)
+		wqe[i] = 0xffffffff;
+}
+
 void mlx4_init_qp_indices(struct mlx4_qp *qp)
 {
 	qp->sq.head	 = 0;
@@ -78,9 +92,11 @@ void mlx4_qp_init_sq_ownership(struct mlx4_qp *qp)
 	struct mlx4_wqe_ctrl_seg *ctrl;
 	int i;
 
-	for (i = 0; i < qp->sq.max; ++i) {
+	for (i = 0; i < qp->sq.wqe_cnt; ++i) {
 		ctrl = get_send_wqe(qp, i);
 		ctrl->owner_opcode = htonl(1 << 31);
+
+		stamp_send_wqe(qp, i);
 	}
 }
 
@@ -89,14 +105,14 @@ static int wq_overflow(struct mlx4_wq *wq, int nreq, struct mlx4_cq *cq)
 	unsigned cur;
 
 	cur = wq->head - wq->tail;
-	if (cur + nreq < wq->max)
+	if (cur + nreq < wq->max_post)
 		return 0;
 
 	pthread_spin_lock(&cq->lock);
 	cur = wq->head - wq->tail;
 	pthread_spin_unlock(&cq->lock);
 
-	return cur + nreq >= wq->max;
+	return cur + nreq >= wq->max_post;
 }
 
 int mlx4_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
@@ -138,8 +154,8 @@ int mlx4_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
 			goto out;
 		}
 
-		ctrl = wqe = get_send_wqe(qp, ind & (qp->sq.max - 1));
-		qp->sq.wrid[ind & (qp->sq.max - 1)] = wr->wr_id;
+		ctrl = wqe = get_send_wqe(qp, ind & (qp->sq.wqe_cnt - 1));
+		qp->sq.wrid[ind & (qp->sq.wqe_cnt - 1)] = wr->wr_id;
 
 		ctrl->srcrb_flags =
 			(wr->send_flags & IBV_SEND_SIGNALED ?
@@ -274,7 +290,16 @@ int mlx4_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
 		wmb();
 
 		ctrl->owner_opcode = htonl(mlx4_ib_opcode[wr->opcode]) |
-			(ind & qp->sq.max ? htonl(1 << 31) : 0);
+			(ind & qp->sq.wqe_cnt ? htonl(1 << 31) : 0);
+
+		/*
+		 * We can improve latency by not stamping the last
+		 * send queue WQE until after ringing the doorbell, so
+		 * only stamp here if there are still more WQEs to post.
+		 */
+		if (wr->next)
+			stamp_send_wqe(qp, (ind + qp->sq_spare_wqes) &
+				       (qp->sq.wqe_cnt - 1));
 
 		++ind;
 	}
@@ -313,6 +338,10 @@ out:
 		*(uint32_t *) (ctx->uar + MLX4_SEND_DOORBELL) = qp->doorbell_qpn;
 	}
 
+	if (nreq)
+		stamp_send_wqe(qp, (ind + qp->sq_spare_wqes - 1) &
+			       (qp->sq.wqe_cnt - 1));
+
 	pthread_spin_unlock(&qp->sq.lock);
 
 	return ret;
@@ -332,7 +361,7 @@ int mlx4_post_recv(struct ibv_qp *ibqp, struct ibv_recv_wr *wr,
 
 	/* XXX check that state is OK to post receive */
 
-	ind = qp->rq.head & (qp->rq.max - 1);
+	ind = qp->rq.head & (qp->rq.wqe_cnt - 1);
 
 	for (nreq = 0; wr; ++nreq, wr = wr->next) {
 		if (wq_overflow(&qp->rq, nreq, to_mcq(qp->ibv_qp.recv_cq))) {
@@ -363,7 +392,7 @@ int mlx4_post_recv(struct ibv_qp *ibqp, struct ibv_recv_wr *wr,
 
 		qp->rq.wrid[ind] = wr->wr_id;
 
-		ind = (ind + 1) & (qp->rq.max - 1);
+		ind = (ind + 1) & (qp->rq.wqe_cnt - 1);
 	}
 
 out:
@@ -384,36 +413,17 @@ out:
 	return ret;
 }
 
-int mlx4_alloc_qp_buf(struct ibv_pd *pd, struct ibv_qp_cap *cap,
-		       enum ibv_qp_type type, struct mlx4_qp *qp)
+void mlx4_calc_sq_wqe_size(struct ibv_qp_cap *cap, enum ibv_qp_type type,
+			   struct mlx4_qp *qp)
 {
 	int size;
 	int max_sq_sge;
 
-	qp->rq.max_gs	 = cap->max_recv_sge;
 	max_sq_sge	 = align(cap->max_inline_data + sizeof (struct mlx4_wqe_inline_seg),
 				 sizeof (struct mlx4_wqe_data_seg)) / sizeof (struct mlx4_wqe_data_seg);
 	if (max_sq_sge < cap->max_send_sge)
 		max_sq_sge = cap->max_send_sge;
 
-	qp->sq.wrid = malloc(qp->sq.max * sizeof (uint64_t));
-	if (!qp->sq.wrid)
-		return -1;
-
-	if (qp->rq.max) {
-		qp->rq.wrid = malloc(qp->rq.max * sizeof (uint64_t));
-		if (!qp->rq.wrid) {
-			free(qp->sq.wrid);
-			return -1;
-		}
-	}
-
-	size = qp->rq.max_gs * sizeof (struct mlx4_wqe_data_seg);
-
-	for (qp->rq.wqe_shift = 4; 1 << qp->rq.wqe_shift < size;
-	     qp->rq.wqe_shift++)
-		; /* nothing */
-
 	size = max_sq_sge * sizeof (struct mlx4_wqe_data_seg);
 	switch (type) {
 	case IBV_QPT_UD:
@@ -451,14 +461,37 @@ int mlx4_alloc_qp_buf(struct ibv_pd *pd, struct ibv_qp_cap *cap,
 	for (qp->sq.wqe_shift = 6; 1 << qp->sq.wqe_shift < size;
 	     qp->sq.wqe_shift++)
 		; /* nothing */
+}
+
+int mlx4_alloc_qp_buf(struct ibv_pd *pd, struct ibv_qp_cap *cap,
+		       enum ibv_qp_type type, struct mlx4_qp *qp)
+{
+	qp->rq.max_gs	 = cap->max_recv_sge;
+
+	qp->sq.wrid = malloc(qp->sq.wqe_cnt * sizeof (uint64_t));
+	if (!qp->sq.wrid)
+		return -1;
+
+	if (qp->rq.wqe_cnt) {
+		qp->rq.wrid = malloc(qp->rq.wqe_cnt * sizeof (uint64_t));
+		if (!qp->rq.wrid) {
+			free(qp->sq.wrid);
+			return -1;
+		}
+	}
+
+	for (qp->rq.wqe_shift = 4;
+	     1 << qp->rq.wqe_shift < qp->rq.max_gs * sizeof (struct mlx4_wqe_data_seg);
+	     qp->rq.wqe_shift++)
+		; /* nothing */
 
-	qp->buf_size = (qp->rq.max << qp->rq.wqe_shift) +
-		(qp->sq.max << qp->sq.wqe_shift);
+	qp->buf_size = (qp->rq.wqe_cnt << qp->rq.wqe_shift) +
+		(qp->sq.wqe_cnt << qp->sq.wqe_shift);
 	if (qp->rq.wqe_shift > qp->sq.wqe_shift) {
 		qp->rq.offset = 0;
-		qp->sq.offset = qp->rq.max << qp->rq.wqe_shift;
+		qp->sq.offset = qp->rq.wqe_cnt << qp->rq.wqe_shift;
 	} else {
-		qp->rq.offset = qp->sq.max << qp->sq.wqe_shift;
+		qp->rq.offset = qp->sq.wqe_cnt << qp->sq.wqe_shift;
 		qp->sq.offset = 0;
 	}
 
@@ -499,6 +532,8 @@ void mlx4_set_sq_sizes(struct mlx4_qp *qp, struct ibv_qp_cap *cap,
 	cap->max_send_sge    = qp->sq.max_gs;
 	qp->max_inline_data  = wqe_size - sizeof (struct mlx4_wqe_inline_seg);
 	cap->max_inline_data = qp->max_inline_data;
+	qp->sq.max_post	     = qp->sq.wqe_cnt - qp->sq_spare_wqes;
+	cap->max_send_wr     = qp->sq.max_post;
 }
 
 struct mlx4_qp *mlx4_find_qp(struct mlx4_context *ctx, uint32_t qpn)
diff --git a/src/verbs.c b/src/verbs.c
index 52ca0c8..2243b6c 100644
--- a/src/verbs.c
+++ b/src/verbs.c
@@ -355,11 +355,18 @@ struct ibv_qp *mlx4_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *attr)
 	if (!qp)
 		return NULL;
 
-	qp->sq.max = align_queue_size(attr->cap.max_send_wr);
-	qp->rq.max = align_queue_size(attr->cap.max_recv_wr);
+	mlx4_calc_sq_wqe_size(&attr->cap, attr->qp_type, qp);
+
+	/*
+	 * We need to leave 2 KB + 1 WQE of headroom in the SQ to
+	 * allow HW to prefetch.
+	 */
+	qp->sq_spare_wqes = (2048 >> qp->sq.wqe_shift) + 1;
+	qp->sq.wqe_cnt = align_queue_size(attr->cap.max_send_wr + qp->sq_spare_wqes);
+	qp->rq.wqe_cnt = align_queue_size(attr->cap.max_recv_wr);
 
 	if (attr->srq)
-		attr->cap.max_recv_wr = qp->rq.max = 0;
+		attr->cap.max_recv_wr = qp->rq.wqe_cnt = 0;
 	else if (attr->cap.max_recv_sge < 1)
 		attr->cap.max_recv_sge = 1;
 
@@ -387,9 +394,10 @@ struct ibv_qp *mlx4_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *attr)
 		cmd.db_addr = (uintptr_t) qp->db;
 	cmd.log_sq_stride   = qp->sq.wqe_shift;
 	for (cmd.log_sq_bb_count = 0;
-	     qp->sq.max > 1 << cmd.log_sq_bb_count;
+	     qp->sq.wqe_cnt > 1 << cmd.log_sq_bb_count;
 	     ++cmd.log_sq_bb_count)
 		; /* nothing */
+	cmd.sq_no_prefetch = 0;	/* OK for ABI 2: just a reserved field */
 	memset(cmd.reserved, 0, sizeof cmd.reserved);
 
 	ret = ibv_cmd_create_qp(pd, &qp->ibv_qp, attr, &cmd.ibv_cmd, sizeof cmd,
@@ -401,8 +409,8 @@ struct ibv_qp *mlx4_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *attr)
 	if (ret)
 		goto err_destroy;
 
-	qp->rq.max    = attr->cap.max_recv_wr;
-	qp->rq.max_gs = attr->cap.max_recv_sge;
+	qp->rq.wqe_cnt = qp->rq.max_post = attr->cap.max_recv_wr;
+	qp->rq.max_gs  = attr->cap.max_recv_sge;
 	mlx4_set_sq_sizes(qp, &attr->cap, attr->qp_type);
 
 	qp->doorbell_qpn    = htonl(qp->ibv_qp.qp_num << 8);
@@ -422,7 +430,7 @@ err_rq_db:
 
 err_free:
 	free(qp->sq.wrid);
-	if (qp->rq.max)
+	if (qp->rq.wqe_cnt)
 		free(qp->rq.wrid);
 	mlx4_free_buf(&qp->buf);
 
@@ -527,7 +535,7 @@ int mlx4_destroy_qp(struct ibv_qp *ibqp)
 	if (!ibqp->srq)
 		mlx4_free_db(to_mctx(ibqp->context), MLX4_DB_TYPE_RQ, qp->db);
 	free(qp->sq.wrid);
-	if (qp->rq.max)
+	if (qp->rq.wqe_cnt)
 		free(qp->rq.wrid);
 	mlx4_free_buf(&qp->buf);
 	free(qp);
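
For anyone reading along, the sizing arithmetic in this patch can be
sketched on its own as below. This is only an illustration, not the driver
code: round_up_pow2() is a hypothetical stand-in for align_queue_size()
(whose body is not part of this diff and is assumed to round up to a power
of two), and struct sq_sizes, calc_sq_sizes(), ring_slot(), owner_bit() and
stamp_wqe() are names made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for align_queue_size(): round up to a power of two. */
int round_up_pow2(int n)
{
	int size = 1;

	while (size < n)
		size *= 2;
	return size;
}

/* Illustration-only container for the three values the patch derives. */
struct sq_sizes {
	int spare_wqes;	/* headroom left free so HW can prefetch */
	int wqe_cnt;	/* ring size, a power of two */
	int max_post;	/* number of WQEs the consumer may have outstanding */
};

struct sq_sizes calc_sq_sizes(int max_send_wr, int wqe_shift)
{
	struct sq_sizes s;

	/* 2 KB + 1 WQE of headroom, as in the mlx4_create_qp() comment */
	s.spare_wqes = (2048 >> wqe_shift) + 1;
	s.wqe_cnt    = round_up_pow2(max_send_wr + s.spare_wqes);
	s.max_post   = s.wqe_cnt - s.spare_wqes;
	return s;
}

/* Ring slot for a free-running index; valid only for power-of-two rings. */
unsigned ring_slot(unsigned ind, int wqe_cnt)
{
	assert(wqe_cnt > 0 && (wqe_cnt & (wqe_cnt - 1)) == 0);
	return ind & (unsigned) (wqe_cnt - 1);
}

/* The ownership bit toggles on every full pass around the ring. */
int owner_bit(unsigned ind, int wqe_cnt)
{
	return (ind & (unsigned) wqe_cnt) != 0;
}

/*
 * Write 0xffffffff into the first word of every 64-byte chunk except
 * the first one, mirroring the loop in stamp_send_wqe().
 */
void stamp_wqe(uint32_t *wqe, int wqe_shift)
{
	int i;

	for (i = 16; i < 1 << (wqe_shift - 2); i += 16)
		wqe[i] = 0xffffffff;
}
```

With max_send_wr = 100 and 64-byte WQEs (wqe_shift = 6), this gives 33
spare WQEs, a 256-entry ring, and a max_post of 223.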


From rdreier at cisco.com  Wed Jun 13 10:37:27 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 13 Jun 2007 10:37:27 -0700
Subject: [ofa-general] Re: [PATCH draft,
	untested] ehca srq emulation (for IPoIB CM)
In-Reply-To: <20070613163821.GB12277@mellanox.co.il> (Michael S. Tsirkin's
	message of "Wed, 13 Jun 2007 19:38:21 +0300")
References: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com>
	<466F36C8.5010507@linux.vnet.ibm.com>
	<20070613163821.GB12277@mellanox.co.il>
Message-ID: <adafy4v69ig.fsf@cisco.com>

Not sure I follow how this code works.  What happens if I attach 100
QPs to an SRQ and then post only 50 receives?

 - R.


From mst at dev.mellanox.co.il  Wed Jun 13 10:49:30 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 13 Jun 2007 20:49:30 +0300
Subject: [ofa-general] Re: [PATCH draft,
	untested] ehca srq emulation (for IPoIB CM)
In-Reply-To: <adafy4v69ig.fsf@cisco.com>
References: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com>
	<466F36C8.5010507@linux.vnet.ibm.com>
	<20070613163821.GB12277@mellanox.co.il> <adafy4v69ig.fsf@cisco.com>
Message-ID: <20070613174930.GE12277@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM)
> 
> Not sure I follow how this code works.  What happens if I attach 100
> QPs to an SRQ and then post only 50 receives?

Thanks for asking. 

Note this is not a full emulation, just close enough to make IPoIB CM work.

The assumption I made is that you will post max_wrs receives (this is what IPoIB does).

If this is what you do, each QP will get WR_PER_QP receives (it's a macro now,
but can be made a module option, or exposed in srq_attr).
And if you try to attach more than max_wrs/WR_PER_QP QPs, create QP will fail.
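
The attach limit described above can be sketched roughly as follows. This
is not the code from the draft patch: struct emu_srq, emu_srq_attach_qp()
and the WR_PER_QP value of 16 are all assumptions made up for illustration.

```c
#include <assert.h>

/* Hypothetical value; the real macro's value is not given in this thread. */
#define WR_PER_QP 16

/* Illustration-only state for an emulated SRQ. */
struct emu_srq {
	int max_wrs;		/* receives the consumer is assumed to post */
	int attached_qps;	/* QPs currently attached */
};

/*
 * Attach one more QP. Each QP is given WR_PER_QP receives, so at most
 * max_wrs / WR_PER_QP QPs fit. Returns 0 on success, -1 if the budget
 * is exhausted (the case where create QP would fail).
 */
int emu_srq_attach_qp(struct emu_srq *srq)
{
	assert(srq && srq->max_wrs >= 0);

	if (srq->attached_qps >= srq->max_wrs / WR_PER_QP)
		return -1;

	srq->attached_qps++;
	return 0;
}
```

With these assumed numbers, an SRQ sized for 50 receives accepts three QPs
and rejects the fourth.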


-- 
MST


From sean.hefty at intel.com  Wed Jun 13 11:02:47 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Wed, 13 Jun 2007 11:02:47 -0700
Subject: [ofa-general] Re: crash in ipoib
In-Reply-To: <20070613084531.GG1975@mellanox.co.il>
Message-ID: <000001c7ade5$0daebd20$4acc180a@amr.corp.intel.com>

>This looks strange. Can you supply some more data please?
>Which HCA are you running on?
>What test are you running?
>What should I do to reproduce this?
>Further, could you supply the full oops?

Woody will need to answer the test/config questions.  The oops is only displayed
on the screen, and the stack trace is about 50-75 calls long.  The start of the
oops gets pushed off the screen.  (Can we be overrunning the stack?)  I'm not at
the systems today, but can probably get what else is available tomorrow.

We have, I think, up to 16 systems running the tests, and we only see failures
on specific nodes (which all happen to be the same type of system).

- Sean


From rdreier at cisco.com  Wed Jun 13 11:05:15 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 13 Jun 2007 11:05:15 -0700
Subject: [ofa-general] Re: crash in ipoib
In-Reply-To: <000001c7ade5$0daebd20$4acc180a@amr.corp.intel.com> (Sean Hefty's
	message of "Wed, 13 Jun 2007 11:02:47 -0700")
References: <000001c7ade5$0daebd20$4acc180a@amr.corp.intel.com>
Message-ID: <ada3b0v6884.fsf@cisco.com>

 > Woody will need to answer the test/config questions.  The oops is only displayed
 > on the screen, and the stack trace is about 50-75 calls long.  The start of the
 > oops gets pushed off the screen.  (Can we be overrunning the stack?)  I'm not at
 > the systems today, but can probably get what else is available tomorrow.

If you don't have serial console, it might be worth trying to get
netconsole working.  It's usually pretty simple to set up (see
Documentation/networking/netconsole.txt, you basically just need
another system running netcat to capture the log messages).

 - R.


From mst at dev.mellanox.co.il  Wed Jun 13 11:09:49 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 13 Jun 2007 21:09:49 +0300
Subject: [ofa-general] Re: Re: crash in ipoib
In-Reply-To: <000001c7ade5$0daebd20$4acc180a@amr.corp.intel.com>
References: <20070613084531.GG1975@mellanox.co.il>
	<000001c7ade5$0daebd20$4acc180a@amr.corp.intel.com>
Message-ID: <20070613180949.GH12277@mellanox.co.il>

> Quoting Sean Hefty <sean.hefty at intel.com>:
> Subject: RE: Re: crash in ipoib
> 
> >This looks strange. Can you supply some more data please?
> >Which HCA are you running on?
> >What test are you running?
> >What should I do to reproduce this?
> >Further, could you supply the full oops?
> 
> Woody will need to answer the test/config questions.  The oops is only displayed
> on the screen, and the stack trace is about 50-75 calls long.  The start of the
> oops gets pushed off the screen.  (Can we be overrunning the stack?)  I'm not at
> the systems today, but can probably get what else is available tomorrow.

Getting a serial console would be the thing to do then.
If you are worried about stack overflow, build your kernel
with stack instrumentation.
It's quite likely the real oops reason has scrolled off the screen;
what you post here could be the result of subsequent memory corruption.

> We have, I think, up to 16 systems running the tests, and we only see failures
> on specific nodes (which all happen to be the same type of system).

One thing to try to check is whether it's kernel-specific.
What happens if you install a different kernel/OS there?
Try RHEL5 or just build 2.6.20 kernel there.
Does it still happen?

-- 
MST


From mst at dev.mellanox.co.il  Wed Jun 13 11:15:10 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 13 Jun 2007 21:15:10 +0300
Subject: [ofa-general] Re: crash in ipoib
In-Reply-To: <ada3b0v6884.fsf@cisco.com>
References: <000001c7ade5$0daebd20$4acc180a@amr.corp.intel.com>
	<ada3b0v6884.fsf@cisco.com>
Message-ID: <20070613181510.GI12277@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [ofa-general] Re: crash in ipoib
> 
>  > Woody will need to answer the test/config questions.  The oops is only displayed
>  > on the screen, and the stack trace is about 50-75 calls long.  The start of the
>  > oops gets pushed off the screen.  (Can we be overrunning the stack?)  I'm not at
>  > the systems today, but can probably get what else is available tomorrow.
> 
> If you don't have serial console, it might be worth trying to get
> netconsole working.  It's usually pretty simple to set up (see
> Documentation/networking/netconsole.txt, you basically just need
> another system running netcat to capture the log messages).

I don't think it's available on that ancient kernel.

-- 
MST


From rdreier at cisco.com  Wed Jun 13 11:18:08 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 13 Jun 2007 11:18:08 -0700
Subject: [ofa-general] Re: crash in ipoib
In-Reply-To: <20070613181510.GI12277@mellanox.co.il> (Michael S. Tsirkin's
	message of "Wed, 13 Jun 2007 21:15:10 +0300")
References: <000001c7ade5$0daebd20$4acc180a@amr.corp.intel.com>
	<ada3b0v6884.fsf@cisco.com> <20070613181510.GI12277@mellanox.co.il>
Message-ID: <aday7in4t27.fsf@cisco.com>

 > I don't think it's available on that ancient kernel.

I thought RHEL4 patched netconsole in... at least 'modinfo netconsole'
seems to find something on a RHEL4 system we have around.

 - R.


From mst at dev.mellanox.co.il  Wed Jun 13 11:23:06 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 13 Jun 2007 21:23:06 +0300
Subject: [ofa-general] Re: crash in ipoib
In-Reply-To: <aday7in4t27.fsf@cisco.com>
References: <000001c7ade5$0daebd20$4acc180a@amr.corp.intel.com>
	<ada3b0v6884.fsf@cisco.com> <20070613181510.GI12277@mellanox.co.il>
	<aday7in4t27.fsf@cisco.com>
Message-ID: <20070613182306.GJ12277@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [ofa-general] Re: crash in ipoib
> 
>  > I don't think it's available on that ancient kernel.
> 
> I thought RHEL4 patched netconsole in... at least 'modinfo netconsole'
> seems to find something on a RHEL4 system we have around.

Cool, worth a try then.

-- 
MST


From robert.j.woodruff at intel.com  Wed Jun 13 12:29:17 2007
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Wed, 13 Jun 2007 12:29:17 -0700
Subject: [ofa-general] RE: Re: crash in ipoib
In-Reply-To: <20070613180949.GH12277@mellanox.co.il>
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691C027FC818@orsmsx418.amr.corp.intel.com>

We are running a RHEL EL4 2.6.9-42EL kernel on a Rocks install.

The tests I run are IMB with Intel MPI over uDAPL, at the same
time as IMB over IPoIB. It usually takes at least 1 day, sometimes 2
days, of running IMB in a loop with various numbers of processes per node
(1, 2, and 4). It seems to fail randomly, not on the same
node every time, so it is not feasible to connect a serial console
to every node. It would also be hard for us to put in a new kernel,
as this has problems with Rocks. The systems are the older Xeon,
Lindenhurst, 3.6 GHz.

I have not seen this error on any other kernel or system. I have tested
RHEL5 and RHEL4-U5, but only on 2 nodes, and that does not seem
to fail. We also have OFED 1.2 running on 64- and 256-node production
application development clusters, and they have not reported any similar
problems, but they are not running the same tests.

I plan on loading OFED 1.2-rc5 today. Is there an easy way to build the
IPoIB driver from the OFED installer so that it has debug enabled?

 woody

-----Original Message-----
From: Michael S. Tsirkin [mailto:mst at dev.mellanox.co.il] 
Sent: Wednesday, June 13, 2007 11:10 AM
To: Hefty, Sean
Cc: 'Michael S. Tsirkin'; Sean Hefty; Woodruff, Robert J; 'Vladimir
Sokolovsky'; general at lists.openfabrics.org
Subject: Re: Re: crash in ipoib

> Quoting Sean Hefty <sean.hefty at intel.com>:
> Subject: RE: Re: crash in ipoib
> 
> >This looks strange. Can you supply some more data please?
> >Which HCA are you running on?
> >What test are you running?
> >What should I do to reproduce this?
> >Further, could you supply the full oops?
> 
> Woody will need to answer the test/config questions.  The oops is only displayed
> on the screen, and the stack trace is about 50-75 calls long.  The start of the
> oops gets pushed off the screen.  (Can we be overrunning the stack?)  I'm not at
> the systems today, but can probably get what else is available tomorrow.

Getting a serial console would be the thing to do then.
If you are worried about stack overflow, build your kernel
with stack instrumentation.
It's quite likely the real oops reason has scrolled off the screen;
what you post here could be the result of subsequent memory corruption.

> We have, I think, up to 16 systems running the tests, and we only see failures
> on specific nodes (which all happen to be the same type of system).

One thing to try to check is whether it's kernel-specific.
What happens if you install a different kernel/OS there?
Try RHEL5 or just build 2.6.20 kernel there.
Does it still happen?

-- 
MST


From rdreier at cisco.com  Wed Jun 13 12:30:42 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 13 Jun 2007 12:30:42 -0700
Subject: [ofa-general] RE: Re: crash in ipoib
In-Reply-To: <BAE9DCEF64577A439B3A37F36F9B691C027FC818@orsmsx418.amr.corp.intel.com>
	(Robert J. Woodruff's message of "Wed,
	13 Jun 2007 12:29:17 -0700")
References: <BAE9DCEF64577A439B3A37F36F9B691C027FC818@orsmsx418.amr.corp.intel.com>
Message-ID: <adatztb4pp9.fsf@cisco.com>

 > I plan on loading OFED 1.2-rc5 today. Is there an easy way to build the 
 > IPoIB driver from the OFED installer so that it has debug enabled ?

I would hope that that is the way the installer builds it by default.


From robert.j.woodruff at intel.com  Wed Jun 13 12:48:46 2007
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Wed, 13 Jun 2007 12:48:46 -0700
Subject: [ofa-general] RE: Re: crash in ipoib
In-Reply-To: <adatztb4pp9.fsf@cisco.com>
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691C027FC851@orsmsx418.amr.corp.intel.com>

Roland wrote,

>I would hope that that is the way the installer builds it by default.

I found an option in ofed.conf that allows additional
parameters to be passed to the build.

I added this and am rebuilding it now.

OFA_KERNEL_PARAMS="--with-memtrack --with-ipoib_debug-mod"


From rdreier at cisco.com  Wed Jun 13 13:40:33 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 13 Jun 2007 13:40:33 -0700
Subject: [ofa-general] Re: [PATCH 2 of 2] mlx4: deal with ownership bit
	wraparound when cleaning cq
In-Reply-To: <200706130836.25074.jackm@dev.mellanox.co.il> (Jack Morgenstein's
	message of "Wed, 13 Jun 2007 08:36:24 +0300")
References: <200706130836.25074.jackm@dev.mellanox.co.il>
Message-ID: <adair9r4mgu.fsf@cisco.com>

thanks, applied 1 & 2.


From sweitzen at cisco.com  Wed Jun 13 17:50:21 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Wed, 13 Jun 2007 17:50:21 -0700
Subject: [ofa-general] OFED 1.2 rc5 release
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C90156362A@mtlexch01.mtl.com>
References: <43AA3CB3C1BF5A499F5AAD31CA5023AC06624A26@mtlexch01.mtl.com><6C2C79E72C305246B504CBA17B5500C9015634B7@mtlexch01.mtl.com>
	<6C2C79E72C305246B504CBA17B5500C90156362A@mtlexch01.mtl.com>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA303B0DFB0@xmb-sjc-216.amer.cisco.com>

I have created 1.2rc5 in bugzilla.

Tziporet, I'm not sure how you created your "fixed in rc5" list, but
some of the bugs on it are still open (for example,
https://bugs.openfabrics.org/show_bug.cgi?id=577).

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

> -----Original Message-----
> From: general-bounces at lists.openfabrics.org 
> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of 
> Tziporet Koren
> Sent: Wednesday, June 13, 2007 7:26 AM
> To: ewg at lists.openfabrics.org
> Cc: general at lists.openfabrics.org
> Subject: [ofa-general] OFED 1.2 rc5 release
> 
> Hi, 
> 
> OFED 1.2-RC5 is available on
> http://www.openfabrics.org/builds/ofed-1.2/ 
> File: OFED-1.2-rc5.tgz 
> To get BUILD_ID run ofed_info 
> 
> Please report any issues in bugzilla https://bugs.openfabrics.org/
> 
> The GA release is expected next Wed (June 20) based on RC5 tests
> 
> Tziporet & Vlad 
> 
> ==============================================================
> ==========
> 
> Release information: 
> 
> OS support: 
> Novell: 
>     - SLES 9.0 SP3 
>     - SLES10 
>     - SLES10 SP1 RC5
> Redhat: 
>     - Redhat EL4 up3, up4 and up5 
>     - Redhat EL5 
> kernel.org: 
>     - 2.6.20 
>     - 2.6.19 
> 
> Note: Fedora C6 and SuSE Pro 10 are not part of the official list. 
> We keep the backport patches for these OSes and make sure OFED compiles
> and loads properly, but will not do a full QA cycle.
> 
> Systems: 
>     * x86_64 
>     * x86 
>     * ia64 
>     * ppc64 
> 
> Main changes from OFED-1.2-rc4: 
> ===============================
> 1. Fixed 8 bugs (see attached for fixed issues)
> 2. Added support for SLES10 SP1 RC5 (tvflash is disabled for now)
> 3. Added support for iSER on RHEL 4
> 4. Updated documents - all owners, please review to make sure the docs for
> your component are updated.
> 
> See bugzilla for all open issues. 
> 
> Tasks that should be completed for the GA release: 
> 1. Complete all documentation (release notes, README, etc.) 
> 2. Run all QA tests on all platforms
> 


From vlad at dev.mellanox.co.il  Thu Jun 14 00:08:03 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Thu, 14 Jun 2007 10:08:03 +0300
Subject: [ofa-general] [GIT PULL] please pull tvflash.git
In-Reply-To: <ada1wgf7rgw.fsf@cisco.com>
References: <ada1wgf7rgw.fsf@cisco.com>
Message-ID: <4670E953.9060703@dev.mellanox.co.il>

Roland Dreier wrote:
> Vlad, please pull from
> 
>     git://staging.openfabrics.org/~rdreier/tvflash.git
> 
> to get tvflash updates that will fix problems building on SLES 10 SP1
> and Fedora 7 due to linking with libgz or libz
> (https://bugs.openfabrics.org/show_bug.cgi?id=558).
> 
> Thanks,
>   Roland

Done,

Regards,
Vladimir


From kliteyn at dev.mellanox.co.il  Thu Jun 14 01:19:57 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Thu, 14 Jun 2007 11:19:57 +0300
Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node guid
 files options for fat-tree
Message-ID: <4670FA2D.7070708@dev.mellanox.co.il>

Hi Hal.

The following three patches are adding root and compute node guid files
options for fat-tree routing, reading these files in fat-tree, and
taking care of non-compute nodes when creating fat-tree order file.

[1/3] Added two options:
        ftree_root_guid_file - file that contains list of root guids
        ftree_cn_guid_file - file that contains list of compute node guids
      For now, these options are exposed via options file only.

[2/3] Fat-tree routing reads root guid file and compute node guid file,
      and creates map of roots and compute nodes (CNs) to be used later.

[3/3] Non-CNs are treated as "dummies" when creating fat-tree order file,
      because they are not participating in the MPI all-to-all communication.
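
As a rough illustration of the "one guid in each line" format these files
use, a line could be parsed as sketched below. This is not the code from
patch [2/3]: parse_guid_line() is a hypothetical helper, and accepting
"0x"-prefixed hex via strtoull() with base 0 is an assumption about the
file format.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a single line of a guid file (hypothetical helper).
 * Returns 0 and stores the value in *guid on success, or -1 if the
 * line is blank, malformed, or out of range.
 */
int parse_guid_line(const char *line, uint64_t *guid)
{
	char *end;
	unsigned long long val;

	assert(line && guid);

	errno = 0;
	val = strtoull(line, &end, 0);	/* base 0: accepts "0x..." hex */
	if (end == line || errno == ERANGE)
		return -1;

	/* Allow trailing whitespace and the newline, nothing else. */
	while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
		end++;
	if (*end != '\0')
		return -1;

	*guid = (uint64_t) val;
	return 0;
}
```

A caller would read the file with fgets() and feed each line to this
helper, skipping lines for which it returns -1.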

-- Yevgeny


From kliteyn at dev.mellanox.co.il  Thu Jun 14 01:20:06 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Thu, 14 Jun 2007 11:20:06 +0300
Subject: [ofa-general] PATCH [1/3] osm: adding root and compute node guid
 files options for fat-tree
Message-ID: <4670FA36.6060303@dev.mellanox.co.il>

Hi Hal,

Added two options:
  ftree_root_guid_file - file that contains list of root guids for fat-tree routing
  ftree_cn_guid_file - file that contains list of compute node guids for fat-tree routing

For now, these options are exposed via options file only.

Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
---
 opensm/include/opensm/osm_subnet.h |   10 ++++++++++
 opensm/opensm/osm_subnet.c         |   22 ++++++++++++++++++++++
 2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index c62128b..46d90d6 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -279,6 +279,8 @@ typedef struct _osm_subn_opt
   char *                   lid_matrix_dump_file;
   char *                   ucast_dump_file;
   char *                   updn_guid_file;
+  char *                   ftree_root_guid_file;
+  char *                   ftree_cn_guid_file;
   char *                   sa_db_file;
   boolean_t                exit_on_fatal;
   boolean_t                honor_guid2lid_file;
@@ -455,6 +457,14 @@ typedef struct _osm_subn_opt
 *	updn_guid_file
 *		Pointer to name of the UPDN guid file given by User
 *
+*	ftree_root_guid_file
+*		Name of the file that contains list of root guids that
+*		will be used by fat-tree routing (provided by User)
+*
+*	ftree_cn_guid_file
+*		Name of the file that contains list of compute node guids that
+*		will be used by fat-tree routing (provided by User)
+*
 *	sa_db_file
 *		Name of the SA database file.
 *
diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index 736f49a..a39ada6 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -501,6 +501,8 @@ osm_subn_set_default_opt(
   p_opt->lid_matrix_dump_file = NULL;
   p_opt->ucast_dump_file = NULL;
   p_opt->updn_guid_file = NULL;
+  p_opt->ftree_root_guid_file = NULL;
+  p_opt->ftree_cn_guid_file = NULL;
   p_opt->sa_db_file = NULL;
   p_opt->exit_on_fatal = TRUE;
   p_opt->enable_quirks = FALSE;
@@ -1326,6 +1328,14 @@ osm_subn_parse_conf_file(
         "updn_guid_file",
         p_key, p_val, &p_opts->updn_guid_file);
 
+      __osm_subn_opts_unpack_charp( 
+        "ftree_root_guid_file",
+        p_key, p_val, &p_opts->ftree_root_guid_file);
+
+      __osm_subn_opts_unpack_charp( 
+        "ftree_cn_guid_file",
+        p_key, p_val, &p_opts->ftree_cn_guid_file);
+
       __osm_subn_opts_unpack_charp(
         "sa_db_file",
         p_key, p_val, &p_opts->sa_db_file);
@@ -1554,6 +1564,18 @@ osm_subn_write_conf_file(
              "# One guid in each line\n"
              "updn_guid_file %s\n\n",
              p_opts->updn_guid_file);
+  if (p_opts->ftree_root_guid_file)
+    fprintf( opts_file,
+             "# The file holding the fat-tree root node guids\n"
+             "# One guid in each line\n"
+             "ftree_root_guid_file %s\n\n",
+             p_opts->ftree_root_guid_file);
+  if (p_opts->ftree_cn_guid_file)
+    fprintf( opts_file,
+             "# The file holding the fat-tree compute node guids\n"
+             "# One guid in each line\n"
+             "ftree_cn_guid_file %s\n\n",
+             p_opts->ftree_cn_guid_file);
   if (p_opts->sa_db_file)
     fprintf( opts_file,
              "# SA database file name\n"
-- 
1.5.1.4


From kliteyn at dev.mellanox.co.il  Thu Jun 14 01:20:19 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Thu, 14 Jun 2007 11:20:19 +0300
Subject: [ofa-general] [PATCH 3/3] osm: adding root and compute node guid
 files options for fat-tree
Message-ID: <4670FA43.9090904@dev.mellanox.co.il>

Hi Hal,

Non-CNs are treated as "dummies" when creating fat-tree order file,
because they are not participating in the MPI all-to-all communication.

-- Yevgeny

Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
---
 opensm/opensm/osm_ucast_ftree.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/opensm/opensm/osm_ucast_ftree.c b/opensm/opensm/osm_ucast_ftree.c
index b1ee0ca..d3ff45f 100644
--- a/opensm/opensm/osm_ucast_ftree.c
+++ b/opensm/opensm/osm_ucast_ftree.c
@@ -1373,9 +1373,13 @@ __osm_ftree_fabric_dump_hca_ordering(
          p_group = p_sw->down_port_groups[j];
          p_hca = p_group->remote_hca_or_sw.remote_hca;
 
-         fprintf(p_hca_ordering_file,"0x%x\t%s\n", 
-                 cl_ntoh16(p_group->remote_base_lid),
-                 p_hca->p_osm_node->print_desc);
+         /* treat non-compute nodes as dummy */
+         if (p_hca->is_cn)
+            fprintf(p_hca_ordering_file,"0x%x\t%s\n", 
+                    cl_ntoh16(p_group->remote_base_lid),
+                    p_hca->p_osm_node->print_desc);
+         else
+            fprintf(p_hca_ordering_file,"0xFFFF\tDUMMY\n");
       }
 
       /* now print dummy HCAs */
-- 
1.5.1.4


From kliteyn at dev.mellanox.co.il  Thu Jun 14 01:20:13 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Thu, 14 Jun 2007 11:20:13 +0300
Subject: [ofa-general] PATCH [2/3] osm: adding root and compute node guid
 files options for fat-tree
Message-ID: <4670FA3D.3090500@dev.mellanox.co.il>

Hi Hal.

Fat-tree routing reads root guid file and compute node guid file,
and creates map of roots and compute nodes (CNs) to be used later.

--Yevgeny

Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
---
 opensm/opensm/osm_ucast_ftree.c |  232 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 232 insertions(+), 0 deletions(-)

diff --git a/opensm/opensm/osm_ucast_ftree.c b/opensm/opensm/osm_ucast_ftree.c
index 1730ef2..b1ee0ca 100644
--- a/opensm/opensm/osm_ucast_ftree.c
+++ b/opensm/opensm/osm_ucast_ftree.c
@@ -119,6 +119,17 @@ typedef struct {
 
 /***************************************************
  **
+ **  ftree_guid_tbl_element_t definition
+ **
+ ***************************************************/
+
+typedef struct {
+   cl_map_item_t map_item;
+   uint64_t guid;
+} ftree_guid_tbl_element_t;
+
+/***************************************************
+ **
  **  ftree_fwd_tbl_t definition
  **
  ***************************************************/
@@ -182,6 +193,7 @@ typedef struct ftree_sw_t_
    ftree_port_group_t  ** up_port_groups;
    uint8_t                up_port_groups_num;
    ftree_fwd_tbl_t        lft_buf;
+   boolean_t              is_root;
 } ftree_sw_t;
 
 /***************************************************
@@ -195,6 +207,7 @@ typedef struct ftree_hca_t_ {
    osm_node_t           * p_osm_node;
    ftree_port_group_t  ** up_port_groups;
    uint16_t               up_port_groups_num;
+   boolean_t              is_cn;
 } ftree_hca_t;
 
 /***************************************************
@@ -209,6 +222,8 @@ typedef struct ftree_fabric_t_
    cl_qmap_t       hca_tbl;
    cl_qmap_t       sw_tbl;
    cl_qmap_t       sw_by_tuple_tbl;
+   cl_qmap_t       cn_guids_tbl;
+   cl_qmap_t       root_guids_tbl;
    uint8_t         tree_rank;
    ftree_sw_t   ** leaf_switches;
    uint32_t        leaf_switches_num;
@@ -393,6 +408,36 @@ __osm_ftree_sw_tbl_element_destroy(
 
 /***************************************************
  **
+ ** ftree_guid_tbl_element_t functions
+ **
+ ***************************************************/
+
+static ftree_guid_tbl_element_t *
+__osm_ftree_guid_tbl_element_create(
+   IN  uint64_t guid)
+{
+   ftree_guid_tbl_element_t * p_element = 
+      (ftree_guid_tbl_element_t *) malloc(sizeof(ftree_guid_tbl_element_t));
+   if (!p_element)
+       return NULL;
+
+   memset(p_element, 0,sizeof(ftree_guid_tbl_element_t));
+   p_element->guid = guid;
+   return p_element;
+}
+
+/***************************************************/
+
+static void
+__osm_ftree_guid_tbl_element_destroy(
+   IN  ftree_guid_tbl_element_t * p_element)
+{
+   if (p_element)
+      free(p_element);
+}
+
+/***************************************************
+ **
  ** ftree_port_t functions
  **
  ***************************************************/
@@ -607,6 +652,9 @@ __osm_ftree_sw_create(
    p_sw->lft_buf = (ftree_fwd_tbl_t)cl_pool_get(&p_ftree->sw_fwd_tbl_pool);
    memset(p_sw->lft_buf, OSM_NO_PATH, FTREE_FWD_TBL_LEN);
 
+   /* by default the switch is not root */
+   p_sw->is_root = FALSE;
+
    return p_sw;
 } /* __osm_ftree_sw_create() */
 
@@ -810,6 +858,10 @@ __osm_ftree_hca_create(
    if (!p_hca->up_port_groups)
       return NULL;
    p_hca->up_port_groups_num = 0;
+
+   /* by default every CA is treated as compute node */
+   p_hca->is_cn = TRUE;
+
    return p_hca;
 }
 
@@ -934,6 +986,9 @@ __osm_ftree_fabric_create()
    cl_qmap_init(&p_ftree->sw_tbl);
    cl_qmap_init(&p_ftree->sw_by_tuple_tbl);
 
+   cl_qmap_init(&p_ftree->cn_guids_tbl);
+   cl_qmap_init(&p_ftree->root_guids_tbl);
+
    status = cl_pool_init( &p_ftree->sw_fwd_tbl_pool,
                           8,                 /* min pool size */
                           0,                 /* max pool size - unlimited */
@@ -960,6 +1015,8 @@ __osm_ftree_fabric_clear(ftree_fabric_t * p_ftree)
    ftree_sw_t * p_next_sw;
    ftree_sw_tbl_element_t * p_element;
    ftree_sw_tbl_element_t * p_next_element;
+   ftree_guid_tbl_element_t * p_guid_element;
+   ftree_guid_tbl_element_t * p_next_guid_element;
 
    if (!p_ftree)
       return;
@@ -1000,6 +1057,28 @@ __osm_ftree_fabric_clear(ftree_fabric_t * p_ftree)
    }
    cl_qmap_remove_all(&p_ftree->sw_by_tuple_tbl);
 
+   /* remove all the elements of root_guids_tbl */
+
+   p_next_guid_element = (ftree_guid_tbl_element_t *)cl_qmap_head(&p_ftree->root_guids_tbl);
+   while( p_next_guid_element != (ftree_guid_tbl_element_t *)cl_qmap_end(&p_ftree->root_guids_tbl) )
+   {
+      p_guid_element = p_next_guid_element;
+      p_next_guid_element = (ftree_guid_tbl_element_t *)cl_qmap_next(&p_guid_element->map_item );
+      __osm_ftree_guid_tbl_element_destroy(p_guid_element);
+   }
+   cl_qmap_remove_all(&p_ftree->root_guids_tbl);
+
+   /* remove all the elements of cn_guids_tbl */
+
+   p_next_guid_element = (ftree_guid_tbl_element_t *)cl_qmap_head(&p_ftree->cn_guids_tbl);
+   while( p_next_guid_element != (ftree_guid_tbl_element_t *)cl_qmap_end(&p_ftree->cn_guids_tbl) )
+   {
+      p_guid_element = p_next_guid_element;
+      p_next_guid_element = (ftree_guid_tbl_element_t *)cl_qmap_next(&p_guid_element->map_item );
+      __osm_ftree_guid_tbl_element_destroy(p_guid_element);
+   }
+   cl_qmap_remove_all(&p_ftree->cn_guids_tbl);
+
    /* free the leaf switches array */
    if ((p_ftree->leaf_switches_num > 0) && (p_ftree->leaf_switches))
       free(p_ftree->leaf_switches);
@@ -1048,6 +1127,16 @@ __osm_ftree_fabric_add_hca(ftree_fabric_t * p_ftree, osm_node_t * p_osm_node)
 
    CL_ASSERT(osm_node_get_type(p_osm_node) == IB_NODE_TYPE_CA);
 
+   /* if a user has supplied CN guids list, and this CA's guid 
+      is not there, then the CA should be marked as non-CN */
+   if ( (!cl_is_qmap_empty(&p_ftree->cn_guids_tbl)) && 
+        (cl_qmap_get(&p_ftree->cn_guids_tbl,
+                    cl_ntoh64(osm_node_get_node_guid(p_hca->p_osm_node))) ==
+                        cl_qmap_end(&p_ftree->cn_guids_tbl)) )
+   {
+      p_hca->is_cn = FALSE;
+   }
+
    cl_qmap_insert(&p_ftree->hca_tbl,
                   p_osm_node->node_info.node_guid,
                   &p_hca->map_item);
@@ -1062,6 +1151,16 @@ __osm_ftree_fabric_add_sw(ftree_fabric_t * p_ftree, osm_switch_t * p_osm_sw)
 
    CL_ASSERT(osm_node_get_type(p_osm_sw->p_node) == IB_NODE_TYPE_SWITCH);
 
+   /* if a user has supplied root guids list, and this switch's guid 
+      *is* there, then the switch should be marked as root */
+   if ( (!cl_is_qmap_empty(&p_ftree->root_guids_tbl)) && 
+        (cl_qmap_get(&p_ftree->root_guids_tbl,
+                    cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node))) !=
+                        cl_qmap_end(&p_ftree->root_guids_tbl)) )
+   {
+      p_sw->is_root = TRUE;
+   }
+
    cl_qmap_insert(&p_ftree->sw_tbl,
                   p_osm_sw->p_node->node_info.node_guid,
                   &p_sw->map_item);
@@ -2907,6 +3006,127 @@ __osm_ftree_fabric_populate_ports(
 /***************************************************
  ***************************************************/
 
+static int
+__osm_ftree_convert_list2qmap(
+   cl_list_t * p_guid_list,
+   cl_qmap_t * p_map )
+{
+   uint64_t * p_guid;
+
+   if ( !p_map )
+      return -1;
+
+   if ( !p_guid_list || !cl_list_count(p_guid_list) )
+      return 0;
+
+   while ( (p_guid = (uint64_t*)cl_list_remove_head(p_guid_list)) )
+   {
+      cl_qmap_insert( p_map, 
+                      *p_guid,
+                      &(__osm_ftree_guid_tbl_element_create(*p_guid)->map_item) );
+      free(p_guid);
+   }
+
+   CL_ASSERT(cl_is_list_empty(p_guid_list));
+
+   return 0;
+} /* __osm_ftree_convert_list2qmap() */
+
+/***************************************************
+ ***************************************************/
+
+static int
+__osm_ftree_fabric_read_guid_files(
+   IN  ftree_fabric_t * p_ftree)
+{
+   cl_list_t guid_list;
+   ftree_guid_tbl_element_t * p_guid_element;
+   ftree_guid_tbl_element_t * p_next_guid_element;
+   int status = 0;
+
+   OSM_LOG_ENTER(&p_ftree->p_osm->log, __osm_ftree_fabric_read_guid_files);
+
+   cl_list_construct( &guid_list );
+   cl_list_init( &guid_list, 10 );
+
+   p_ftree->p_osm->subn.opt.ftree_root_guid_file    = "/tmp/ftree.root.guids";
+   p_ftree->p_osm->subn.opt.ftree_cn_guid_file      = "/tmp/ftree.cn.guids";
+
+   if (p_ftree->p_osm->subn.opt.ftree_root_guid_file)
+   {
+      osm_log( &p_ftree->p_osm->log, OSM_LOG_DEBUG,
+               "__osm_ftree_read_guid_files: "
+               "Fetching root nodes from file %s\n",
+               p_ftree->p_osm->subn.opt.ftree_root_guid_file );
+
+      if ( osm_ucast_mgr_read_guid_file( &p_ftree->p_osm->sm.ucast_mgr,
+                                         p_ftree->p_osm->subn.opt.ftree_root_guid_file,
+                                         &guid_list ) ||
+           __osm_ftree_convert_list2qmap( &guid_list,
+                                          &p_ftree->root_guids_tbl ) )
+      {
+         status = -1;
+         goto Exit;
+      }
+
+      if (osm_log_is_active(&p_ftree->p_osm->log,OSM_LOG_DEBUG))
+      {
+         p_next_guid_element = (ftree_guid_tbl_element_t *)cl_qmap_head(&p_ftree->root_guids_tbl);
+         while( p_next_guid_element != (ftree_guid_tbl_element_t *)cl_qmap_end(&p_ftree->root_guids_tbl) )
+         {
+            p_guid_element = p_next_guid_element;
+            p_next_guid_element = (ftree_guid_tbl_element_t *)cl_qmap_next(&p_guid_element->map_item );
+            osm_log( &p_ftree->p_osm->log, OSM_LOG_DEBUG,
+                     "__osm_ftree_fabric_read_guid_files:   "
+                     "root guid 0x%016" PRIx64 "\n",
+                     p_guid_element->guid );
+         }
+      }
+   }
+   CL_ASSERT(cl_is_list_empty(&guid_list));
+
+   if (p_ftree->p_osm->subn.opt.ftree_cn_guid_file)
+   {
+      osm_log( &p_ftree->p_osm->log, OSM_LOG_DEBUG,
+               "__osm_ftree_read_guid_files: "
+               "Fetching compute nodes from file %s\n",
+               p_ftree->p_osm->subn.opt.ftree_cn_guid_file );
+
+      if ( osm_ucast_mgr_read_guid_file( &p_ftree->p_osm->sm.ucast_mgr,
+                                         p_ftree->p_osm->subn.opt.ftree_cn_guid_file,
+                                         &guid_list ) ||
+           __osm_ftree_convert_list2qmap( &guid_list,
+                                          &p_ftree->cn_guids_tbl ) )
+      {
+         status = -1;
+         goto Exit;
+      }
+
+      if (osm_log_is_active(&p_ftree->p_osm->log,OSM_LOG_DEBUG))
+      {
+         p_next_guid_element = (ftree_guid_tbl_element_t *)cl_qmap_head(&p_ftree->cn_guids_tbl);
+         while( p_next_guid_element != (ftree_guid_tbl_element_t *)cl_qmap_end(&p_ftree->cn_guids_tbl) )
+         {
+            p_guid_element = p_next_guid_element;
+            p_next_guid_element = (ftree_guid_tbl_element_t *)cl_qmap_next(&p_guid_element->map_item );
+            osm_log( &p_ftree->p_osm->log, OSM_LOG_DEBUG,
+                     "__osm_ftree_fabric_read_guid_files:   "
+                     "compute node guid 0x%016" PRIx64 "\n",
+                     p_guid_element->guid );
+         }
+      }
+   }
+   CL_ASSERT(cl_is_list_empty(&guid_list));
+
+  Exit:
+   OSM_LOG_EXIT(&p_ftree->p_osm->log);
+   cl_list_destroy(&guid_list);
+   return status;
+} /*__osm_ftree_fabric_read_guid_files() */
+
+/***************************************************
+ ***************************************************/
+
 static int 
 __osm_ftree_construct_fabric(
    IN  void * context)
@@ -2947,6 +3167,18 @@ __osm_ftree_construct_fabric(
       goto Exit;
    }
 
+   osm_log(&p_ftree->p_osm->log, OSM_LOG_VERBOSE,
+           "__osm_ftree_construct_fabric: "
+           "Reading guid files provided by user\n");
+   if (__osm_ftree_fabric_read_guid_files(p_ftree) != 0)
+   {
+      osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS,
+              "Failed reading guid files - "
+              "falling back to default routing\n");
+      status = -1;
+      goto Exit;
+   }
+
    osm_log(&p_ftree->p_osm->log, OSM_LOG_VERBOSE,"__osm_ftree_construct_fabric: \n"
            "                       |----------------------------------------|\n"
            "                       |- Starting FatTree fabric construction -|\n"
-- 
1.5.1.4


From kliteyn at dev.mellanox.co.il  Thu Jun 14 01:25:21 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Thu, 14 Jun 2007 11:25:21 +0300
Subject: [ofa-general] [PATCH] osm: bugfix - if fat-tree failed,
 osm should fall back to default routing
Message-ID: <4670FB71.5090406@dev.mellanox.co.il>

Hi Hal,

When fat-tree fails to populate all the data structures,
it should return an error and let osm fall back to default routing.

Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
---
 opensm/opensm/osm_ucast_ftree.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/opensm/opensm/osm_ucast_ftree.c b/opensm/opensm/osm_ucast_ftree.c
index d3ff45f..2236734 100644
--- a/opensm/opensm/osm_ucast_ftree.c
+++ b/opensm/opensm/osm_ucast_ftree.c
@@ -3302,11 +3302,15 @@ __osm_ftree_do_routing(
    IN  void * context)
 {
    ftree_fabric_t * p_ftree = context;
+   int status = 0;
 
    OSM_LOG_ENTER(&p_ftree->p_osm->log, __osm_ftree_do_routing);
 
    if (!p_ftree->fabric_built)
+   {
+      status = -1;
       goto Exit;
+   }
 
    osm_log(&p_ftree->p_osm->log, OSM_LOG_VERBOSE,"__osm_ftree_do_routing: "
            "Starting FatTree routing\n");
@@ -3330,7 +3334,7 @@ __osm_ftree_do_routing(
 
  Exit:
    OSM_LOG_EXIT(&p_ftree->p_osm->log);
-   return 0;
+   return status;
 }
 
 /***************************************************
-- 
1.5.1.4


From vlad at lists.openfabrics.org  Thu Jun 14 02:43:53 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Thu, 14 Jun 2007 02:43:53 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070614-0200 daily build status
Message-ID: <20070614094353.E1D6AE6086C@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:   --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.12
Passed on ppc64 with linux-2.6.15
Passed on x86_64 with linux-2.6.19
Passed on ia64 with linux-2.6.12
Passed on powerpc with linux-2.6.18
Passed on ia64 with linux-2.6.13
Passed on x86_64 with linux-2.6.20
Passed on powerpc with linux-2.6.19
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.14
Passed on x86_64 with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.13
Passed on powerpc with linux-2.6.13
Passed on x86_64 with linux-2.6.12
Passed on ppc64 with linux-2.6.19
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on ia64 with linux-2.6.15
Passed on x86_64 with linux-2.6.14
Passed on powerpc with linux-2.6.14
Passed on ia64 with linux-2.6.17
Passed on powerpc with linux-2.6.15
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.17
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.15
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.12
Passed on ia64 with linux-2.6.16
Passed on ppc64 with linux-2.6.16
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.13
Passed on powerpc with linux-2.6.12
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.14
Passed on ia64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:


From sashak at voltaire.com  Thu Jun 14 04:37:57 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Thu, 14 Jun 2007 14:37:57 +0300
Subject: [ofa-general] [PATCH] opensm/osm_helper.c: fixing PortInfo CapMask
	printing
Message-ID: <20070614113757.GA5908@sashak.voltaire.com>


When PortInfo:CapMask is zero, a non-initialized local buffer (garbage)
is printed. Here is the fix.

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---
 opensm/opensm/osm_helper.c |   19 ++++++++++---------
 1 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/opensm/opensm/osm_helper.c b/opensm/opensm/osm_helper.c
index 724ecdf..2b35bdd 100644
--- a/opensm/opensm/osm_helper.c
+++ b/opensm/opensm/osm_helper.c
@@ -546,9 +546,6 @@ osm_dbg_get_capabilities_str(
   uint32_t total_len = 0;
   char *p_local = p_buf;
 
-  if( !p_pi->capability_mask )
-    return;
-
   strcpy( p_local, "Capability Mask:\n" );
   p_local += strlen( p_local );
 
@@ -839,9 +836,11 @@ osm_dump_port_info(
              );
 
     /*  show the capabilities mask */
-    osm_dbg_get_capabilities_str( buf, BUF_SIZE, "\t\t\t\t", p_pi );
-
-    osm_log( p_log, log_level, "%s", buf );
+    if( p_pi->capability_mask )
+    {
+      osm_dbg_get_capabilities_str( buf, BUF_SIZE, "\t\t\t\t", p_pi );
+      osm_log( p_log, log_level, "%s", buf );
+    }
   }
 }
 
@@ -936,9 +935,11 @@ osm_dump_portinfo_record(
              );
 
     /*  show the capabilities mask */
-    osm_dbg_get_capabilities_str( buf, BUF_SIZE, "\t\t\t\t", p_pi );
-
-    osm_log( p_log, log_level, "%s", buf );
+    if( p_pi->capability_mask )
+    {
+      osm_dbg_get_capabilities_str( buf, BUF_SIZE, "\t\t\t\t", p_pi );
+      osm_log( p_log, log_level, "%s", buf );
+    }
   }
 }
 
-- 
1.5.2.1.137.g426c
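
A minimal standalone sketch of the pattern this patch applies, not the
opensm code itself: the formatter always writes a terminated string, and
the caller formats and logs only when the mask is non-zero. All demo_*
names and the two capability bits are hypothetical and exist only for
illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative bits only; the real PortInfo capability bits are not
 * reproduced here. */
#define DEMO_CAP_IS_SM      (1u << 1)
#define DEMO_CAP_HAS_NOTICE (1u << 2)

/* Always leaves a terminated string in buf, even for a zero mask,
 * so a caller can never print uninitialized memory. */
static size_t demo_caps_to_str(char *buf, size_t len, uint32_t mask)
{
	size_t used;

	assert(buf != NULL && len > 0);
	used = (size_t)snprintf(buf, len, "Capability Mask:\n");
	if ((mask & DEMO_CAP_IS_SM) && used < len)
		used += (size_t)snprintf(buf + used, len - used, "\tIS_SM\n");
	if ((mask & DEMO_CAP_HAS_NOTICE) && used < len)
		used += (size_t)snprintf(buf + used, len - used, "\tHAS_NOTICE\n");
	return used < len ? used : len - 1;
}

/* Caller-side guard, as in the patch: skip both the formatting and
 * the output when there is nothing to show. Returns 1 if printed. */
static int demo_dump_caps(FILE *out, uint32_t mask)
{
	char buf[256];

	if (!mask)
		return 0;
	demo_caps_to_str(buf, sizeof(buf), mask);
	fputs(buf, out);
	return 1;
}
```

Calling demo_dump_caps(stdout, 0) prints nothing and returns 0, while a
non-zero mask prints the header followed by one line per set bit.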


From sashak at voltaire.com  Thu Jun 14 05:15:01 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Thu, 14 Jun 2007 15:15:01 +0300
Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node
	guid files options for fat-tree
In-Reply-To: <4670FA2D.7070708@dev.mellanox.co.il>
References: <4670FA2D.7070708@dev.mellanox.co.il>
Message-ID: <20070614121501.GC5908@sashak.voltaire.com>

Hi Yevgeny,

On 11:19 Thu 14 Jun     , Yevgeny Kliteynik wrote:
> 
>  The following three patches are adding root and compute node guid files
>  options for fat-tree routing,

Is there any reason not to share the root guids file option with up/down?

Also, the way root guids are handled (in both up/down and ftree)
doesn't look optimal - guids are loaded into a dynamic list, the list
is converted to a map, this map is matched and root nodes are marked as
roots. Wouldn't it be easier just to mark root nodes during file
parsing?

Sasha


From kliteyn at dev.mellanox.co.il  Thu Jun 14 05:36:15 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Thu, 14 Jun 2007 15:36:15 +0300
Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node	guid
	files options for fat-tree
In-Reply-To: <20070614121501.GC5908@sashak.voltaire.com>
References: <4670FA2D.7070708@dev.mellanox.co.il>
	<20070614121501.GC5908@sashak.voltaire.com>
Message-ID: <4671363F.6060600@dev.mellanox.co.il>

Sasha Khapyorsky wrote:
> Hi Yevgeny,
> 
> On 11:19 Thu 14 Jun     , Yevgeny Kliteynik wrote:
>>  The following three patches are adding root and compute node guid files
>>  options for fat-tree routing,
> 
> Is there any reason to not share root guids file option with up/down?

There are two new options for fat-tree: roots and compute nodes (CN).
These two will be very "tightly coupled" and will have more implications
for the routing than in the case of up/dn roots. For instance, having a root
file but no CN file means that the topology doesn't have to be a pure fat-tree,
but all the CAs are considered CNs and have to be on the same level of the tree.
And there are similar implications for all the combinations of these two options.

Because of this coupling I wanted to differentiate these two options from
the up/dn roots.

Thoughts?
 
> Also the way how root guids are handled (in both up/down and ftree)
> doesn't look very optimal - guids are loaded to dynamic list, the list
> is converted to map, this map is matched and root nodes are marked as
> roots. Isn't it would be easy just to mark root nodes during file 
> parsing?

The only thing you can save here is converting the list to a map:
you have to parse the guids file anyway, and you have to build all the
fat-tree data structures anyway. So if you parse the file and fill the
map right away instead of filling the list first, you will save the
list2map conversion.
But then up/dn and fat-tree can't use the same function to parse the guid file,
and since the list2map conversion is not a big deal (we're talking about a list
of roots, which is a couple of hundred guids at most), I prefer to leave it
and not use separate parsing functions for up/dn and fat-tree.

BTW, since we're on this subject, how about removing the list2array conversion
in the same place in up/dn routing?

-- Yevgeny
 
> Sasha
> 


From kliteyn at dev.mellanox.co.il  Thu Jun 14 06:16:55 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Thu, 14 Jun 2007 16:16:55 +0300
Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node	guid
	files options for fat-tree
In-Reply-To: <4671363F.6060600@dev.mellanox.co.il>
References: <4670FA2D.7070708@dev.mellanox.co.il>	<20070614121501.GC5908@sashak.voltaire.com>
	<4671363F.6060600@dev.mellanox.co.il>
Message-ID: <46713FC7.3030104@dev.mellanox.co.il>

Hi Sasha,

Yevgeny Kliteynik wrote:
> Sasha Khapyorsky wrote:
>> Hi Yevgeny,
>>
>> On 11:19 Thu 14 Jun     , Yevgeny Kliteynik wrote:
>>>  The following three patches are adding root and compute node guid files
>>>  options for fat-tree routing,
>>
>> Is there any reason to not share root guids file option with up/down?
> 
> There are two new options for fat-tree: roots and compute nodes (CN).
> These two will be very "tightly coupled" and would have more implication
> on the routing than in case of up/dn roots. For instance, having root
> file but not CN file means that the topology doesn't have to be pure 
> fat-tree,
> but all the CAs are considered CNs and have to be on the same level of 
> the tree.
> And there is similar implication of all the combinations of these two 
> options.
> 
> Because of this coupling I wanted to differentiate these two options from
> the up/dn roots.
> 
> Thoughts?
> 
>> Also the way how root guids are handled (in both up/down and ftree)
>> doesn't look very optimal - guids are loaded to dynamic list, the list
>> is converted to map, this map is matched and root nodes are marked as
>> roots. Isn't it would be easy just to mark root nodes during file 
>> parsing?
> 
> The only thing you can save here is converting list to map:
> You have to parse the guids file anyway, and you have to build all the
> fat-tree data structures anyway. So if you parse the file and fill the
> map right away instead of filling the list first, you will save the 
> list2map conversion.
> But then up/dn and fat-tree can't use the same function to parse the 
> guid file,
> and since the list2map conversion is not a big deal (we're talking about 
> list of roots, which is couple of hundreds of guids at max), I prefer
> to leave it and not to use separate parsing functions for up/dn and fat-tree.

Actually, I can do something else here:
 - parse guid file into list
 - populate fat-tree switches and CAs
 - scan guid list, and for each guid mark the matching node 
   in the fat-tree maps
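
The three steps above can be sketched roughly as follows. This is an
illustration under assumptions, not the ftree code: plain arrays stand in
for the guid list and for the cl_qmap of switches, and demo_sw_t,
demo_find_sw and demo_mark_roots are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for ftree_sw_t. */
typedef struct {
	uint64_t guid;
	bool is_root;
} demo_sw_t;

/* Linear search standing in for a lookup by guid in the switch map. */
static demo_sw_t *demo_find_sw(demo_sw_t *sw, size_t n_sw, uint64_t guid)
{
	for (size_t i = 0; i < n_sw; i++)
		if (sw[i].guid == guid)
			return &sw[i];
	return NULL;
}

/* Scan the already-parsed guid list and mark each matching switch.
 * Guids that match no switch are skipped; duplicates are counted once.
 * Returns the number of switches newly marked as root. */
static size_t demo_mark_roots(demo_sw_t *sw, size_t n_sw,
			      const uint64_t *guids, size_t n_guids)
{
	size_t marked = 0;

	assert(sw != NULL || n_sw == 0);
	for (size_t i = 0; i < n_guids; i++) {
		demo_sw_t *p_sw = demo_find_sw(sw, n_sw, guids[i]);

		if (p_sw && !p_sw->is_root) {
			p_sw->is_root = true;
			marked++;
		}
	}
	return marked;
}
```

With this ordering no separate root guids map is built; the only lookup
is into the switch table that fat-tree populates anyway.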

Sounds OK?

-- Yevgeny

> BTW, since we're on this subject, how about removing the list2array 
> conversion
> in the same place in up/dn routing?
> 
> -- Yevgeny
> 
>> Sasha
>>
> 


From sashak at voltaire.com  Thu Jun 14 06:45:19 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Thu, 14 Jun 2007 16:45:19 +0300
Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node
	guid files options for fat-tree
In-Reply-To: <4671363F.6060600@dev.mellanox.co.il>
References: <4670FA2D.7070708@dev.mellanox.co.il>
	<20070614121501.GC5908@sashak.voltaire.com>
	<4671363F.6060600@dev.mellanox.co.il>
Message-ID: <20070614134519.GD5908@sashak.voltaire.com>

On 15:36 Thu 14 Jun     , Yevgeny Kliteynik wrote:
>  Sasha Khapyorsky wrote:
> > Hi Yevgeny,
> > On 11:19 Thu 14 Jun     , Yevgeny Kliteynik wrote:
> >>  The following three patches are adding root and compute node guid files
> >>  options for fat-tree routing,
> > Is there any reason to not share root guids file option with up/down?
> 
>  There are two new options for fat-tree: roots and compute nodes (CN).
>  These two will be very "tightly coupled" and would have more implication
>  on the routing than in case of up/dn roots. For instance, having root
>  file but not CN file means that the topology doesn't have to be pure 
>  fat-tree,
>  but all the CAs are considered CNs and have to be on the same level of the 
>  tree.
>  And there is similar implication of all the combinations of these two 
>  options.
> 
>  Because of this coupling I wanted to differentiate these two options from
>  the up/dn roots.
> 
>  Thoughts?

I still don't have a strong opinion about two options versus a common one.
Hypothetically, if some day we implement routing engine chains
(so a failed algo will fall back to the next in the chain and not just to
default), separate options could be useful.
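
A rough sketch of what such a chain could look like, purely hypothetical
since nothing like it exists in the tree discussed here: each engine
returns 0 on success and non-zero on failure, and the first one that
succeeds wins. The demo_* names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical routing entry point: 0 on success, non-zero on failure. */
typedef int (*demo_route_fn_t)(void *ctx);

typedef struct {
	const char *name;
	demo_route_fn_t route;
	void *ctx;
} demo_engine_t;

/* Try the engines in order. Returns the index of the first engine
 * that succeeds, or -1 if every engine in the chain fails. */
static int demo_run_chain(const demo_engine_t *chain, size_t n)
{
	assert(chain != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (chain[i].route && chain[i].route(chain[i].ctx) == 0)
			return (int)i;
	return -1;
}

/* Two trivial engines used only to exercise the chain. */
static int demo_always_fail(void *ctx) { (void)ctx; return -1; }
static int demo_always_ok(void *ctx)   { (void)ctx; return 0; }
```

A chain of { ftree (fails), updn (ok), default (ok) } would stop at the
second entry, which is the case where per-engine options matter.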

> > Also the way how root guids are handled (in both up/down and ftree)
> > doesn't look very optimal - guids are loaded to dynamic list, the list
> > is converted to map, this map is matched and root nodes are marked as
> > roots. Isn't it would be easy just to mark root nodes during file parsing?
> 
>  The only thing you can save here is converting list to map:

I don't think the root guids map is needed - you can just set the is_root
field for sw nodes by the guid(s) specified in the file, since you already
have the sw by guid map.

>  You have to parse the guids file anyway, and you have to build all the
>  fat-tree data structures anyway. So if you parse the file and fill the
>  map right away instead of filling the list first, you will save the list2map 
>  conversion.
>  But then up/dn and fat-tree can't use the same function to parse the guid 
>  file,
>  and since the list2map conversion is not a big deal (we're talking about 
>  list
>  of roots, which is couple of hundreds of guids at max), I prefer to leave it
>  and not to use separate parsing functions for up/dn and fat-tree.

You can pass a custom callback to the common parser.
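
Something along these lines, as a sketch only. The guid file format
(whitespace-separated numbers) and every demo_* name are assumptions, and
the parser reads from an in-memory string to keep the example
self-contained; it is not osm_ucast_mgr_read_guid_file().

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Per-guid hook supplied by the routing engine; non-zero aborts parsing. */
typedef int (*demo_guid_cb_t)(uint64_t guid, void *ctx);

/* Common parser: walks whitespace-separated guids in text and hands
 * each one to cb. Returns 0 on success, -1 on a malformed token or
 * when the callback reports failure. */
static int demo_parse_guids(const char *text, demo_guid_cb_t cb, void *ctx)
{
	const char *p = text;

	assert(text != NULL && cb != NULL);
	for (;;) {
		char *end;
		unsigned long long guid;

		while (*p && isspace((unsigned char)*p))
			p++;
		if (!*p)
			break;
		errno = 0;
		guid = strtoull(p, &end, 0);	/* base 0 accepts 0x... */
		if (errno || end == p)
			return -1;
		if (cb((uint64_t)guid, ctx))
			return -1;
		p = end;
	}
	return 0;
}

/* Example callback: just counts the guids it is given. A fat-tree
 * callback would instead look up the switch and set is_root. */
static int demo_count_cb(uint64_t guid, void *ctx)
{
	(void)guid;
	(*(size_t *)ctx)++;
	return 0;
}
```

Each user (up/down, ftree) keeps the one parser and differs only in the
callback and context it passes.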

>  BTW, since we're on this subject, how about removing the list2array 
>  conversion
>  in the same place in up/dn routing?

Sure, similar junk should be cleaned up in up/down too (and my original
complaint was about both root guids users).

Sasha


From sashak at voltaire.com  Thu Jun 14 06:57:17 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Thu, 14 Jun 2007 16:57:17 +0300
Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node
	guid files options for fat-tree
In-Reply-To: <46713FC7.3030104@dev.mellanox.co.il>
References: <4670FA2D.7070708@dev.mellanox.co.il>
	<20070614121501.GC5908@sashak.voltaire.com>
	<4671363F.6060600@dev.mellanox.co.il>
	<46713FC7.3030104@dev.mellanox.co.il>
Message-ID: <20070614135717.GE5908@sashak.voltaire.com>

On 16:16 Thu 14 Jun     , Yevgeny Kliteynik wrote:
>  Hi Sasha,
> 
>  Yevgeny Kliteynik wrote:
> > Sasha Khapyorsky wrote:
> >> Hi Yevgeny,
> >>
> >> On 11:19 Thu 14 Jun     , Yevgeny Kliteynik wrote:
> >>>  The following three patches are adding root and compute node guid files
> >>>  options for fat-tree routing,
> >>
> >> Is there any reason to not share root guids file option with up/down?
> > There are two new options for fat-tree: roots and compute nodes (CN).
> > These two will be very "tightly coupled" and would have more implication
> > on the routing than in case of up/dn roots. For instance, having root
> > file but not CN file means that the topology doesn't have to be pure 
> > fat-tree,
> > but all the CAs are considered CNs and have to be on the same level of the 
> > tree.
> > And there is similar implication of all the combinations of these two 
> > options.
> > Because of this coupling I wanted to differentiate these two options from
> > the up/dn roots.
> > Thoughts?
> >> Also the way how root guids are handled (in both up/down and ftree)
> >> doesn't look very optimal - guids are loaded to dynamic list, the list
> >> is converted to map, this map is matched and root nodes are marked as
> >> roots. Isn't it would be easy just to mark root nodes during file parsing?
> > The only thing you can save here is converting list to map:
> > You have to parse the guids file anyway, and you have to build all the
> > fat-tree data structures anyway. So if you parse the file and fill the
> > map right away instead of filling the list first, you will save the 
> > list2map conversion.
> > But then up/dn and fat-tree can't use the same function to parse the guid 
> > file,
> > and since the list2map conversion is not a big deal (we're talking about 
> > list of roots, which is couple of hundreds of guids at max), I prefer to 
> > leave it and not to use separate parsing functions for up/dn and fat-tree.
> 
>  Actually, I can do something else here:
>  - parse guid file into list
>  - populate fat-tree switches and CAs
>  - scan guid list, and for each guid mark the matching node in the fat-tree 
>  maps
> 
>  Sounds OK?

Yes, much better.

Also there could be something like:
- populate fat-tree switches and CAs
- parse guid file, and for each guid mark the matching node (with
  custom callback)

But with your proposal there is no need to touch the parser (and
up/down :)).

Sasha


From kliteyn at dev.mellanox.co.il  Thu Jun 14 06:54:35 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Thu, 14 Jun 2007 16:54:35 +0300
Subject: [ofa-general] PATCH [2/3] osm: adding root and compute node guid
	files options for fat-tree
In-Reply-To: <4670FA3D.3090500@dev.mellanox.co.il>
References: <4670FA3D.3090500@dev.mellanox.co.il>
Message-ID: <4671489B.1070808@dev.mellanox.co.il>

Hi Hal,

Yevgeny Kliteynik wrote:
> Hi Hal.
> 
> Fat-tree routing reads root guid file and compute node guid file,
> and creates map of roots and compute nodes (CNs) to be used later.
> 
> --Yevgeny
> 
> Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
> ---
> opensm/opensm/osm_ucast_ftree.c |  232 +++++++++++++++++++++++++++++++++++++++
> 1 files changed, 232 insertions(+), 0 deletions(-)
> 
> diff --git a/opensm/opensm/osm_ucast_ftree.c b/opensm/opensm/osm_ucast_ftree.c
> index 1730ef2..b1ee0ca 100644
> --- a/opensm/opensm/osm_ucast_ftree.c
> +++ b/opensm/opensm/osm_ucast_ftree.c
> @@ -119,6 +119,17 @@ typedef struct {
> 
> /***************************************************
>  **
> + **  ftree_guid_tbl_element_t definition
> + **
> + ***************************************************/
> +
> +typedef struct {
> +   cl_map_item_t map_item;
> +   uint64_t guid;
> +} ftree_guid_tbl_element_t;
> +
> +/***************************************************
> + **
>  **  ftree_fwd_tbl_t definition
>  **
>  ***************************************************/
> @@ -182,6 +193,7 @@ typedef struct ftree_sw_t_
>    ftree_port_group_t  ** up_port_groups;
>    uint8_t                up_port_groups_num;
>    ftree_fwd_tbl_t        lft_buf;
> +   boolean_t              is_root;
> } ftree_sw_t;
> 
> /***************************************************
> @@ -195,6 +207,7 @@ typedef struct ftree_hca_t_ {
>    osm_node_t           * p_osm_node;
>    ftree_port_group_t  ** up_port_groups;
>    uint16_t               up_port_groups_num;
> +   boolean_t              is_cn;
> } ftree_hca_t;
> 
> /***************************************************
> @@ -209,6 +222,8 @@ typedef struct ftree_fabric_t_
>    cl_qmap_t       hca_tbl;
>    cl_qmap_t       sw_tbl;
>    cl_qmap_t       sw_by_tuple_tbl;
> +   cl_qmap_t       cn_guids_tbl;
> +   cl_qmap_t       root_guids_tbl;
>    uint8_t         tree_rank;
>    ftree_sw_t   ** leaf_switches;
>    uint32_t        leaf_switches_num;
> @@ -393,6 +408,36 @@ __osm_ftree_sw_tbl_element_destroy(
> 
> /***************************************************
>  **
> + ** ftree_guid_tbl_element_t functions
> + **
> + ***************************************************/
> +
> +static ftree_guid_tbl_element_t *
> +__osm_ftree_guid_tbl_element_create(
> +   IN  uint64_t guid)
> +{
> +   ftree_guid_tbl_element_t * p_element =
> +      (ftree_guid_tbl_element_t *) malloc(sizeof(ftree_guid_tbl_element_t));
> +   if (!p_element)
> +       return NULL;
> +
> +   memset(p_element, 0,sizeof(ftree_guid_tbl_element_t));
> +   p_element->guid = guid;
> +   return p_element;
> +}
> +
> +/***************************************************/
> +
> +static void
> +__osm_ftree_guid_tbl_element_destroy(
> +   IN  ftree_guid_tbl_element_t * p_element)
> +{
> +   if (p_element)
> +      free(p_element);
> +}
> +
> +/***************************************************
> + **
>  ** ftree_port_t functions
>  **
>  ***************************************************/
> @@ -607,6 +652,9 @@ __osm_ftree_sw_create(
>    p_sw->lft_buf = (ftree_fwd_tbl_t)cl_pool_get(&p_ftree->sw_fwd_tbl_pool);
>    memset(p_sw->lft_buf, OSM_NO_PATH, FTREE_FWD_TBL_LEN);
> 
> +   /* by default the switch is not root */
> +   p_sw->is_root = FALSE;
> +
>    return p_sw;
> } /* __osm_ftree_sw_create() */
> 
> @@ -810,6 +858,10 @@ __osm_ftree_hca_create(
>    if (!p_hca->up_port_groups)
>       return NULL;
>    p_hca->up_port_groups_num = 0;
> +
> +   /* by default every CA is treated as compute node */
> +   p_hca->is_cn = TRUE;
> +
>    return p_hca;
> }
> 
> @@ -934,6 +986,9 @@ __osm_ftree_fabric_create()
>    cl_qmap_init(&p_ftree->sw_tbl);
>    cl_qmap_init(&p_ftree->sw_by_tuple_tbl);
> 
> +   cl_qmap_init(&p_ftree->cn_guids_tbl);
> +   cl_qmap_init(&p_ftree->root_guids_tbl);
> +
>    status = cl_pool_init( &p_ftree->sw_fwd_tbl_pool,
>                           8,                 /* min pool size */
>                           0,                 /* max pool size - unlimited */
> @@ -960,6 +1015,8 @@ __osm_ftree_fabric_clear(ftree_fabric_t * p_ftree)
>    ftree_sw_t * p_next_sw;
>    ftree_sw_tbl_element_t * p_element;
>    ftree_sw_tbl_element_t * p_next_element;
> +   ftree_guid_tbl_element_t * p_guid_element;
> +   ftree_guid_tbl_element_t * p_next_guid_element;
> 
>    if (!p_ftree)
>       return;
> @@ -1000,6 +1057,28 @@ __osm_ftree_fabric_clear(ftree_fabric_t * p_ftree)
>    }
>    cl_qmap_remove_all(&p_ftree->sw_by_tuple_tbl);
> 
> +   /* remove all the elements of root_guids_tbl */
> +
> +   p_next_guid_element = (ftree_guid_tbl_element_t *)cl_qmap_head(&p_ftree->root_guids_tbl);
> +   while( p_next_guid_element != (ftree_guid_tbl_element_t *)cl_qmap_end(&p_ftree->root_guids_tbl) )
> +   {
> +      p_guid_element = p_next_guid_element;
> +      p_next_guid_element = (ftree_guid_tbl_element_t *)cl_qmap_next(&p_guid_element->map_item );
> +      __osm_ftree_guid_tbl_element_destroy(p_guid_element);
> +   }
> +   cl_qmap_remove_all(&p_ftree->root_guids_tbl);
> +
> +   /* remove all the elements of cn_guids_tbl */
> +
> +   p_next_guid_element = (ftree_guid_tbl_element_t *)cl_qmap_head(&p_ftree->cn_guids_tbl);
> +   while( p_next_guid_element != (ftree_guid_tbl_element_t *)cl_qmap_end(&p_ftree->cn_guids_tbl) )
> +   {
> +      p_guid_element = p_next_guid_element;
> +      p_next_guid_element = (ftree_guid_tbl_element_t *)cl_qmap_next(&p_guid_element->map_item );
> +      __osm_ftree_guid_tbl_element_destroy(p_guid_element);
> +   }
> +   cl_qmap_remove_all(&p_ftree->cn_guids_tbl);
> +
>    /* free the leaf switches array */
>    if ((p_ftree->leaf_switches_num > 0) && (p_ftree->leaf_switches))
>       free(p_ftree->leaf_switches);
> @@ -1048,6 +1127,16 @@ __osm_ftree_fabric_add_hca(ftree_fabric_t * p_ftree, osm_node_t * p_osm_node)
> 
>    CL_ASSERT(osm_node_get_type(p_osm_node) == IB_NODE_TYPE_CA);
> 
> +   /* if a user has supplied CN guids list, and this CA's guid
> +      is not there, then the CA should be marked as non-CN */
> +   if ( (!cl_is_qmap_empty(&p_ftree->cn_guids_tbl)) &&
> +        (cl_qmap_get(&p_ftree->cn_guids_tbl,
> +                    cl_ntoh64(osm_node_get_node_guid(p_hca->p_osm_node))) ==
> +                        cl_qmap_end(&p_ftree->cn_guids_tbl)) )
> +   {
> +      p_hca->is_cn = FALSE;
> +   }
> +
>    cl_qmap_insert(&p_ftree->hca_tbl,
>                   p_osm_node->node_info.node_guid,
>                   &p_hca->map_item);
> @@ -1062,6 +1151,16 @@ __osm_ftree_fabric_add_sw(ftree_fabric_t * p_ftree, osm_switch_t * p_osm_sw)
> 
>    CL_ASSERT(osm_node_get_type(p_osm_sw->p_node) == IB_NODE_TYPE_SWITCH);
> 
> +   /* if a user has supplied root guids list, and this switch's guid 
> +      *is* there, then the switch should be marked as root */
> +   if ( (!cl_is_qmap_empty(&p_ftree->root_guids_tbl)) &&
> +        (cl_qmap_get(&p_ftree->root_guids_tbl,
> +                    cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node))) !=
> +                        cl_qmap_end(&p_ftree->root_guids_tbl)) )
> +   {
> +      p_sw->is_root = TRUE;
> +   }
> +
>    cl_qmap_insert(&p_ftree->sw_tbl,
>                   p_osm_sw->p_node->node_info.node_guid,
>                   &p_sw->map_item);
> @@ -2907,6 +3006,127 @@ __osm_ftree_fabric_populate_ports(
> /***************************************************
>  ***************************************************/
> 
> +static int
> +__osm_ftree_convert_list2qmap(
> +   cl_list_t * p_guid_list,
> +   cl_qmap_t * p_map )
> +{
> +   uint64_t * p_guid;
> +
> +   if ( !p_map )
> +      return -1;
> +
> +   if ( !p_guid_list || !cl_list_count(p_guid_list) )
> +      return 0;
> +
> +   while ( (p_guid = (uint64_t*)cl_list_remove_head(p_guid_list)) )
> +   {
> +      cl_qmap_insert( p_map,
> +                      *p_guid,
> +                      &(__osm_ftree_guid_tbl_element_create(*p_guid)->map_item) );
> +      free(p_guid);
> +   }
> +
> +   CL_ASSERT(cl_is_list_empty(p_guid_list));
> +
> +   return 0;
> +} /* __osm_ftree_convert_list2qmap() */
> +
> +/***************************************************
> + ***************************************************/
> +
> +static int
> +__osm_ftree_fabric_read_guid_files(
> +   IN  ftree_fabric_t * p_ftree)
> +{
> +   cl_list_t guid_list;
> +   ftree_guid_tbl_element_t * p_guid_element;
> +   ftree_guid_tbl_element_t * p_next_guid_element;
> +   int status = 0;
> +
> +   OSM_LOG_ENTER(&p_ftree->p_osm->log, __osm_ftree_fabric_read_guid_files);
> +
> +   cl_list_construct( &guid_list );
> +   cl_list_init( &guid_list, 10 );
> +
> +   p_ftree->p_osm->subn.opt.ftree_root_guid_file    = "/tmp/ftree.root.guids";
> +   p_ftree->p_osm->subn.opt.ftree_cn_guid_file      = "/tmp/ftree.cn.guids";

These two lines are, of course, a mistake :)

-- Yevgeny

> +
> +   if (p_ftree->p_osm->subn.opt.ftree_root_guid_file)
> +   {
> +      osm_log( &p_ftree->p_osm->log, OSM_LOG_DEBUG,
> +               "__osm_ftree_read_guid_files: "
> +               "Fetching root nodes from file %s\n",
> +               p_ftree->p_osm->subn.opt.ftree_root_guid_file );
> +
> +      if ( osm_ucast_mgr_read_guid_file( &p_ftree->p_osm->sm.ucast_mgr,
> +                                         p_ftree->p_osm->subn.opt.ftree_root_guid_file,
> +                                         &guid_list ) ||
> +           __osm_ftree_convert_list2qmap( &guid_list,
> +                                          &p_ftree->root_guids_tbl ) )
> +      {
> +         status = -1;
> +         goto Exit;
> +      }
> +
> +      if (osm_log_is_active(&p_ftree->p_osm->log,OSM_LOG_DEBUG))
> +      {
> +         p_next_guid_element = (ftree_guid_tbl_element_t *)cl_qmap_head(&p_ftree->root_guids_tbl);
> +         while( p_next_guid_element != (ftree_guid_tbl_element_t *)cl_qmap_end(&p_ftree->root_guids_tbl) )
> +         {
> +            p_guid_element = p_next_guid_element;
> +            p_next_guid_element = (ftree_guid_tbl_element_t *)cl_qmap_next(&p_guid_element->map_item );
> +            osm_log( &p_ftree->p_osm->log, OSM_LOG_DEBUG,
> +                     "__osm_ftree_fabric_read_guid_files:   "
> +                     "root guid 0x%016" PRIx64 "\n",
> +                     p_guid_element->guid );
> +         }
> +      }
> +   }
> +   CL_ASSERT(cl_is_list_empty(&guid_list));
> +
> +   if (p_ftree->p_osm->subn.opt.ftree_cn_guid_file)
> +   {
> +      osm_log( &p_ftree->p_osm->log, OSM_LOG_DEBUG,
> +               "__osm_ftree_read_guid_files: "
> +               "Fetching compute nodes from file %s\n",
> +               p_ftree->p_osm->subn.opt.ftree_cn_guid_file );
> +
> +      if ( osm_ucast_mgr_read_guid_file( &p_ftree->p_osm->sm.ucast_mgr,
> +                                         p_ftree->p_osm->subn.opt.ftree_cn_guid_file,
> +                                         &guid_list ) ||
> +           __osm_ftree_convert_list2qmap( &guid_list,
> +                                          &p_ftree->cn_guids_tbl ) )
> +      {
> +         status = -1;
> +         goto Exit;
> +      }
> +
> +      if (osm_log_is_active(&p_ftree->p_osm->log,OSM_LOG_DEBUG))
> +      {
> +         p_next_guid_element = (ftree_guid_tbl_element_t *)cl_qmap_head(&p_ftree->cn_guids_tbl);
> +         while( p_next_guid_element != (ftree_guid_tbl_element_t *)cl_qmap_end(&p_ftree->cn_guids_tbl) )
> +         {
> +            p_guid_element = p_next_guid_element;
> +            p_next_guid_element = (ftree_guid_tbl_element_t *)cl_qmap_next(&p_guid_element->map_item );
> +            osm_log( &p_ftree->p_osm->log, OSM_LOG_DEBUG,
> +                     "__osm_ftree_fabric_read_guid_files:   "
> +                     "compute node guid 0x%016" PRIx64 "\n",
> +                     p_guid_element->guid );
> +         }
> +      }
> +   }
> +   CL_ASSERT(cl_is_list_empty(&guid_list));
> +
> +  Exit:
> +   OSM_LOG_EXIT(&p_ftree->p_osm->log);
> +   cl_list_destroy(&guid_list);
> +   return status;
> +} /*__osm_ftree_fabric_read_guid_files() */
> +
> +/***************************************************
> + ***************************************************/
> +
> static int __osm_ftree_construct_fabric(
>    IN  void * context)
> @@ -2947,6 +3167,18 @@ __osm_ftree_construct_fabric(
>       goto Exit;
>    }
> 
> +   osm_log(&p_ftree->p_osm->log, OSM_LOG_VERBOSE,
> +           "__osm_ftree_construct_fabric: "
> +           "Reading guid files provided by user\n");
> +   if (__osm_ftree_fabric_read_guid_files(p_ftree) != 0)
> +   {
> +      osm_log(&p_ftree->p_osm->log, OSM_LOG_SYS,
> +              "Failed reading guid files - "
> +              "falling back to default routing\n");
> +      status = -1;
> +      goto Exit;
> +   }
> +
>    osm_log(&p_ftree->p_osm->log, OSM_LOG_VERBOSE,"__osm_ftree_construct_fabric: \n"
>            "                       |----------------------------------------|\n"
>            "                       |- Starting FatTree fabric construction -|\n"


From kliteyn at dev.mellanox.co.il  Thu Jun 14 07:00:06 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Thu, 14 Jun 2007 17:00:06 +0300
Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node	guid
	files options for fat-tree
In-Reply-To: <20070614135717.GE5908@sashak.voltaire.com>
References: <4670FA2D.7070708@dev.mellanox.co.il>
	<20070614121501.GC5908@sashak.voltaire.com>
	<4671363F.6060600@dev.mellanox.co.il>
	<46713FC7.3030104@dev.mellanox.co.il>
	<20070614135717.GE5908@sashak.voltaire.com>
Message-ID: <467149E6.80606@dev.mellanox.co.il>

Sasha Khapyorsky wrote:
> On 16:16 Thu 14 Jun     , Yevgeny Kliteynik wrote:
>>  Hi Sasha,
>>
>>  Yevgeny Kliteynik wrote:
>>> Sasha Khapyorsky wrote:
>>>> Hi Yevgeny,
>>>>
>>>> On 11:19 Thu 14 Jun     , Yevgeny Kliteynik wrote:
>>>>>  The following three patches add root and compute node guid file
>>>>>  options for fat-tree routing,
>>>> Is there any reason to not share root guids file option with up/down?
>>> There are two new options for fat-tree: roots and compute nodes (CN).
>>> These two will be very "tightly coupled" and would have more implications
>>> on the routing than in the case of up/dn roots. For instance, having a root
>>> file but no CN file means that the topology doesn't have to be a pure
>>> fat-tree, but all the CAs are considered CNs and have to be on the same
>>> level of the tree.
>>> And there are similar implications for all the combinations of these two
>>> options.
>>> Because of this coupling I wanted to differentiate these two options from
>>> the up/dn roots.
>>> Thoughts?
>>>> Also the way root guids are handled (in both up/down and ftree)
>>>> doesn't look very optimal - guids are loaded into a dynamic list, the list
>>>> is converted to a map, this map is matched and root nodes are marked as
>>>> roots. Wouldn't it be easier just to mark root nodes during file parsing?
>>> The only thing you can save here is converting the list to a map:
>>> you have to parse the guids file anyway, and you have to build all the
>>> fat-tree data structures anyway. So if you parse the file and fill the
>>> map right away instead of filling the list first, you will save the
>>> list2map conversion.
>>> But then up/dn and fat-tree can't use the same function to parse the guid
>>> file, and since the list2map conversion is not a big deal (we're talking
>>> about a list of roots, which is a couple of hundred guids at most), I
>>> prefer to leave it and not use separate parsing functions for up/dn and
>>> fat-tree.
>>  Actually, I can do something else here:
>>  - parse guid file into list
>>  - populate fat-tree switches and CAs
>>  - scan guid list, and for each guid mark the matching node in the
>>    fat-tree maps
>>
>>  Sounds OK?
> 
> Yes, much better.
> 
> Also there could be something like:
> - populate fat-tree switches and CAs
> - parse guid file, and for each guid mark the matching node (with
>   custom callback)
> 
> But with your proposal there is no need to touch the parser (and
> up/down :)).

OK, I'll rewrite it as I've described it.
What about the rest of the patches?
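
A minimal sketch of the three steps above, not the actual rewrite. The
names sketch_sw_t and mark_roots are hypothetical, and a sorted array
with bsearch() stands in for the cl_qmap_t lookup used in
osm_ucast_ftree.c. It assumes the guid file has already been parsed into
a list and the fat-tree switches are already populated; only the final
"scan the list and mark matching nodes" step is shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for ftree_sw_t: only the fields this sketch needs. */
typedef struct {
    uint64_t guid;     /* node GUID in host byte order */
    int      is_root;  /* mirrors the is_root flag added by the patch */
} sketch_sw_t;

/* Orders switches by GUID so that bsearch() can find them. */
static int sketch_sw_cmp(const void *a, const void *b)
{
    uint64_t ga = ((const sketch_sw_t *)a)->guid;
    uint64_t gb = ((const sketch_sw_t *)b)->guid;
    return (ga > gb) - (ga < gb);
}

/*
 * Scan the parsed GUID list and mark every switch whose GUID appears in it.
 * The switch table is sorted in place (the sorted array replaces the qmap).
 * Returns the number of distinct switches that were marked as roots.
 */
static size_t mark_roots(sketch_sw_t *sw_tbl, size_t sw_num,
                         const uint64_t *guid_list, size_t guid_num)
{
    size_t i;
    size_t marked = 0;

    assert(sw_tbl || sw_num == 0);
    assert(guid_list || guid_num == 0);

    if (sw_num == 0)
        return 0;

    qsort(sw_tbl, sw_num, sizeof(*sw_tbl), sketch_sw_cmp);

    for (i = 0; i < guid_num; i++) {
        sketch_sw_t key;
        sketch_sw_t *p_sw;

        key.guid = guid_list[i];
        key.is_root = 0;

        p_sw = bsearch(&key, sw_tbl, sw_num, sizeof(*sw_tbl), sketch_sw_cmp);
        if (p_sw && !p_sw->is_root) {
            p_sw->is_root = 1;
            marked++;
        }
    }
    return marked;
}
```

GUIDs in the list that match no switch are simply skipped here; whether
the real code should log or reject them is a separate decision. The same
scan would apply to the CN list, with the default inverted (every CA
starts as a CN and is cleared if it is missing from a non-empty list).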

-- Yevgeny


> Sasha
> 


From sashak at voltaire.com  Thu Jun 14 07:31:34 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Thu, 14 Jun 2007 17:31:34 +0300
Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node
	guid files options for fat-tree
In-Reply-To: <467149E6.80606@dev.mellanox.co.il>
References: <4670FA2D.7070708@dev.mellanox.co.il>
	<20070614121501.GC5908@sashak.voltaire.com>
	<4671363F.6060600@dev.mellanox.co.il>
	<46713FC7.3030104@dev.mellanox.co.il>
	<20070614135717.GE5908@sashak.voltaire.com>
	<467149E6.80606@dev.mellanox.co.il>
Message-ID: <20070614143134.GF5908@sashak.voltaire.com>

On 17:00 Thu 14 Jun     , Yevgeny Kliteynik wrote:
> >>  Actually, I can do something else here:
> >>  - parse guid file into list
> >>  - populate fat-tree switches and CAs
> >>  - scan guid list, and for each guid mark the matching node in the
> >>    fat-tree maps
> >>
> >>  Sounds OK?
> > Yes, much better.
> > Also there could be something like:
> > - populate fat-tree switches and CAs
> > - parse guid file, and for each guid mark the matching node (with
> >   custom callback)
> > But with your proposal there is no need to touch the parser (and
> > up/down :)).
> 
>  OK, I'll rewrite it as I've described it.
>  What about the rest of the patches?

Basically looks fine.

Just small nits: there are trailing whitespaces (you can use 'git-diff
--color' to see them, or apply the patch with 'git-am
--whitespace=...'), and it is helpful to have descriptive per-patch
subjects in emails (git-am takes the subject as the patch summary);
git-format-patch is useful there.

Sasha


From mshefty at ichips.intel.com  Thu Jun 14 09:20:54 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 14 Jun 2007 09:20:54 -0700
Subject: [ofa-general] crash in ipoib
In-Reply-To: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com>
References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com>
Message-ID: <46716AE6.9050804@ichips.intel.com>

Here's the capture from the network console

<5> [...network console startup...]
<5> Unable to handle kernel NULL pointer dereference at 0000000000000008 
RIP:
<5> <4>Warning: kfree_skb on hard IRQ ffffffff802bb055
<5> Warning: kfree_skb on hard IRQ ffffffff802bb055
<5> Warning: kfree_skb on hard IRQ ffffffff802bb055
<5> Warning: kfree_skb on hard IRQ ffffffff802bb055
<5> <ffffffffa0146b60>{:ib_ipoib:ipoib_cm_handle_rx_wc+378}
<5> PML4 dcc2f067 PGD 102087067 PMD 0
<5> Oops: 0002 [1] SMP
<5> CPU 1
<5> Modules linked in: netconsole det(U) nfs lockd nfs_acl autofs4 
i2c_dev i2c_core sunrpc rdma_ucm(U) ib_vnic(U) ib_sdp(U) rdma_cm(U) 
iw_cm(U) ib_addr(U) ib_local_sa(U) ib_ipath(U) ipt_REJECT ipt_state 
ip_conntrack iptable_filter ip_tables dm_mirror dm_mod button battery ac 
joydev uhci_hcd ehci_hcd hw_random ib_mthca(U) ib_ipoib(U) ib_umad(U) 
ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) md5 ipv6 
e1000(U) ahci ext3 jbd ata_piix libata sd_mod scsi_mod
<5> Pid: 1584, comm: ib_cm/1 Tainted: PF     2.6.9-42.ELsmp
<5> RIP: 0010:[<ffffffffa0146b60>] 
<ffffffffa0146b60>{:ib_ipoib:ipoib_cm_handle_rx_wc+378}
<5> RSP: 0018:0000010005d7b940  EFLAGS: 00010046
<5> RAX: 0000000000000000 RBX: 000001010d3a8e00 RCX: 0000000000000000
<5> RDX: 000001010d3a8e10 RSI: 00000101191b3990 RDI: 00000101191b3380
<5> RBP: 000001011302b680 R08: 0000000000000010 R09: 0000010119301e00
<5> R10: 000000000000001f R11: 00000000000000e4 R12: 0000000000000206
<5> R13: 00000101191b3380 R14: 00000101191b3000 R15: 0000000000000030
<5> FS:  0000000000000000(0000) GS:ffffffff804e5100(0000) 
knlGS:0000000000000000
<5> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<5> CR2: 0000000000000008 CR3: 0000000005d68000 CR4: 00000000000006e0
<5> Process ib_cm/1 (pid: 1584, threadinfo 0000010119c14000, task 
000001011a9c5030)
<5> Stack: 0000000000000206 0000000000000030 0000000000000206 
0000010110e8fb00
<5>        00000101191b37b8 0000000000000206 00000000dc62401c 
0000000400000206
<5>        0100000082000001 000000041a9121c0
<5> Call Trace:<IRQ> <ffffffffa0141fa0>{:ib_ipoib:ipoib_ib_completion+144}
<5>        <ffffffffa015c7d3>{:ib_mthca:mthca_tavor_interrupt+95}
<5>        <ffffffffa015c225>{:ib_mthca:mthca_eq_int+221} 
<ffffffff80113209>{do_IRQ+266}
<5>        <ffffffffa015c7d3>{:ib_mthca:mthca_tavor_interrupt+95}
<5>        <ffffffff80112f4a>{handle_IRQ_event+41} 
<ffffffff801131c4>{do_IRQ+197}
<5>        <ffffffff80110833>{ret_from_intr+0} 
<ffffffff801ec72d>{csum_partial+1209}
<5>        <ffffffff802abeb1>{skb_checksum+308} 
<ffffffffa01ca846>{:ip_conntrack:tcp_error+312}
<5>        <ffffffffa01c9197>{:ip_conntrack:ip_conntrack_in+163}
<5>        <ffffffff802c8e30>{ip_local_deliver_finish+0} 
<ffffffff802b92fe>{nf_hook_slow+184}
<5>        <ffffffff802b8f1e>{nf_iterate+82} 
<ffffffff802c9232>{ip_rcv_finish+0}
<5>        <ffffffff802b92ba>{nf_hook_slow+116} 
<ffffffff802c9232>{ip_rcv_finish+0}
<5>        <ffffffff802c98e8>{ip_rcv+1119} 
<ffffffff802b066b>{netif_receive_skb+791}
<5>        <ffffffff802b0730>{process_backlog+136} 
<ffffffff802b0884>{net_rx_action+203}
<5>        <ffffffff8013c738>{__do_softirq+88} 
<ffffffff8013c7e1>{do_softirq+49}
<5>        <ffffffff80113247>{do_IRQ+328} 
<ffffffff80110833>{ret_from_intr+0}
<5>         <EOI> <ffffffff8030b0cb>{_spin_unlock_irqrestore+47}
<5>        <ffffffffa0119ee7>{:ib_cm:ib_send_cm_rep+812} 
<ffffffffa01468ef>{:ib_ipoib:ipoib_cm_rx_handler+821}
<5>        <ffffffffa0146530>{:ib_ipoib:ipoib_cm_rx_event_handler+0}
<5>        <ffffffffa00f4711>{:ib_core:ib_find_cached_pkey+192}
<5>        <ffffffffa011b09d>{:ib_cm:cm_process_work+101} 
<ffffffffa011ba49>{:ib_cm:cm_req_handler+2398}
<5>        <ffffffffa011be2e>{:ib_cm:cm_work_handler+0} 
<ffffffffa011be5c>{:ib_cm:cm_work_handler+46}
<5>        <ffffffff80147852>{worker_thread+419} 
<ffffffff80133da9>{default_wake_function+0}
<5>        <ffffffff80133dfa>{__wake_up_common+67} 
<ffffffff80133da9>{default_wake_function+0}
<5>        <ffffffff8014b4f0>{keventd_create_kthread+0} 
<ffffffff801476af>{worker_thread+0}
<5>        <ffffffff8014b4f0>{keventd_create_kthread+0} 
<ffffffff8014b4c7>{kthread+200}
<5>        <ffffffff80110f47>{child_rip+8} 
<ffffffff8014b4f0>{keventd_create_kthread+0}
<5>        <ffffffff8014b3ff>{kthread+0} <ffffffff80110f3f>{child_rip+0}
<5>
<5>
<5> Code: 48 89 48 08 48 89 01 49 8b 86 90 09 00 00 48 89 50 08 48 89
<5> RIP <ffffffffa0146b60>{:ib_ipoib:ipoib_cm_handle_rx_wc+378} RSP 
<0000010005d7b940>
<5> CR2: 0000000000000008
<5>  <0>Kernel panic - not syncing: Oops
<5>  Badness in panic at kernel/panic.c:118
<5>
<5> Call Trace:<IRQ> <ffffffff80137a86>{panic+527} 
<ffffffff8013fcb4>{__mod_timer+293}
<5>        <ffffffff80232de1>{complement_pos+12} 
<ffffffff801f89e8>{vgacon_cursor+213}
<5>        <ffffffff801f8913>{vgacon_cursor+0} 
<ffffffff801239b6>{bust_spinlocks+62}
<5>        <ffffffff80111b07>{oops_end+65} 
<ffffffff80124148>{do_page_fault+1204}
<5>        <ffffffffa016ba02>{:ib_mthca:mthca_tavor_post_srq_recv+839}
<5>        <ffffffffa0145fdf>{:ib_ipoib:ipoib_cm_post_receive+119}
<5>        <ffffffff80161936>{cache_alloc_refill+390} 
<ffffffff80110d91>{error_exit+0}
<5>        <ffffffffa0146b60>{:ib_ipoib:ipoib_cm_handle_rx_wc+378}
<5>        <ffffffffa0141fa0>{:ib_ipoib:ipoib_ib_completion+144}
<5>        <ffffffffa015c7d3>{:ib_mthca:mthca_tavor_interrupt+95}
<5>        <ffffffffa015c225>{:ib_mthca:mthca_eq_int+221} 
<ffffffff80113209>{do_IRQ+266}
<5>        <ffffffffa015c7d3>{:ib_mthca:mthca_tavor_interrupt+95}
<5>        <ffffffff80112f4a>{handle_IRQ_event+41} 
<ffffffff801131c4>{do_IRQ+197}
<5>        <ffffffff80110833>{ret_from_intr+0} 
<ffffffff801ec72d>{csum_partial+1209}
<5>        <ffffffff802abeb1>{skb_checksum+308} 
<ffffffffa01ca846>{:ip_conntrack:tcp_error+312}
<5>        <ffffffffa01c9197>{:ip_conntrack:ip_conntrack_in+163}
<5>        <ffffffff802c8e30>{ip_local_deliver_finish+0} 
<ffffffff802b92fe>{nf_hook_slow+184}
<5>        <ffffffff802b8f1e>{nf_iterate+82} 
<ffffffff802c9232>{ip_rcv_finish+0}
<5>        <ffffffff802b92ba>{nf_hook_slow+116} 
<ffffffff802c9232>{ip_rcv_finish+0}
<5>        <ffffffff802c98e8>{ip_rcv+1119} 
<ffffffff802b066b>{netif_receive_skb+791}
<5>        <ffffffff802b0730>{process_backlog+136} 
<ffffffff802b0884>{net_rx_action+203}
<5>        <ffffffff8013c738>{__do_softirq+88} 
<ffffffff8013c7e1>{do_softirq+49}
<5>        <ffffffff80113247>{do_IRQ+328} 
<ffffffff80110833>{ret_from_intr+0}
<5>         <EOI> <ffffffff8030b0cb>{_spin_unlock_irqrestore+47}
<5>        <ffffffffa0119ee7>{:ib_cm:ib_send_cm_rep+812} 
<ffffffffa01468ef>{:ib_ipoib:ipoib_cm_rx_handler+821}
<5>        <ffffffffa0146530>{:ib_ipoib:ipoib_cm_rx_event_handler+0}
<5>        <ffffffffa00f4711>{:ib_core:ib_find_cached_pkey+192}
<5>        <ffffffffa011b09d>{:ib_cm:cm_process_work+101} 
<ffffffffa011ba49>{:ib_cm:cm_req_handler+2398}
<5>        <ffffffffa011be2e>{:ib_cm:cm_work_handler+0} 
<ffffffffa011be5c>{:ib_cm:cm_work_handler+46}
<5>        <ffffffff80147852>{worker_thread+419} 
<ffffffff80133da9>{default_wake_function+0}
<5>        <ffffffff80133dfa>{__wake_up_common+67} 
<ffffffff80133da9>{default_wake_function+0}
<5>        <ffffffff8014b4f0>{keventd_create_kthread+0} 
<ffffffff801476af>{worker_thread+0}
<5>        <ffffffff8014b4f0>{keventd_create_kthread+0} 
<ffffffff8014b4c7>{kthread+200}
<5>        <ffffffff80110f47>{child_rip+8} 
<ffffffff8014b4f0>{keventd_create_kthread+0}
<5>        <ffffffff8014b3ff>{kthread+0} <ffffffff80110f3f>{child_rip+0}
<5>
<5> Badness in i8042_panic_blink at drivers/input/serio/i8042.c:987
<5>
<5> Call Trace:<IRQ> <ffffffff80241feb>{i8042_panic_blink+238} 
<ffffffff80137a34>{panic+445}
<5>        <ffffffff8013fcb4>{__mod_timer+293} 
<ffffffff80232de1>{complement_pos+12}
<5>        <ffffffff801f89e8>{vgacon_cursor+213} 
<ffffffff801f8913>{vgacon_cursor+0}
<5>        <ffffffff801239b6>{bust_spinlocks+62} 
<ffffffff80111b07>{oops_end+65}
<5>        <ffffffff80124148>{do_page_fault+1204} 
<ffffffffa016ba02>{:ib_mthca:mthca_tavor_post_srq_recv+839}
<5>        <ffffffffa0145fdf>{:ib_ipoib:ipoib_cm_post_receive+119}
<5>        <ffffffff80161936>{cache_alloc_refill+390} 
<ffffffff80110d91>{error_exit+0}
<5>        <ffffffffa0146b60>{:ib_ipoib:ipoib_cm_handle_rx_wc+378}
<5>        <ffffffffa0141fa0>{:ib_ipoib:ipoib_ib_completion+144}
<5>        <ffffffffa015c7d3>{:ib_mthca:mthca_tavor_interrupt+95}
<5>        <ffffffffa015c225>{:ib_mthca:mthca_eq_int+221} 
<ffffffff80113209>{do_IRQ+266}
<5>        <ffffffffa015c7d3>{:ib_mthca:mthca_tavor_interrupt+95}
<5>        <ffffffff80112f4a>{handle_IRQ_event+41} 
<ffffffff801131c4>{do_IRQ+197}
<5>        <ffffffff80110833>{ret_from_intr+0} 
<ffffffff801ec72d>{csum_partial+1209}
<5>        <ffffffff802abeb1>{skb_checksum+308} 
<ffffffffa01ca846>{:ip_conntrack:tcp_error+312}
<5>        <ffffffffa01c9197>{:ip_conntrack:ip_conntrack_in+163}
<5>        <ffffffff802c8e30>{ip_local_deliver_finish+0} 
<ffffffff802b92fe>{nf_hook_slow+184}
<5>        <ffffffff802b8f1e>{nf_iterate+82} 
<ffffffff802c9232>{ip_rcv_finish+0}
<5>        <ffffffff802b92ba>{nf_hook_slow+116} 
<ffffffff802c9232>{ip_rcv_finish+0}
<5>        <ffffffff802c98e8>{ip_rcv+1119} 
<ffffffff802b066b>{netif_receive_skb+791}
<5>        <ffffffff802b0730>{process_backlog+136} 
<ffffffff802b0884>{net_rx_action+203}
<5>        <ffffffff8013c738>{__do_softirq+88} 
<ffffffff8013c7e1>{do_softirq+49}
<5>        <ffffffff80113247>{do_IRQ+328} 
<ffffffff80110833>{ret_from_intr+0}
<5>         <EOI> <ffffffff8030b0cb>{_spin_unlock_irqrestore+47}
<5>        <ffffffffa0119ee7>{:ib_cm:ib_send_cm_rep+812} 
<ffffffffa01468ef>{:ib_ipoib:ipoib_cm_rx_handler+821}
<5>        <ffffffffa0146530>{:ib_ipoib:ipoib_cm_rx_event_handler+0}
<5>        <ffffffffa00f4711>{:ib_core:ib_find_cached_pkey+192}
<5>        <ffffffffa011b09d>{:ib_cm:cm_process_work+101} 
<ffffffffa011ba49>{:ib_cm:cm_req_handler+2398}
<5>        <ffffffffa011be2e>{:ib_cm:cm_work_handler+0} 
<ffffffffa011be5c>{:ib_cm:cm_work_handler+46}
<5>        <ffffffff80147852>{worker_thread+419} 
<ffffffff80133da9>{default_wake_function+0}
<5>        <ffffffff80133dfa>{__wake_up_common+67} 
<ffffffff80133da9>{default_wake_function+0}
<5>        <ffffffff8014b4f0>{keventd_create_kthread+0} 
<ffffffff801476af>{worker_thread+0}
<5>        <ffffffff8014b4f0>{keventd_create_kthread+0} 
<ffffffff8014b4c7>{kthread+200}
<5>        <ffffffff80110f47>{child_rip+8} 
<ffffffff8014b4f0>{keventd_create_kthread+0}
<5>        <ffffffff8014b3ff>{kthread+0} <ffffffff80110f3f>{child_rip+0}
<5>
<5> Badness in i8042_panic_blink at drivers/input/serio/i8042.c:990
<5>
<5> Call Trace:<IRQ> <ffffffff8024207d>{i8042_panic_blink+384} 
<ffffffff80137a34>{panic+445}
<5>        <ffffffff8013fcb4>{__mod_timer+293} 
<ffffffff80232de1>{complement_pos+12}
<5>        <ffffffff801f89e8>{vgacon_cursor+213} 
<ffffffff801f8913>{vgacon_cursor+0}
<5>        <ffffffff801239b6>{bust_spinlocks+62} 
<ffffffff80111b07>{oops_end+65}
<5>        <ffffffff80124148>{do_page_fault+1204} 
<ffffffffa016ba02>{:ib_mthca:mthca_tavor_post_srq_recv+839}
<5>        <ffffffffa0145fdf>{:ib_ipoib:ipoib_cm_post_receive+119}
<5>        <ffffffff80161936>{cache_alloc_refill+390} 
<ffffffff80110d91>{error_exit+0}
<5>        <ffffffffa0146b60>{:ib_ipoib:ipoib_cm_handle_rx_wc+378}
<5>        <ffffffffa0141fa0>{:ib_ipoib:ipoib_ib_completion+144}
<5>        <ffffffffa015c7d3>{:ib_mthca:mthca_tavor_interrupt+95}
<5>        <ffffffffa015c225>{:ib_mthca:mthca_eq_int+221} 
<ffffffff80113209>{do_IRQ+266}
<5>        <ffffffffa015c7d3>{:ib_mthca:mthca_tavor_interrupt+95}
<5>        <ffffffff80112f4a>{handle_IRQ_event+41} 
<ffffffff801131c4>{do_IRQ+197}
<5>        <ffffffff80110833>{ret_from_intr+0} 
<ffffffff801ec72d>{csum_partial+1209}
<5>        <ffffffff802abeb1>{skb_checksum+308} 
<ffffffffa01ca846>{:ip_conntrack:tcp_error+312}
<5>        <ffffffffa01c9197>{:ip_conntrack:ip_conntrack_in+163}
<5>        <ffffffff802c8e30>{ip_local_deliver_finish+0} 
<ffffffff802b92fe>{nf_hook_slow+184}
<5>        <ffffffff802b8f1e>{nf_iterate+82} 
<ffffffff802c9232>{ip_rcv_finish+0}
<5>        <ffffffff802b92ba>{nf_hook_slow+116} 
<ffffffff802c9232>{ip_rcv_finish+0}
<5>        <ffffffff802c98e8>{ip_rcv+1119} 
<ffffffff802b066b>{netif_receive_skb+791}
<5>        <ffffffff802b0730>{process_backlog+136} 
<ffffffff802b0884>{net_rx_action+203}
<5>        <ffffffff8013c738>{__do_softirq+88} 
<ffffffff8013c7e1>{do_softirq+49}
<5>        <ffffffff80113247>{do_IRQ+328} 
<ffffffff80110833>{ret_from_intr+0}
<5>         <EOI> <ffffffff8030b0cb>{_spin_unlock_irqrestore+47}
<5>        <ffffffffa0119ee7>{:ib_cm:ib_send_cm_rep+812} 
<ffffffffa01468ef>{:ib_ipoib:ipoib_cm_rx_handler+821}
<5>        <ffffffffa0146530>{:ib_ipoib:ipoib_cm_rx_event_handler+0}
<5>        <ffffffffa00f4711>{:ib_core:ib_find_cached_pkey+192}
<5>        <ffffffffa011b09d>{:ib_cm:cm_process_work+101} 
<ffffffffa011ba49>{:ib_cm:cm_req_handler+2398}
<5>        <ffffffffa011be2e>{:ib_cm:cm_work_handler+0} 
<ffffffffa011be5c>{:ib_cm:cm_work_handler+46}
<5>        <ffffffff80147852>{worker_thread+419} 
<ffffffff80133da9>{default_wake_function+0}
<5>        <ffffffff80133dfa>{__wake_up_common+67} 
<ffffffff80133da9>{default_wake_function+0}
<5>        <ffffffff8014b4f0>{keventd_create_kthread+0} 
<ffffffff801476af>{worker_thread+0}
<5>        <ffffffff8014b4f0>{keventd_create_kthread+0} 
<ffffffff8014b4c7>{kthread+200}
<5>        <ffffffff80110f47>{child_rip+8} 
<ffffffff8014b4f0>{keventd_create_kthread+0}
<5>        <ffffffff8014b3ff>{kthread+0} <ffffffff80110f3f>{child_rip+0}
<5>
<5> Badness in i8042_panic_blink at drivers/input/serio/i8042.c:992
<5>
<5> Call Trace:<IRQ> <ffffffff802420e2>{i8042_panic_blink+485} 
<ffffffff80137a34>{panic+445}
<5>        <ffffffff8013fcb4>{__mod_timer+293} 
<ffffffff80232de1>{complement_pos+12}
<5>        <ffffffff801f89e8>{vgacon_cursor+213} 
<ffffffff801f8913>{vgacon_cursor+0}
<5>        <ffffffff801239b6>{bust_spinlocks+62} 
<ffffffff80111b07>{oops_end+65}
<5>        <ffffffff80124148>{do_page_fault+1204} 
<ffffffffa016ba02>{:ib_mthca:mthca_tavor_post_srq_recv+839}
<5>        <ffffffffa0145fdf>{:ib_ipoib:ipoib_cm_post_receive+119}
<5>        <ffffffff80161936>{cache_alloc_refill+390} 
<ffffffff80110d91>{error_exit+0}
<5>        <ffffffffa0146b60>{:ib_ipoib:ipoib_cm_handle_rx_wc+378}
<5>        <ffffffffa0141fa0>{:ib_ipoib:ipoib_ib_completion+144}
<5>        <ffffffffa015c7d3>{:ib_mthca:mthca_tavor_interrupt+95}
<5>        <ffffffffa015c225>{:ib_mthca:mthca_eq_int+221} 
<ffffffff80113209>{do_IRQ+266}
<5>        <ffffffffa015c7d3>{:ib_mthca:mthca_tavor_interrupt+95}
<5>        <ffffffff80112f4a>{handle_IRQ_event+41} 
<ffffffff801131c4>{do_IRQ+197}
<5>        <ffffffff80110833>{ret_from_intr+0} 
<ffffffff801ec72d>{csum_partial+1209}
<5>        <ffffffff802abeb1>{skb_checksum+308} 
<ffffffffa01ca846>{:ip_conntrack:tcp_error+312}
<5>        <ffffffffa01c9197>{:ip_conntrack:ip_conntrack_in+163}
<5>        <ffffffff802c8e30>{ip_local_deliver_finish+0} 
<ffffffff802b92fe>{nf_hook_slow+184}
<5>        <ffffffff802b8f1e>{nf_iterate+82} 
<ffffffff802c9232>{ip_rcv_finish+0}
<5>        <ffffffff802b92ba>{nf_hook_slow+116} 
<ffffffff802c9232>{ip_rcv_finish+0}
<5>        <ffffffff802c98e8>{ip_rcv+1119} 
<ffffffff802b066b>{netif_receive_skb+791}
<5>        <ffffffff802b0730>{process_backlog+136} 
<ffffffff802b0884>{net_rx_action+203}
<5>        <ffffffff8013c738>{__do_softirq+88} 
<ffffffff8013c7e1>{do_softirq+49}
<5>        <ffffffff80113247>{do_IRQ+328} 
<ffffffff80110833>{ret_from_intr+0}
<5>         <EOI> <ffffffff8030b0cb>{_spin_unlock_irqrestore+47}
<5>        <ffffffffa0119ee7>{:ib_cm:ib_send_cm_rep+812} 
<ffffffffa01468ef>{:ib_ipoib:ipoib_cm_rx_handler+821}
<5>        <ffffffffa0146530>{:ib_ipoib:ipoib_cm_rx_event_handler+0}
<5>        <ffffffffa00f4711>{:ib_core:ib_find_cached_pkey+192}
<5>        <ffffffffa011b09d>{:ib_cm:cm_process_work+101} 
<ffffffffa011ba49>{:ib_cm:cm_req_handler+2398}
<5>        <ffffffffa011be2e>{:ib_cm:cm_work_handler+0} 
<ffffffffa011be5c>{:ib_cm:cm_work_handler+46}
<5>        <ffffffff80147852>{worker_thread+419} 
<ffffffff80133da9>{default_wake_function+0}
<5>        <ffffffff80133dfa>{__wake_up_common+67} 
<ffffffff80133da9>{default_wake_function+0}
<5>        <ffffffff8014b4f0>{keventd_create_kthread+0} 
<ffffffff801476af>{worker_thread+0}
<5>        <ffffffff8014b4f0>{keventd_create_kthread+0} 
<ffffffff8014b4c7>{kthread+200}
<5>        <ffffffff80110f47>{child_rip+8} 
<ffffffff8014b4f0>{keventd_create_kthread+0}
<5>        <ffffffff8014b3ff>{kthread+0} <ffffffff80110f3f>{child_rip+0}
<5>


From mshefty at ichips.intel.com  Thu Jun 14 09:39:25 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 14 Jun 2007 09:39:25 -0700
Subject: [ofa-general] Re: [PATCH draft,	untested] ehca srq emulation
	(for IPoIB CM)
In-Reply-To: <20070613174930.GE12277@mellanox.co.il>
References: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com>	<466F36C8.5010507@linux.vnet.ibm.com>	<20070613163821.GB12277@mellanox.co.il>
	<adafy4v69ig.fsf@cisco.com> <20070613174930.GE12277@mellanox.co.il>
Message-ID: <46716F3D.7050206@ichips.intel.com>

> Note this is not a full emulation, just close enough to make IPoIB CM work.

If the emulation is only enough for IPoIB, then I think it belongs in 
IPoIB, and not in every HCA driver.

- Sean


From andrey.slepuhin at t-platforms.ru  Thu Jun 14 09:41:06 2007
From: andrey.slepuhin at t-platforms.ru (Andrey Slepuhin)
Date: Thu, 14 Jun 2007 20:41:06 +0400
Subject: [ofa-general] Problems with mlx4
In-Reply-To: <adaejkf7v3g.fsf@cisco.com>
References: <467005B9.8070708@t-platforms.ru> <adaejkf7v3g.fsf@cisco.com>
Message-ID: <46716FA2.7020805@t-platforms.ru>

Hi Roland,

I upgraded the switch FW to version 1.0 and applied your latest mlx4 
patches, but I'm still in the same situation - the link is down. What 
else can go wrong?

Thanks,
Andrey

Roland Dreier wrote:
>  > I just setup a test cluster using ConnectX cards, but I can not get
>  > link up.
>
> Most likely you need to update your switch FW.  You need Anafa2 FW
> version 1.0 to negotiate a DDR link with ConnectX.
>
> BTW what firmware version do you have on your HCAs?  You probably want
> to update to 2.0.156 (the mlx4 driver won't work with 2.0.158 for a
> day or two still) so that you don't have to monkey around with
> hard-coding your switch ports to DDR only.
>
>  - R.
>   


From rdreier at cisco.com  Thu Jun 14 09:44:29 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 14 Jun 2007 09:44:29 -0700
Subject: [ofa-general] Problems with mlx4
In-Reply-To: <46716FA2.7020805@t-platforms.ru> (Andrey Slepuhin's message of
	"Thu, 14 Jun 2007 20:41:06 +0400")
References: <467005B9.8070708@t-platforms.ru> <adaejkf7v3g.fsf@cisco.com>
	<46716FA2.7020805@t-platforms.ru>
Message-ID: <ada645q4haq.fsf@cisco.com>

 > I upgraded the switch FW to version 1.0 and applied your latest mlx4
 > patches, but I'm still in the same situation - the link is down. What
 > else can go wrong?

Please read my whole email, especially this part:

  BTW what firmware version do you have on your HCAs?  You probably want
  to update to 2.0.156 (the mlx4 driver won't work with 2.0.158 for a
  day or two still) so that you don't have to monkey around with
  hard-coding your switch ports to DDR only.


From rdreier at cisco.com  Thu Jun 14 09:48:10 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 14 Jun 2007 09:48:10 -0700
Subject: [ofa-general] Re: [PATCH draft,
	untested] ehca srq emulation (for IPoIB CM)
In-Reply-To: <46716F3D.7050206@ichips.intel.com> (Sean Hefty's message of "Thu,
	14 Jun 2007 09:39:25 -0700")
References: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com>
	<466F36C8.5010507@linux.vnet.ibm.com>
	<20070613163821.GB12277@mellanox.co.il> <adafy4v69ig.fsf@cisco.com>
	<20070613174930.GE12277@mellanox.co.il>
	<46716F3D.7050206@ichips.intel.com>
Message-ID: <ada1wge4h4l.fsf@cisco.com>

 > > Note this is not a full emulation, just close enough to make IPoIB CM work.

 > If the emulation is only enough for IPoIB, then I think it belongs in
 > IPoIB, and not in every HCA driver.

I was thinking the same thing.  Otherwise you're just setting a booby
trap for someone who tries to use SRQ for something else.

However it may be a good approach to put an abstraction layer in IPoIB
so that the CM code can use an SRQ-like interface to both HCAs that
support SRQ and HCAs that don't.

 - R.


From mshefty at ichips.intel.com  Thu Jun 14 10:01:31 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 14 Jun 2007 10:01:31 -0700
Subject: [ofa-general] crash in ipoib
In-Reply-To: <46716AE6.9050804@ichips.intel.com>
References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com>
	<46716AE6.9050804@ichips.intel.com>
Message-ID: <4671746B.2050903@ichips.intel.com>

I don't know if this is the issue, but here's the code from 
ipoib_cm_req_handler():

	ret = ipoib_cm_send_rep(dev, cm_id, p->qp,
				&event->param.req_rcvd, psn);
	if (ret) {
		ipoib_warn(priv, "failed to send REP: %d\n", ret);
		goto err_rep;
	}

	cm_id->context = p;
	p->jiffies = jiffies;
	p->state = IPOIB_CM_RX_LIVE;
	spin_lock_irq(&priv->lock);
	if (list_empty(&priv->cm.passive_ids))
		queue_delayed_work(ipoib_workqueue,
				   &priv->cm.stale_task,
				   IPOIB_CM_RX_DELAY);
	list_add(&p->list, &priv->cm.passive_ids);
	spin_unlock_irq(&priv->lock);


Note that once the REP is sent, the QP is connected.  Data can be 
received, we can have events, we can be disconnected, whatever... but 
we're not yet on the passive_ids list.

- Sean


From mst at dev.mellanox.co.il  Thu Jun 14 10:35:23 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Jun 2007 20:35:23 +0300
Subject: [ofa-general] crash in ipoib
In-Reply-To: <46716AE6.9050804@ichips.intel.com>
References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com>
	<46716AE6.9050804@ichips.intel.com>
Message-ID: <20070614173522.GA29561@mellanox.co.il>


> Quoting Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [ofa-general] crash in ipoib
> 
> Here's the capture from the network console

Aha, cool.

> <5> [...network console startup...]
> <5> Unable to handle kernel NULL pointer dereference at 0000000000000008 
> RIP:
> <5> <4>Warning: kfree_skb on hard IRQ ffffffff802bb055
> <5> Warning: kfree_skb on hard IRQ ffffffff802bb055
> <5> Warning: kfree_skb on hard IRQ ffffffff802bb055
> <5> Warning: kfree_skb on hard IRQ ffffffff802bb055

Weird stuff, it looks like we are freeing an skb with
a destructor. Where does ffffffff802bb055 point to?
Since 2.6.12 we'd get a proper stack dump for this, but
in 2.6.9 we need to decode it manually.

> <5> <ffffffffa0146b60>{:ib_ipoib:ipoib_cm_handle_rx_wc+378}
> <5> PML4 dcc2f067 PGD 102087067 PMD 0
> <5> Oops: 0002 [1] SMP
> <5> CPU 1
> <5> Modules linked in: netconsole det(U) nfs lockd nfs_acl autofs4 
> i2c_dev i2c_core sunrpc rdma_ucm(U) ib_vnic(U) ib_sdp(U) rdma_cm(U) 
> iw_cm(U) ib_addr(U) ib_local_sa(U) ib_ipath(U) ipt_REJECT ipt_state 
> ip_conntrack iptable_filter ip_tables dm_mirror dm_mod button battery ac 
> joydev uhci_hcd ehci_hcd hw_random ib_mthca(U) ib_ipoib(U) ib_umad(U) 
> ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) md5 ipv6 
> e1000(U) ahci ext3 jbd ata_piix libata sd_mod scsi_mod
> <5> Pid: 1584, comm: ib_cm/1 Tainted: PF     2.6.9-42.ELsmp
> <5> RIP: 0010:[<ffffffffa0146b60>] 
> <ffffffffa0146b60>{:ib_ipoib:ipoib_cm_handle_rx_wc+378}
> <5> RSP: 0018:0000010005d7b940  EFLAGS: 00010046
> <5> RAX: 0000000000000000 RBX: 000001010d3a8e00 RCX: 0000000000000000
> <5> RDX: 000001010d3a8e10 RSI: 00000101191b3990 RDI: 00000101191b3380
> <5> RBP: 000001011302b680 R08: 0000000000000010 R09: 0000010119301e00
> <5> R10: 000000000000001f R11: 00000000000000e4 R12: 0000000000000206
> <5> R13: 00000101191b3380 R14: 00000101191b3000 R15: 0000000000000030
> <5> FS:  0000000000000000(0000) GS:ffffffff804e5100(0000) 
> knlGS:0000000000000000
> <5> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> <5> CR2: 0000000000000008 CR3: 0000000005d68000 CR4: 00000000000006e0
> <5> Process ib_cm/1 (pid: 1584, threadinfo 0000010119c14000, task 
> 000001011a9c5030)
> <5> Stack: 0000000000000206 0000000000000030 0000000000000206 
> 0000010110e8fb00
> <5>        00000101191b37b8 0000000000000206 00000000dc62401c 
> 0000000400000206
> <5>        0100000082000001 000000041a9121c0

Where does :ib_ipoib:ipoib_cm_handle_rx_wc+378 point to on your system?


-- 
MST


From xma at us.ibm.com  Thu Jun 14 10:38:54 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Thu, 14 Jun 2007 10:38:54 -0700
Subject: [ofa-general] Re: [PATCH draft,
	untested] ehca srq emulation (for IPoIB CM)
In-Reply-To: <ada1wge4h4l.fsf@cisco.com>
Message-ID: <OFC3189839.AA4A12A3-ON872572FA.006060B5-882572FA.006648DF@us.ibm.com>


>  > > Note this is not a full emulation, just close enough to make
> IPoIB CM work.
>
>  > If the emulation is only enough for IPoIB, then I think it belongs in
>  > IPoIB, and not in every HCA driver.
>
> I was thinking the same thing.  Otherwise you're just setting a booby
> trap for someone who tries to use SRQ for something else.
>
> However it may be a good approach to put an abstraction layer in IPoIB
> so that the CM code can use an SRQ-like interface to both HCAs that
> support SRQ and HCAs that don't.
>
>  - R.

That's an interesting point. How to explore different HCAs' hardware
features in ULPs is definitely worth thinking about deeply.

Thanks
Shirley

From mshefty at ichips.intel.com  Thu Jun 14 10:47:08 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 14 Jun 2007 10:47:08 -0700
Subject: [ofa-general] crash in ipoib
In-Reply-To: <20070614173522.GA29561@mellanox.co.il>
References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com>	<46716AE6.9050804@ichips.intel.com>
	<20070614173522.GA29561@mellanox.co.il>
Message-ID: <46717F1C.3010604@ichips.intel.com>

> Where does :ib_ipoib:ipoib_cm_handle_rx_wc+378 point to on your system?

It points to list_move below:

	if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) {
		p = wc->qp->qp_context;
		if (p && time_after_eq(jiffies, p->jiffies +
					IPOIB_CM_RX_UPDATE_TIME)) {
			spin_lock_irqsave(&priv->lock, flags);
			p->jiffies = jiffies;
			/* Move this entry to list head, but do not re-add it
			 * if it has been moved out of list. */
			if (p->state == IPOIB_CM_RX_LIVE)
 >>>				list_move(&p->list,
					  &priv->cm.passive_ids);
			spin_unlock_irqrestore(&priv->lock, flags);
		}
	}

There appears to be a race in ipoib_cm_req_handler() setting the
ipoib_cm_rx state outside of a lock, and before the item is added to a
list.  I think this could cause the list_move() call above to oops.  I think
ipoib_cm_req_handler() needs changes, but I'm not sure if this is enough
(patch below has line wrap issues...):

@@ -291,16 +291,16 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, st
         if (ret)
                 goto err_modify;

+       cm_id->context = p;
         ret = ipoib_cm_send_rep(dev, cm_id, p->qp,
				&event->param.req_rcvd, psn);
         if (ret) {
                 ipoib_warn(priv, "failed to send REP: %d\n", ret);
                 goto err_rep;
         }

-       cm_id->context = p;
         p->jiffies = jiffies;
-       p->state = IPOIB_CM_RX_LIVE;
         spin_lock_irq(&priv->lock);
+       p->state = IPOIB_CM_RX_LIVE;
         if (list_empty(&priv->cm.passive_ids))
                 queue_delayed_work(ipoib_workqueue,
                                    &priv->cm.stale_task,
				   IPOIB_CM_RX_DELAY);

- Sean


From mst at dev.mellanox.co.il  Thu Jun 14 10:50:30 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Jun 2007 20:50:30 +0300
Subject: [ofa-general] Re: [PATCH draft,
	untested] ehca srq emulation (for IPoIB CM)
In-Reply-To: <ada1wge4h4l.fsf@cisco.com>
References: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com>
	<466F36C8.5010507@linux.vnet.ibm.com>
	<20070613163821.GB12277@mellanox.co.il> <adafy4v69ig.fsf@cisco.com>
	<20070613174930.GE12277@mellanox.co.il>
	<46716F3D.7050206@ichips.intel.com> <ada1wge4h4l.fsf@cisco.com>
Message-ID: <20070614175030.GB29561@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM)
> 
>  > > Note this is not a full emulation, just close enough to make IPoIB CM work.
> 
>  > If the emulation is only enough for IPoIB, then I think it belongs in
>  > IPoIB, and not in every HCA driver.

"every HCA driver" is an exaggeration:
1. ehca is the only one that does not support SRQ in hardware
2. emulation (and ipoib nosrq patches, too) work by assuming only a
   small number of connections and a huge amount of memory.
   This is true for systems where ehca is used but not in the general case

> I was thinking the same thing.  Otherwise you're just setting a booby
> trap for someone who tries to use SRQ for something else.

The emulation is quite close IMO - most likely it will just work,
but if not, we can just document the limitations.

In case a ULP wants to avoid using the emulation, we could have a "SRQ is
emulated bit" to distinguish between these.

> However it may be a good approach to put an abstraction layer in IPoIB
> so that the CM code can use an SRQ-like interface to both HCAs that
> support SRQ and HCAs that don't.

2 issues with this:

1. I think other ULPs can benefit from this emulation too.
2. The emulation does need help from hardware (e.g. I use a qp token
   in CQE for QP lookups and SRQ detection).
   Implementing it on top of existing verbs can be done
   only if the verbs interface is extended.


-- 
MST


From ahubbe at iol.unh.edu  Thu Jun 14 10:56:43 2007
From: ahubbe at iol.unh.edu (Allen Hubbe)
Date: Thu, 14 Jun 2007 13:56:43 -0400 (EDT)
Subject: [ofa-general] memory leak in librdmacm, libibverbs
Message-ID: <Pine.LNX.4.64.0706141337570.9826@postal.iol.unh.edu>


I found a memory leak that is present in at least librdmacm and
libibverbs.  The libraries allow a user to get a device list, and later
free the device list.  In freeing the device list, the devices in the list
are not freed, causing a memory leak.  It would not be wise to free all
the devices in the list, either, because the user very likely wants to
continue using one of the devices that was returned in the list.  I think
the intent of the methods was for the list to live the life of the
program, but that might not be the way it gets used.  I included a short
example program; run on a machine with devices present, it will consume all
available memory.

---------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <rdma/rdma_cma.h>

int main(){

      struct ibv_context **ibv_devices;
      int num_devices;

      ibv_devices = rdma_get_devices(&num_devices);
      if(ibv_devices == NULL)
      {
          printf("no devices found, exiting\n");
          exit(1);
      }
      else
      {
          while(1)
          {
              rdma_free_devices(ibv_devices);
              ibv_devices = rdma_get_devices(NULL);
          }
      }

      return 0;
}
---------------------------------------------------------------


From rdreier at cisco.com  Thu Jun 14 11:12:14 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 14 Jun 2007 11:12:14 -0700
Subject: [ofa-general] memory leak in librdmacm, libibverbs
In-Reply-To: <Pine.LNX.4.64.0706141337570.9826@postal.iol.unh.edu> (Allen
	Hubbe's message of "Thu, 14 Jun 2007 13:56:43 -0400 (EDT)")
References: <Pine.LNX.4.64.0706141337570.9826@postal.iol.unh.edu>
Message-ID: <adalkemjthd.fsf@cisco.com>

 > I found a memory leak that is present in at least librdmacm and
 > libibverbs.  The libraries allow a user to get a device list, and later
 > free the device list.  In freeing the device list, the devices in the list
 > are not freed, causing a memory leak.  It would not be wise to free all
 > the devices in the list, either, because the user very likely wants to
 > continue using one of the devices that was returned in the list.  I think
 > the intent of the methods was for the list to live the life of the
 > program, but that might not be the way it gets used.

I don't see it.  Both rdma_get_devices() and ibv_get_device_list()
don't allocate anything beyond the list they return to the caller.
The device structures are just allocated once when the libraries
discover the devices.  And rdma_free_devices() and
ibv_free_device_list() both free exactly what the corresponding get
function allocated.

 > I included a short example program; run on a machine with devices
 > present, it will consume all available memory.

I ran this program on a system where rdma_get_devices() reports 1
device found, and the memory used by the process does not increase
after startup, even after running for a few minutes.

 - R.


From rdreier at cisco.com  Thu Jun 14 11:24:37 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 14 Jun 2007 11:24:37 -0700
Subject: [ofa-general] memory leak in librdmacm, libibverbs
In-Reply-To: <adalkemjthd.fsf@cisco.com> (Roland Dreier's message of "Thu,
	14 Jun 2007 11:12:14 -0700")
References: <Pine.LNX.4.64.0706141337570.9826@postal.iol.unh.edu>
	<adalkemjthd.fsf@cisco.com>
Message-ID: <adahcpajswq.fsf@cisco.com>

Please don't Cc: iwarplab at iol.unh.edu if I'm going to get a bounce
about a subscribers-only list when I reply to your email.

 - R.


From mst at dev.mellanox.co.il  Thu Jun 14 11:44:45 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Jun 2007 21:44:45 +0300
Subject: [ofa-general] crash in ipoib
In-Reply-To: <46717F1C.3010604@ichips.intel.com>
References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com>
	<46716AE6.9050804@ichips.intel.com>
	<20070614173522.GA29561@mellanox.co.il>
	<46717F1C.3010604@ichips.intel.com>
Message-ID: <20070614184445.GC29561@mellanox.co.il>

> Quoting Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [ofa-general] crash in ipoib
> 
> >Where does :ib_ipoib:ipoib_cm_handle_rx_wc+378 point to on your system?
> 
> It points to list_move below:
> 
> 	if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) {
> 		p = wc->qp->qp_context;
> 		if (p && time_after_eq(jiffies, p->jiffies +
> 					IPOIB_CM_RX_UPDATE_TIME)) {
> 			spin_lock_irqsave(&priv->lock, flags);
> 			p->jiffies = jiffies;
> 			/* Move this entry to list head, but do not re-add it
> 			 * if it has been moved out of list. */
> 			if (p->state == IPOIB_CM_RX_LIVE)
> >>>				list_move(&p->list,
> 					  &priv->cm.passive_ids);
> 			spin_unlock_irqrestore(&priv->lock, flags);
> 		}
> 	}
> 
> There appears to be a race in ipoib_cm_req_handler() setting the
> ipoib_cm_rx state outside of a lock, and before the item is added to a
> list.  I think this could cause the list_move() call above to oops.

Hmm, yes, looks like you are right.

> I think 
> ipoib_cm_req_handler() needs changes, but I'm not sure if this is enough 
> (patch below has line wrap issues...):
> 
> @@ -291,16 +291,16 @@ static int ipoib_cm_req_handler(struct ib_cm_id 
> *cm_id, st
>         if (ret)
>                 goto err_modify;
> 
> +       cm_id->context = p;
>         ret = ipoib_cm_send_rep(dev, cm_id, p->qp,
> 				&event->param.req_rcvd, psn);
>         if (ret) {
>                 ipoib_warn(priv, "failed to send REP: %d\n", ret);
>                 goto err_rep;
>         }
> 
> -       cm_id->context = p;
>         p->jiffies = jiffies;
> -       p->state = IPOIB_CM_RX_LIVE;
>         spin_lock_irq(&priv->lock);
> +       p->state = IPOIB_CM_RX_LIVE;
>         if (list_empty(&priv->cm.passive_ids))
>                 queue_delayed_work(ipoib_workqueue,
>                                    &priv->cm.stale_task,
> 				   IPOIB_CM_RX_DELAY);

I'm not sure this is enough. Maybe the following is needed?
Can you test it?

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 076a0bb..2509bb8 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -320,12 +320,6 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 	if (ret)
 		goto err_modify;
 
-	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
-	if (ret) {
-		ipoib_warn(priv, "failed to send REP: %d\n", ret);
-		goto err_rep;
-	}
-
 	cm_id->context = p;
 	p->jiffies = jiffies;
 	p->state = IPOIB_CM_RX_LIVE;
@@ -335,6 +329,13 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
 	list_add(&p->list, &priv->cm.passive_ids);
 	spin_unlock_irq(&priv->lock);
+
+	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
+	if (ret) {
+		/* TODO: error handling is wrong here */
+		ipoib_warn(priv, "failed to send REP: %d\n", ret);
+		goto err_rep;
+	}
 	return 0;
 
 err_rep:

-- 
MST


From andrey.slepuhin at t-platforms.ru  Thu Jun 14 11:45:47 2007
From: andrey.slepuhin at t-platforms.ru (Andrey Slepuhin)
Date: Thu, 14 Jun 2007 22:45:47 +0400
Subject: [ofa-general] Problems with mlx4
In-Reply-To: <ada645q4haq.fsf@cisco.com>
References: <467005B9.8070708@t-platforms.ru>
	<adaejkf7v3g.fsf@cisco.com>	<46716FA2.7020805@t-platforms.ru>
	<ada645q4haq.fsf@cisco.com>
Message-ID: <46718CDB.2050809@t-platforms.ru>

Aha, just got latest firmware tools from Mellanox with ConnectX support 
and realized that the firmware was 2.0.147... After upgrading (but to 
2.0.158 - that's the only firmware revision I got from Mellanox) the 
link was initialized, so I started to build the userspace... Thanks, Roland!

Best regards,
Andrey

Roland Dreier wrote:
>  > I upgraded the switch FW to version 1.0 and applied your latest mlx4
>  > patches, but I'm still in the same situation - the link is down. What
>  > else can go wrong?
>
> Please read my whole email, especially this part:
>
>   BTW what firmware version do you have on your HCAs?  You probably want
>   to update to 2.0.156 (the mlx4 driver won't work with 2.0.158 for a
>   day or two still) so that you don't have to monkey around with
>   hard-coding your switch ports to DDR only.
>   


From mst at dev.mellanox.co.il  Thu Jun 14 12:08:37 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Jun 2007 22:08:37 +0300
Subject: [ofa-general] crash in ipoib
In-Reply-To: <20070614184445.GC29561@mellanox.co.il>
References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com>
	<46716AE6.9050804@ichips.intel.com>
	<20070614173522.GA29561@mellanox.co.il>
	<46717F1C.3010604@ichips.intel.com>
	<20070614184445.GC29561@mellanox.co.il>
Message-ID: <20070614190837.GA2207@mellanox.co.il>

> I'm not sure this is enough. Maybe the following is needed?
> Can you test it?

And here's a version with error handling fixed.
Sean, does this solve your crash?

--->

Move RX to the passive_ids list before sending a REP.

Signed-off-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>

---

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 076a0bb..2be2c76 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -320,12 +320,6 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 	if (ret)
 		goto err_modify;
 
-	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
-	if (ret) {
-		ipoib_warn(priv, "failed to send REP: %d\n", ret);
-		goto err_rep;
-	}
-
 	cm_id->context = p;
 	p->jiffies = jiffies;
 	p->state = IPOIB_CM_RX_LIVE;
@@ -335,6 +329,13 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
 	list_add(&p->list, &priv->cm.passive_ids);
 	spin_unlock_irq(&priv->lock);
+
+	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
+	if (ret) {
+		ipoib_warn(priv, "failed to send REP: %d\n", ret);
+		if (ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE))
+			ipoib_warn(priv, "unable to move qp to error state\n");
+	}
 	return 0;
 
 err_rep:

-- 
MST


From rdreier at cisco.com  Thu Jun 14 12:14:00 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 14 Jun 2007 12:14:00 -0700
Subject: [ofa-general] crash in ipoib
In-Reply-To: <20070614190837.GA2207@mellanox.co.il> (Michael S. Tsirkin's
	message of "Thu, 14 Jun 2007 22:08:37 +0300")
References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com>
	<46716AE6.9050804@ichips.intel.com>
	<20070614173522.GA29561@mellanox.co.il>
	<46717F1C.3010604@ichips.intel.com>
	<20070614184445.GC29561@mellanox.co.il>
	<20070614190837.GA2207@mellanox.co.il>
Message-ID: <adad4zyjqmf.fsf@cisco.com>

 > +	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
 > +	if (ret) {
 > +		ipoib_warn(priv, "failed to send REP: %d\n", ret);
 > +		if (ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE))
 > +			ipoib_warn(priv, "unable to move qp to error state\n");
 > +	}

So if sending a rep fails, this leaves p on the passive_ids list with
state IPOIB_CM_RX_LIVE.  Does it ever get cleaned up?

The old code used to destroy the qp and free p if sending a REP failed.

 - R.


From mshefty at ichips.intel.com  Thu Jun 14 12:35:27 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 14 Jun 2007 12:35:27 -0700
Subject: [ofa-general] crash in ipoib
In-Reply-To: <20070614190837.GA2207@mellanox.co.il>
References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com>	<46716AE6.9050804@ichips.intel.com>	<20070614173522.GA29561@mellanox.co.il>	<46717F1C.3010604@ichips.intel.com>	<20070614184445.GC29561@mellanox.co.il>
	<20070614190837.GA2207@mellanox.co.il>
Message-ID: <4671987F.3080403@ichips.intel.com>

> And here's a version with error handling fixed.
> Sean, does this solve your crash?

We'll test a patch once we can agree on it.  It can take up to a day for 
us to hit this issue though.

We had created the following to try, which leaves the error handling the 
same.  Which approach do you prefer?

@@ -291,16 +291,17 @@ static int ipoib_cm_req_handler(struct ib_cm_id
         if (ret)
                 goto err_modify;

+       cm_id->context = p;
+       spin_lock_irq(&priv->lock);
         ret = ipoib_cm_send_rep(dev, cm_id, p->qp,
				&event->param.req_rcvd, psn);
         if (ret) {
+               spin_unlock_irq(&priv->lock);
                 ipoib_warn(priv, "failed to send REP: %d\n", ret);
                 goto err_rep;
         }

-       cm_id->context = p;
         p->jiffies = jiffies;
         p->state = IPOIB_CM_RX_LIVE;
-       spin_lock_irq(&priv->lock);
         if (list_empty(&priv->cm.passive_ids))
                 queue_delayed_work(ipoib_workqueue,
                                    &priv->cm.stale_task,
				   IPOIB_CM_RX_DELAY);

- Sean


From mst at dev.mellanox.co.il  Thu Jun 14 13:15:38 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Jun 2007 23:15:38 +0300
Subject: [ofa-general] crash in ipoib
In-Reply-To: <adad4zyjqmf.fsf@cisco.com>
References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com>
	<46716AE6.9050804@ichips.intel.com>
	<20070614173522.GA29561@mellanox.co.il>
	<46717F1C.3010604@ichips.intel.com>
	<20070614184445.GC29561@mellanox.co.il>
	<20070614190837.GA2207@mellanox.co.il> <adad4zyjqmf.fsf@cisco.com>
Message-ID: <20070614201538.GB2207@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [ofa-general] crash in ipoib
> 
>  > +	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
>  > +	if (ret) {
>  > +		ipoib_warn(priv, "failed to send REP: %d\n", ret);
>  > +		if (ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE))
>  > +			ipoib_warn(priv, "unable to move qp to error state\n");
>  > +	}
> 
> So if sending a rep fails, this leaves p on the passive_ids list with
> state IPOIB_CM_RX_LIVE.  Does it ever get cleaned up?

Yes, in the usual way: upon the last wqe reached event.

> The old code used to destroy the qp and free p if sending a REP failed.

This was really the wrong thing to do - destroying a QP connected to an SRQ
must be done with the draining procedure, in case the remote violates the
protocol and sends us packets for this QPN.

-- 
MST


From mst at dev.mellanox.co.il  Thu Jun 14 13:20:07 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Jun 2007 23:20:07 +0300
Subject: [ofa-general] crash in ipoib
In-Reply-To: <4671987F.3080403@ichips.intel.com>
References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com>
	<46716AE6.9050804@ichips.intel.com>
	<20070614173522.GA29561@mellanox.co.il>
	<46717F1C.3010604@ichips.intel.com>
	<20070614184445.GC29561@mellanox.co.il>
	<20070614190837.GA2207@mellanox.co.il>
	<4671987F.3080403@ichips.intel.com>
Message-ID: <20070614202006.GC2207@mellanox.co.il>

> Quoting Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [ofa-general] crash in ipoib
> 
> >And here's a version with error handling fixed.
> >Sean, does this solve your crash?
> 
> We'll test a patch once we can agree on it.  It can take up to a day for 
> us to hit this issue though.
> 
> We had created the following to try, which leaves the error handling the 
> same.  Which approach do you prefer?
> 
> @@ -291,16 +291,17 @@ static int ipoib_cm_req_handler(struct ib_cm_id
>         if (ret)
>                 goto err_modify;
> 
> +       cm_id->context = p;
> +       spin_lock_irq(&priv->lock);
>         ret = ipoib_cm_send_rep(dev, cm_id, p->qp,
> 				&event->param.req_rcvd, psn);
>         if (ret) {
> +               spin_unlock_irq(&priv->lock);
>                 ipoib_warn(priv, "failed to send REP: %d\n", ret);
>                 goto err_rep;
>         }
> 
> -       cm_id->context = p;
>         p->jiffies = jiffies;
>         p->state = IPOIB_CM_RX_LIVE;
> -       spin_lock_irq(&priv->lock);
>         if (list_empty(&priv->cm.passive_ids))
>                 queue_delayed_work(ipoib_workqueue,
>                                    &priv->cm.stale_task,
> 				   IPOIB_CM_RX_DELAY);
> 

I think my patch is more correct, but just for the sake of testing
yours should be sufficient as well.

-- 
MST


From pradeeps at linux.vnet.ibm.com  Thu Jun 14 15:46:25 2007
From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana)
Date: Thu, 14 Jun 2007 15:46:25 -0700
Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation
	(for IPoIB CM)
In-Reply-To: <20070614175030.GB29561@mellanox.co.il>
References: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com>
	<466F36C8.5010507@linux.vnet.ibm.com>
	<20070613163821.GB12277@mellanox.co.il> <adafy4v69ig.fsf@cisco.com>
	<20070613174930.GE12277@mellanox.co.il>
	<46716F3D.7050206@ichips.intel.com> <ada1wge4h4l.fsf@cisco.com>
	<20070614175030.GB29561@mellanox.co.il>
Message-ID: <4671C541.4040503@linux.vnet.ibm.com>

Michael S. Tsirkin wrote:
>> Quoting Roland Dreier <rdreier at cisco.com>:
>> Subject: Re: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM)
>>
>>  > > Note this is not a full emulation, just close enough to make IPoIB CM work.
>>
>>  > If the emulation is only enough for IPoIB, then I think it belongs in
>>  > IPoIB, and not in every HCA driver.
> 
> "every HCA driver" is an exaggeration:
> 1. ehca is the only one that does not support SRQ in hardware
> 2. emulation (and ipoib nosrq patches, too) work by assuming only a
>    small number of connections and a huge amount of memory.
>    This is true for systems where ehca is used but not in the general case
> 

Pushing the changes into the driver is a potential maintenance
nightmare. How does one keep changes across layers in sync?

That was the reason I strove to use common code in the NOSRQ case, as
much of it as possible, and all of it in IPoIB.

In the emulation approach, by apportioning WRs across QPs, we will be
sacrificing performance by dropping packets or returning an RNR on a
really busy QP. As I see it, the alternative is to allocate a really big
SRQ even when there are very few QPs, wasting a lot of unused WRs.

Thus even with a small number of heavily used connections and huge
amounts of memory we will not be able to derive the performance
benefits that connected mode can potentially offer.

>> I was thinking the same thing.  Otherwise you're just setting a booby
>> trap for someone who tries to use SRQ for something else.
> 
> The emulation is quite close IMO - most likely it will just work,
> but if not, we can just document the limitations.
> 
> In case a ULP wants to avoid using the emulation, we could have a "SRQ is
> emulated bit" to distinguish between these.
> 
>> However it may be a good approach to put an abstraction layer in IPoIB
>> so that the CM code can use an SRQ-like interface to both HCAs that
>> support SRQ and HCAs that don't.
> 
> 2 issues with this:
> 
> 1. I think other ULPs can benefit from this emulation too.
> 2. The emulation does need help from hardware (e.g. I use a qp token
>    in CQE for QP lookups and SRQ detection).
>    Implementing it on top of existing verbs can be done
>    only if verbs interface is extended.


Pradeep


From friedman at ucla.edu  Thu Jun 14 21:31:35 2007
From: friedman at ucla.edu (Scott A. Friedman)
Date: Thu, 14 Jun 2007 21:31:35 -0700
Subject: [ofa-general] iWarp cxgb3 firmware
Message-ID: <46721627.9000907@ucla.edu>

Hi

Is anyone using the cxgb3 module in rc4 or rc5? If so, where are you 
getting the correct firmware that it seems to want (4.2)? Chelsio is 
only distributing v4.1 on their web site. I would like to know since my 
iWarp nodes are currently stuck at rc3, whose cxgb3 needs version 4.0.

Do these firmware versions make significant changes?

Thanks,
Scott


From mst at dev.mellanox.co.il  Thu Jun 14 22:18:46 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 15 Jun 2007 08:18:46 +0300
Subject: [ofa-general] Re: [PATCH draft,
	untested] ehca srq emulation (for IPoIB CM)
In-Reply-To: <4671C541.4040503@linux.vnet.ibm.com>
References: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com>
	<466F36C8.5010507@linux.vnet.ibm.com>
	<20070613163821.GB12277@mellanox.co.il> <adafy4v69ig.fsf@cisco.com>
	<20070613174930.GE12277@mellanox.co.il>
	<46716F3D.7050206@ichips.intel.com> <ada1wge4h4l.fsf@cisco.com>
	<20070614175030.GB29561@mellanox.co.il>
	<4671C541.4040503@linux.vnet.ibm.com>
Message-ID: <20070615051846.GG2207@mellanox.co.il>


> Pushing the changes into the driver is a potential maintenance
> nightmare. How does one keep changes across layers in sync?

We have different definitions of "across layers": in my code everything is kept
inside ehca. I call it a maintenance nightmare when there's code in IPoIB that
only ehca owners can test.

> That was the reason I strived to use common code in the NOSRQ case; at
> least  as much as possible and all of it in IPoIB.

And you ended up with a bigger patch.

> In the emulation approach by apportioning off WRs across QPs, we will be
> sacrificing performance by dropping packets or returning an RNR on a
> really busy QP. As I see it, the alternative is to allocate a really big
> SRQ, even when there are very few QPs and wasting a lot of the unused WRs.

As I said, there are obvious performance optimisations to implement.
We can later add code in IPoIB that, for very large SRQ size,
will post WRs on demand. But at least that will be common code
that everyone can test.

-- 
MST


From vlad at lists.openfabrics.org  Fri Jun 15 02:42:19 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Fri, 15 Jun 2007 02:42:19 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070615-0200 daily build status
Message-ID: <20070615094219.3411CE6080B@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:   --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.12
Passed on ia64 with linux-2.6.12
Passed on x86_64 with linux-2.6.20
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on ia64 with linux-2.6.18
Passed on x86_64 with linux-2.6.18
Passed on ia64 with linux-2.6.13
Passed on powerpc with linux-2.6.19
Passed on ia64 with linux-2.6.14
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.17
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.16
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.13
Passed on ia64 with linux-2.6.15
Passed on ppc64 with linux-2.6.18
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.15
Passed on ia64 with linux-2.6.17
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.13
Passed on x86_64 with linux-2.6.21.1
Passed on ppc64 with linux-2.6.12
Passed on powerpc with linux-2.6.16
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.14
Passed on x86_64 with linux-2.6.15
Passed on ppc64 with linux-2.6.17
Passed on x86_64 with linux-2.6.14
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.14
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on ia64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:


From hanafim.ctr at asc.hpc.mil  Fri Jun 15 06:52:04 2007
From: hanafim.ctr at asc.hpc.mil (MAHMOUD HANAFI)
Date: Fri, 15 Jun 2007 09:52:04 -0400
Subject: [ofa-general] OFED SRP Frame/MTU tunning
Message-ID: <46729984.3070101@asc.hpc.mil>

All,

I would like to configure SRP to use a frame size of 1K; it defaults to 2K.
Is this an option that can be set/configured?

Thanks,
-- 
Mahmoud Hanafi
Senior System Administrator
ASC/MSRC
www.asc.hpc.mil
2435 5th Street
WPAFB, OHIO 45433
(937) 255-1536


From chas at cmf.nrl.navy.mil  Fri Jun 15 07:33:29 2007
From: chas at cmf.nrl.navy.mil (chas williams - CONTRACTOR)
Date: Fri, 15 Jun 2007 10:33:29 -0400
Subject: [ofa-general] OFED SRP Frame/MTU tunning 
In-Reply-To: <46729984.3070101@asc.hpc.mil> 
Message-ID: <200706151433.l5FEXT0X032144@cmf.nrl.navy.mil>

In message <46729984.3070101 at asc.hpc.mil>,MAHMOUD HANAFI writes:
>I would like to configure SRP to use a frame size of 1K; it defaults to 2K.
>Is this an option that can be set/configured?

apply this patch.  i should have made this a per login item though.
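
For readers who want to see the clamp in isolation, here is a minimal
user-space sketch. It assumes the usual IB path-MTU encoding (code 3 is
1024 bytes, code 4 is 2048 bytes). clamp_path_mtu() and
mtu_enum_to_bytes() are hypothetical helper names used only for
illustration; they are not functions in ib_srp.c.

```c
#include <assert.h>

/* IB path MTU codes as carried in a path record. */
enum ib_mtu {
	IB_MTU_256  = 1,
	IB_MTU_512  = 2,
	IB_MTU_1024 = 3,
	IB_MTU_2048 = 4,
	IB_MTU_4096 = 5
};

/* Hypothetical helper: convert an MTU code to a size in bytes. */
static int mtu_enum_to_bytes(enum ib_mtu mtu)
{
	assert(mtu >= IB_MTU_256 && mtu <= IB_MTU_4096);
	return 128 << mtu;
}

/*
 * Hypothetical helper mirroring the quirk: when the quirk is enabled,
 * cap the path MTU at 1K; otherwise leave the path record value alone.
 */
static enum ib_mtu clamp_path_mtu(enum ib_mtu mtu, int tavor_quirk)
{
	if (tavor_quirk && mtu > IB_MTU_1024)
		return IB_MTU_1024;
	return mtu;
}
```

With the quirk enabled, clamp_path_mtu(IB_MTU_2048, 1) yields
IB_MTU_1024, and MTUs of 1K or less pass through unchanged. In the patch
the switch is the tavor_quirk module parameter, which is declared with
mode 0644 and so should be settable at module load time.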

--- a/drivers/infiniband/ulp/srp/ib_srp.c.orig	2006-12-21 14:15:33.728164124 -0500
+++ b/drivers/infiniband/ulp/srp/ib_srp.c	2006-12-21 15:26:44.234250010 -0500
@@ -83,6 +83,10 @@
 MODULE_PARM_DESC(mellanox_workarounds,
 		 "Enable workarounds for Mellanox SRP target bugs if != 0");
 
+static int tavor_quirk = 0;
+module_param_named(tavor_quirk, tavor_quirk, int, 0644);
+MODULE_PARM_DESC(tavor_quirk, "Tavor performance quirk: limit MTU to 1K if > 0");
+
 static const u8 mellanox_oui[3] = { 0x00, 0x02, 0xc9 };
 
 static void srp_add_one(struct ib_device *device);
@@ -256,8 +260,14 @@
 	target->status = status;
 	if (status)
 		printk(KERN_ERR PFX "Got failed path rec status %d\n", status);
-	else
+	else {
 		target->path = *pathrec;
+		if (tavor_quirk) {
+			if (target->path.mtu > IB_MTU_1024)
+				target->path.mtu = IB_MTU_1024;
+		}
+	}
+	
 	complete(&target->done);
 }
 

From jsquyres at cisco.com  Fri Jun 15 08:11:20 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Fri, 15 Jun 2007 11:11:20 -0400
Subject: [ofa-general] http://git.openfabrics.org/
Message-ID: <A85B03FF-E321-4037-939C-917B9B483DED@cisco.com>

I notice that http://git.openfabrics.org/ shows the main OFA web  
site, but http://git.openfabrics.org/git/ shows all the git  
repositories.

Can a redirect be installed such that http://git.openfabrics.org/ is  
automatically sent to http://git.openfabrics.org/git/?

I think that would be a little more intuitive.

Thanks!

-- 
Jeff Squyres
Cisco Systems


From swise at opengridcomputing.com  Fri Jun 15 08:27:51 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 15 Jun 2007 10:27:51 -0500
Subject: [ofa-general] iWarp cxgb3 firmware
In-Reply-To: <46721627.9000907@ucla.edu>
References: <46721627.9000907@ucla.edu>
Message-ID: <4672AFF7.1020203@opengridcomputing.com>

Scott A. Friedman wrote:
> Hi
> 
> Is anyone using the cxgb3 module in rc4 or rc5? If so, where are you 
> getting the correct firmware that it seems to want (4.2)? Chelsio is 
> only distributing v4.1 on their web site. I would like to know since my 
> iWarp nodes are currently stuck at rc3, whose cxgb3 needs version 4.0
> 
> Do these firmware versions make significant changes?
> 

Unfortunately, yes, they do.  -rc4 and beyond require firmware version
4.2, which fixes some streaming mode->rdma mode connection transition bugs.
The interface between the driver and firmware also changed, which is why
the requirement is there.  -rc4 and beyond _will not_ work with anything
less than 4.2.  I pushed the changes into -rc4 to get them in before
ofed-1.2 ships, as this was a critical bug.

The 4.2 firmware will be available this week from Chelsio.  Contact your
Chelsio rep to get it.  Perhaps you can get a pre-release version today...


Steve.


From mshefty at ichips.intel.com  Fri Jun 15 08:49:55 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 15 Jun 2007 08:49:55 -0700
Subject: [ofa-general] Re: [PATCH draft,	untested] ehca srq emulation
	(for IPoIB CM)
In-Reply-To: <20070615051846.GG2207@mellanox.co.il>
References: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com>	<466F36C8.5010507@linux.vnet.ibm.com>	<20070613163821.GB12277@mellanox.co.il>
	<adafy4v69ig.fsf@cisco.com>	<20070613174930.GE12277@mellanox.co.il>	<46716F3D.7050206@ichips.intel.com>
	<ada1wge4h4l.fsf@cisco.com>	<20070614175030.GB29561@mellanox.co.il>	<4671C541.4040503@linux.vnet.ibm.com>
	<20070615051846.GG2207@mellanox.co.il>
Message-ID: <4672B523.50502@ichips.intel.com>

> We have different definitions of "across layers": in my code everything is kept
> inside ehca. I call it a maintenance nightmare when there's code in IPoIB that
> only ehca owners can test.

I disagree with the concept of adding this code into the lower level 
driver.  Posting a receive buffer onto a QP after it gets a receive 
completion is something the ULP can and should do.

SRQ support is optional.  There's no reason why the no-SRQ code in IPoIB 
can't be tested on all HCAs.  It's the SRQ code that requires specific 
hardware.
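
As a rough illustration of the repost-on-completion pattern described
above, here is a toy model in plain C. It deliberately does not use the
verbs API: struct toy_rx_queue, RX_RING_SIZE and the toy_rxq_* functions
are all hypothetical stand-ins for a per-connection receive queue and its
completion handler.

```c
#include <assert.h>

#define RX_RING_SIZE 4	/* hypothetical per-QP receive depth */

/* Toy stand-in for one connection's receive queue; not a verbs object. */
struct toy_rx_queue {
	int posted;	/* receive buffers currently posted */
	int completed;	/* receive completions handled so far */
};

static void toy_rxq_init(struct toy_rx_queue *q)
{
	q->posted = RX_RING_SIZE;
	q->completed = 0;
}

/*
 * Model one incoming message. Returns 0 if a posted buffer was consumed,
 * or -1 if none was available (the case where a real RC QP would send
 * an RNR NAK).
 */
static int toy_rxq_receive(struct toy_rx_queue *q)
{
	if (q->posted == 0)
		return -1;
	q->posted--;
	return 0;
}

/* Completion handler: the ULP consumes the data, then reposts a buffer. */
static void toy_rxq_handle_completion(struct toy_rx_queue *q)
{
	assert(q->posted < RX_RING_SIZE);
	q->completed++;
	q->posted++;
}
```

As long as every completion is followed by a repost, the queue depth
returns to RX_RING_SIZE after each message. The queue only runs dry when
messages arrive faster than completions are handled.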

- Sean


From mst at dev.mellanox.co.il  Fri Jun 15 09:07:09 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 15 Jun 2007 19:07:09 +0300
Subject: [ofa-general] Re: [PATCH draft,
	untested] ehca srq emulation (for IPoIB CM)
In-Reply-To: <4672B523.50502@ichips.intel.com>
References: <466F36C8.5010507@linux.vnet.ibm.com>
	<20070613163821.GB12277@mellanox.co.il> <adafy4v69ig.fsf@cisco.com>
	<20070613174930.GE12277@mellanox.co.il>
	<46716F3D.7050206@ichips.intel.com> <ada1wge4h4l.fsf@cisco.com>
	<20070614175030.GB29561@mellanox.co.il>
	<4671C541.4040503@linux.vnet.ibm.com>
	<20070615051846.GG2207@mellanox.co.il>
	<4672B523.50502@ichips.intel.com>
Message-ID: <20070615160709.GK2207@mellanox.co.il>

> Quoting Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM)
> 
> >We have different definitions of "across layers": in my code everything is 
> >kept
> >inside ehca. I call it a maintenance nightmare when there's code in IPoIB 
> >that
> >only ehca owners can test.
> 
> I disagree with the concept of adding this code into the lower level 
> driver.  Posting a receive buffer onto a QP after it gets a receive 
> completion is something the ULP can and should do.
> 
> SRQ support is optional.  There's no reason why the no-SRQ code in IPoIB 
> can't be tested on all HCAs.  It's the SRQ code that requires specific 
> hardware.

Basically, I think that, given the lack of SW-level flow control,
IPoIB CM without SRQ generally does not make sense because of
the scalability problems.

However, the argument for adding this protocol revolves around the claim
that ehca (the only low level driver without SRQ that we have)
is used on systems with huge amount of memory and a small number
of nodes.

-- 
MST


From mst at dev.mellanox.co.il  Fri Jun 15 09:09:24 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 15 Jun 2007 19:09:24 +0300
Subject: [ofa-general] Re: [PATCH draft,
	untested] ehca srq emulation (for IPoIB CM)
In-Reply-To: <ada1wge4h4l.fsf@cisco.com>
References: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com>
	<466F36C8.5010507@linux.vnet.ibm.com>
	<20070613163821.GB12277@mellanox.co.il> <adafy4v69ig.fsf@cisco.com>
	<20070613174930.GE12277@mellanox.co.il>
	<46716F3D.7050206@ichips.intel.com> <ada1wge4h4l.fsf@cisco.com>
Message-ID: <20070615160924.GL2207@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM)
> 
>  > > Note this is not a full emulation, just close enough to make IPoIB CM work.
> 
>  > If the emulation is only enough for IPoIB, then I think it belongs in
>  > IPoIB, and not in every HCA driver.
> 
> I was thinking the same thing.  Otherwise you're just setting a booby
> trap for someone who tries to use SRQ for something else.

Would adding a "wrs per qp" field to the srq attr structure solve this?

> However it may be a good approach to put an abstraction layer in IPoIB
> so that the CM code can use an SRQ-like interface to both HCAs that
> support SRQ and HCAs that don't.

If you are thinking about something like what was done to solve the ipath
DMA problems, I'm for it. This will likely require minor extensions
to the verbs API, like the DMA thing did.

-- 
MST


From Kapil.Dukle at med.ge.com  Fri Jun 15 09:21:16 2007
From: Kapil.Dukle at med.ge.com (Dukle, Kapil (GE Healthcare))
Date: Fri, 15 Jun 2007 12:21:16 -0400
Subject: [ofa-general] Infiniband data transfer across different IB drivers
Message-ID: <DE4D96C8DFF3B94BACC3B6FE3B7D14010452E77F@CINMLVEM11.e2k.ad.ge.com>


Hi, 
I am currently experimenting with InfiniBand data transfers between two
servers with different operating systems and IB drivers.

Server A runs VxWorks 5.5 and uses the Mellanox IB driver modules and the
VAPI interface.

Server B runs Linux 2.6.x and uses the OFED 1.0 drivers and the OFED Verbs
API.

Problem:
I have written code (that makes the respective Verbs calls) to set up
queue pairs and initialize them with the destination queue pair number
and LID. The connection type is IBV_QPT_RC (Reliable Connection).
The traces seem to confirm that the destination QPN and LID values are
correct. Next, I post send requests on Server A and receive requests
on Server B, then check the respective completion queues for any events.
The problem is that I do NOT see any completion events on the receive
completion queue for Server B.

Questions:
- Are these two drivers (Mellanox VAPI and OFED) compatible with each
  other in the first place?

- Is it possible to verify that the two queue pairs are indeed "connected"
  to each other?

- Can I enable some debug mechanism at the driver level to see what the
  send/receive requests translate to, and what the underlying errors
  could be (if any)?


Here is some information about the network that may help:

[root@ServerB ~]# ps -elf | grep opensm
4 S root      2695     1  0  32   - - 14738 stext  Jun14 ?        00:00:00 /usr/local/ofed/bin/opensm -t 200 -g 0
0 S root     12030 11992  0  76   0 - 13981 pipe_w 11:18 pts/1    00:00:00 grep opensm

[root@ServerB ~]# sminfo
sminfo: sm lid 0x1 sm guid 0x2c90200212251, activity count 40926 priority 1 state SMINFO_MASTER 3


[root@ServerB ~]# ibnetdiscover -v
        [1] {0002c90200212250}
DR path [0][1] -> new remote ca {00d01c000001010a} portnum 2 lid 0x2-0x2 "ServerA HCA-1 (Topspin HCA)"
        [2] {00d01c000001010a}
#
# Topology file: generated on Fri Jun 15 11:05:52 2007
#
# Max of 1 hops discovered
# Initiated from node 0002c90200212250 port 0002c90200212251

vendid=0xd01c
devid=0x5a44
sysimgguid=0xd01c000001010a
caguid=0xd01c000001010a
Ca      2 "H-00d01c000001010a"          # ServerA HCA-1 (Topspin HCA)
[2]     "H-0002c90200212250"[1]         # lid 2 lmc 0

vendid=0x2c9
devid=0x5a44
sysimgguid=0x2c90200212253
caguid=0x2c90200212250
Ca      2 "H-0002c90200212250"          # ServerB HCA-1
[1]     "H-00d01c000001010a"[2]         # lid 1 lmc 0


[root@ServerB ~]# ibcheckstate -v

# Checking Ca: nodeguid 0x00d01c000001010a
Node check lid 2:  OK
Port check lid 2 port 2:  OK

# Checking Ca: nodeguid 0x0002c90200212250
Node check lid 1:  OK
Port check lid 1 port 1:  OK

## Summary: 2 nodes checked, 0 bad nodes found
##          2 ports checked, 0 ports with bad state found


[root@ServerB ~]# ibnodes -v
Ca      : 0x00d01c000001010a ports 2 "ServerA HCA-1 (Topspin HCA)"
Ca      : 0x0002c90200212250 ports 2 "ServerB HCA-1"


Please let me know if you need any other information. 


Thanks in advance,

Kapil


From mshefty at ichips.intel.com  Fri Jun 15 09:28:19 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 15 Jun 2007 09:28:19 -0700
Subject: [ofa-general] crash in ipoib
In-Reply-To: <20070614190837.GA2207@mellanox.co.il>
References: <000101c7ad1b$f8a9d370$9c98070a@amr.corp.intel.com>	<46716AE6.9050804@ichips.intel.com>	<20070614173522.GA29561@mellanox.co.il>	<46717F1C.3010604@ichips.intel.com>	<20070614184445.GC29561@mellanox.co.il>
	<20070614190837.GA2207@mellanox.co.il>
Message-ID: <4672BE23.3050809@ichips.intel.com>

> And here's a version with error handling fixed.
> Sean, does this solve your crash?

We've been running this patch since yesterday and haven't seen any 
crashes.  We'll continue testing this over the week-end.

- Sean


From sean.hefty at intel.com  Fri Jun 15 09:34:55 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 15 Jun 2007 09:34:55 -0700
Subject: [ofa-general] [PATCH] for-2.6.23 ib/umad: add partition support
Message-ID: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com>

In order to support multiple partitions, user_mad needs to handle
different PKeys.  PKeys must be specified by the user when sending
and receiving MADs.  This bumps the ABI.

Signed-off-by: Sean Hefty <sean.hefty at intel.com>
---
If there are no objections, I will queue this patch for 2.6.23, and request
a pull when 2.6.23 is closer.
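
To make the size change concrete, here is a small user-space sketch of
the header layout after this patch. It is an approximation under stated
assumptions: fixed-width stdint types stand in for the kernel's
__u8/__u16/__be32 types, the fields before traffic_class are taken from
the flattened ABI 5 layout shown in the libibumad patch in this thread,
and struct umad_hdr_v6 and umad_hdr_v6_check_layout() are hypothetical
names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for the ABI 6 header, flattened into one struct. */
struct umad_hdr_v6 {
	uint32_t agent_id;
	uint32_t status;
	uint32_t timeout_ms;
	uint32_t retries;
	uint32_t length;
	uint32_t qpn;
	uint32_t qkey;
	uint16_t lid;
	uint8_t  sl;
	uint8_t  path_bits;
	uint8_t  grh_present;
	uint8_t  gid_index;
	uint8_t  hop_limit;
	uint8_t  traffic_class;
	uint8_t  gid[16];
	uint32_t flow_label;
	uint16_t pkey_index;	/* new in ABI 6 */
	uint8_t  reserved[6];	/* explicit padding, new in ABI 6 */
};

/* Check the offsets implied by the field order above. */
static void umad_hdr_v6_check_layout(void)
{
	/* The ABI 5 header ends after flow_label, at byte 56. */
	assert(offsetof(struct umad_hdr_v6, pkey_index) == 56);
	/* pkey_index plus reserved[6] add 8 bytes, for 64 in total. */
	assert(sizeof(struct umad_hdr_v6) == 64);
}
```

Because the new fields are appended and padded explicitly, the header
grows from 56 to 64 bytes with no compiler-inserted padding, which is
what lets the layout stay identical on 32-bit and 64-bit builds.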


 drivers/infiniband/core/user_mad.c |    5 +++--
 include/rdma/ib_user_mad.h         |    4 +++-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index d97ded2..b0128fa 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -228,6 +228,7 @@ static void recv_handler(struct ib_mad_agent *agent,
 	packet->mad.hdr.lid 	  = cpu_to_be16(mad_recv_wc->wc->slid);
 	packet->mad.hdr.sl  	  = mad_recv_wc->wc->sl;
 	packet->mad.hdr.path_bits = mad_recv_wc->wc->dlid_path_bits;
+	packet->mad.hdr.pkey_index  = mad_recv_wc->wc->pkey_index;
 	packet->mad.hdr.grh_present = !!(mad_recv_wc->wc->wc_flags & IB_WC_GRH);
 	if (packet->mad.hdr.grh_present) {
 		struct ib_ah_attr ah_attr;
@@ -503,8 +504,8 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
 	data_len = count - sizeof (struct ib_user_mad) - hdr_len;
 	packet->msg = ib_create_send_mad(agent,
 					 be32_to_cpu(packet->mad.hdr.qpn),
-					 0, rmpp_active, hdr_len,
-					 data_len, GFP_KERNEL);
+					 packet->mad.hdr.pkey_index, rmpp_active,
+					 hdr_len, data_len, GFP_KERNEL);
 	if (IS_ERR(packet->msg)) {
 		ret = PTR_ERR(packet->msg);
 		goto err_ah;
diff --git a/include/rdma/ib_user_mad.h b/include/rdma/ib_user_mad.h
index d66b15e..e7bf6fa 100644
--- a/include/rdma/ib_user_mad.h
+++ b/include/rdma/ib_user_mad.h
@@ -43,7 +43,7 @@
  * Increment this value if any changes that break userspace ABI
  * compatibility are made.
  */
-#define IB_USER_MAD_ABI_VERSION	5
+#define IB_USER_MAD_ABI_VERSION	6
 
 /*
  * Make sure that all structs defined in this file remain laid out so
@@ -88,6 +88,8 @@ struct ib_user_mad_hdr {
 	__u8	traffic_class;
 	__u8	gid[16];
 	__be32	flow_label;
+	__u16   pkey_index;
+	__u8    reserved[6];
 };
 
 /**



From halr at voltaire.com  Fri Jun 15 09:36:41 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 15 Jun 2007 12:36:41 -0400
Subject: [ofa-general] Re: [PATCH] opensm/osm_helper.c: fixing PortInfo
	CapMask printing
In-Reply-To: <20070614113757.GA5908@sashak.voltaire.com>
References: <20070614113757.GA5908@sashak.voltaire.com>
Message-ID: <1181925385.5681.364065.camel@hal.voltaire.com>

On Thu, 2007-06-14 at 07:37, Sasha Khapyorsky wrote:
> When PortInfo:CapMask is zero, an uninitialized local buffer (garbage)
> is printed. Here is the fix.
> 
> Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>

Good find. Thanks. Applied.

-- Hal


From pradeeps at linux.vnet.ibm.com  Fri Jun 15 09:39:56 2007
From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana)
Date: Fri, 15 Jun 2007 09:39:56 -0700
Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation
	(for IPoIB CM)
In-Reply-To: <20070615051846.GG2207@mellanox.co.il>
References: <000a01c7ad25$c7c63780$9c98070a@amr.corp.intel.com>
	<466F36C8.5010507@linux.vnet.ibm.com>
	<20070613163821.GB12277@mellanox.co.il> <adafy4v69ig.fsf@cisco.com>
	<20070613174930.GE12277@mellanox.co.il>
	<46716F3D.7050206@ichips.intel.com> <ada1wge4h4l.fsf@cisco.com>
	<20070614175030.GB29561@mellanox.co.il>
	<4671C541.4040503@linux.vnet.ibm.com>
	<20070615051846.GG2207@mellanox.co.il>
Message-ID: <4672C0DC.8060308@linux.vnet.ibm.com>

Michael S. Tsirkin wrote:
>> Pushing the changes into the driver is a potential maintenance
>> nightmare. How does one keep changes across layers in sync?
> 
> We have different definitions of "across layers": in my code everything is kept
> inside ehca. I call it a maintenance nightmare when there's code in IPoIB that
> only ehca owners can test.
> 
>> That was the reason I strived to use common code in the NOSRQ case; at
>> least  as much as possible and all of it in IPoIB.
> 
> And you ended up with a bigger patch.
> 
>> In the emulation approach by apportioning off WRs across QPs, we will be
>> sacrificing performance by dropping packets or returning an RNR on a
>> really busy QP. As I see it, the alternative is to allocate a really big
>> SRQ, even when there are very few QPs and wasting a lot of the unused WRs.
> 
> As I said, there are obvious performance optimisations to implement.
> We can later add code in IPoIB that, for very large SRQ size,
> will post WRs on demand. But at least that will be common code
> that everyone can test.
> 

Michael,

That is exactly the point. I made some decisions that you may not
agree with entirely. Each solution has its benefits and
drawbacks. I feel that for a "performance related patch", performance should
be one of the most important attributes. Some of the other issues are
secondary.

Here is a patch that is working and tested on multiple HCAs. If you
feel it needs to be embellished in certain ways, sure go ahead and
incorporate changes on top of my patch. After all this is open source
development.

At the same time I would have reservations about a patch that takes a
performance hit even though it may have other desirable attributes.

I have already incorporated several of your valuable suggestions into
this patch, even though I did not agree with all of them. I see no need
for us to take opposite sides on every issue, but rather we should work
more constructively.

This issue has dragged on for weeks without much forward progress. We
need to make some decisions and close out this issue at the earliest.

Pradeep


From robert.j.woodruff at intel.com  Fri Jun 15 09:43:51 2007
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Fri, 15 Jun 2007 09:43:51 -0700
Subject: [ofa-general] crash in ipoib
In-Reply-To: <4672BE23.3050809@ichips.intel.com>
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691C0285B8C1@orsmsx418.amr.corp.intel.com>

Sean wrote,
>> And here's a version with error handling fixed.
>> Sean, does this solve your crash?

>We've been running this patch since yesterday and haven't seen any 
>crashes.  We'll continue testing this over the week-end.

>- Sean

This looks like it fixed the panic. 

Should we try to put out a new RC with this latest ipoib fix?
I really think we need it in the release. If we could get another RC out
today, that would only delay the release by a couple more days: we could
release next Friday rather than Wednesday and still give people a week to
test the final RC.

woody


From halr at voltaire.com  Fri Jun 15 09:54:55 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 15 Jun 2007 12:54:55 -0400
Subject: [ofa-general] OFED SRP Frame/MTU tunning
In-Reply-To: <200706151433.l5FEXT0X032144@cmf.nrl.navy.mil>
References: <200706151433.l5FEXT0X032144@cmf.nrl.navy.mil>
Message-ID: <1181926495.5681.365291.camel@hal.voltaire.com>

On Fri, 2007-06-15 at 10:33, chas williams - CONTRACTOR wrote:
> In message <46729984.3070101 at asc.hpc.mil>,MAHMOUD HANAFI writes:
> >I would like to configure SRP to use a frame size of 1K; it defaults to 2K.
> >Is this an option that can be set/configured?
> 
> apply this patch.  i should have made this a per login item though.

If you are running OpenSM, you don't need this if you set enable_quirks
in opensm.opts.

-- Hal

> 
> --- a/drivers/infiniband/ulp/srp/ib_srp.c.orig	2006-12-21 14:15:33.728164124 -0500
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c	2006-12-21 15:26:44.234250010 -0500
> @@ -83,6 +83,10 @@
>  MODULE_PARM_DESC(mellanox_workarounds,
>  		 "Enable workarounds for Mellanox SRP target bugs if != 0");
>  
> +static int tavor_quirk = 0;
> +module_param_named(tavor_quirk, tavor_quirk, int, 0644);
> +MODULE_PARM_DESC(tavor_quirk, "Tavor performance quirk: limit MTU to 1K if > 0");
> +
>  static const u8 mellanox_oui[3] = { 0x00, 0x02, 0xc9 };
>  
>  static void srp_add_one(struct ib_device *device);
> @@ -256,8 +260,14 @@
>  	target->status = status;
>  	if (status)
>  		printk(KERN_ERR PFX "Got failed path rec status %d\n", status);
> -	else
> +	else {
>  		target->path = *pathrec;
> +		if (tavor_quirk) {
> +			if (target->path.mtu > IB_MTU_1024)
> +				target->path.mtu = IB_MTU_1024;
> +		}
> +	}
> +	
>  	complete(&target->done);
>  }
>  


From sean.hefty at intel.com  Fri Jun 15 09:59:04 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 15 Jun 2007 09:59:04 -0700
Subject: [ofa-general] [PATCH 1/2] libibumad: fix partition support
In-Reply-To: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com>
Message-ID: <000801c7af6e$7ae0ba80$ff0da8c0@amr.corp.intel.com>

Allow sending MADs on different partitions.  This requires kernel support,
so requires an ABI bump.  This patch maintains support for the previous
ABI.

Clarify that umad_set_pkey() takes a pkey index, and not the pkey itself.
(Unfortunately, the call is used both ways in the management tree.)

Signed-off-by: Sean Hefty <sean.hefty at intel.com>
---
Additional changes are needed to retrieve the PKey and GID tables, so that
the PKeys and GIDs can be converted to the correct index.  These will come
in future patches.
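
The backward-compatibility idea in this patch can be sketched on its own
as follows. This is a simplified illustration, not the library code: the
struct names, abi_supported() and addr_v5_to_v6() are hypothetical, and
only the address part of the user MAD is modeled. It relies on the ABI 5
address layout being a strict prefix of the ABI 6 layout, as in the
structures the patch defines.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define UMAD_MIN_ABI 5	/* oldest kernel ABI the sketch accepts */
#define UMAD_MAX_ABI 6	/* newest kernel ABI the sketch accepts */

/* Address layout used with ABI 5 kernels. */
struct mad_addr_v5 {
	uint32_t qpn;
	uint32_t qkey;
	uint16_t lid;
	uint8_t  sl;
	uint8_t  path_bits;
	uint8_t  grh_present;
	uint8_t  gid_index;
	uint8_t  hop_limit;
	uint8_t  traffic_class;
	uint8_t  gid[16];
	uint32_t flow_label;
};

/* ABI 6 layout: identical prefix, then the two new fields. */
struct mad_addr_v6 {
	uint32_t qpn;
	uint32_t qkey;
	uint16_t lid;
	uint8_t  sl;
	uint8_t  path_bits;
	uint8_t  grh_present;
	uint8_t  gid_index;
	uint8_t  hop_limit;
	uint8_t  traffic_class;
	uint8_t  gid[16];
	uint32_t flow_label;
	uint16_t pkey_index;
	uint8_t  reserved[6];
};

/* Accept any kernel ABI inside the supported range. */
static int abi_supported(unsigned int abi_version)
{
	return abi_version >= UMAD_MIN_ABI && abi_version <= UMAD_MAX_ABI;
}

/* Widen an ABI 5 address: copy the shared prefix, default pkey_index to 0. */
static void addr_v5_to_v6(struct mad_addr_v6 *dst,
			  const struct mad_addr_v5 *src)
{
	assert(sizeof *src <= sizeof *dst);
	memset(dst, 0, sizeof *dst);
	memcpy(dst, src, sizeof *src);
}
```

A caller that read a MAD from an ABI 5 kernel would convert the address
with addr_v5_to_v6() and always see pkey_index 0, which matches the
previous behaviour of always using the first PKey table entry.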


 doc/libibumad.txt                   |    2 
 libibumad/include/infiniband/umad.h |    7 +
 libibumad/src/umad.c                |  192 +++++++++++++++++++++++++++--------
 3 files changed, 156 insertions(+), 45 deletions(-)

diff --git a/doc/libibumad.txt b/doc/libibumad.txt
index 7b2b4f4..4e37e60 100644
--- a/doc/libibumad.txt
+++ b/doc/libibumad.txt
@@ -336,7 +336,7 @@ the given host ordered fields. Return 0 on success, -1 on errors.
 umad_set_pkey:
 
 Synopsis:
-	int	umad_set_pkey(void *umad, int pkey);
+	int	umad_set_pkey(void *umad, int pkey_index);
 
 Description: Set the pkey within the 'umad' buffer.  Return 0 on success,
 -1 on errors.
diff --git a/libibumad/include/infiniband/umad.h b/libibumad/include/infiniband/umad.h
old mode 100644
new mode 100755
index 9020649..9369d95
--- a/libibumad/include/infiniband/umad.h
+++ b/libibumad/include/infiniband/umad.h
@@ -60,6 +60,8 @@ typedef struct ib_mad_addr {
 	uint8_t	 traffic_class;
 	uint8_t	 gid[16];
 	uint32_t flow_label;
+	uint16_t pkey_index;
+	uint8_t  reserved[6];
 } ib_mad_addr_t;
 
 typedef struct ib_user_mad {
@@ -72,7 +74,8 @@ typedef struct ib_user_mad {
 	uint8_t  data[0];
 } ib_user_mad_t;
 
-#define IB_UMAD_ABI_VERSION	5
+#define IB_UMAD_MIN_ABI_VERSION	5
+#define IB_UMAD_MAX_ABI_VERSION	6
 #define IB_UMAD_ABI_DIR		"/sys/class/infiniband_mad"
 #define IB_UMAD_ABI_FILE	"abi_version"
 
@@ -167,7 +170,7 @@ int	umad_set_grh_net(void *umad, void *mad_addr);
 int	umad_set_grh(void *umad, void *mad_addr);
 int	umad_set_addr_net(void *umad, int dlid, int dqp, int sl, int qkey);
 int	umad_set_addr(void *umad, int dlid, int dqp, int sl, int qkey);
-int	umad_set_pkey(void *umad, int pkey);
+int	umad_set_pkey(void *umad, int pkey_index);
 
 int	umad_send(int portid, int agentid, void *umad, int length,
 		  int timeout_ms, int retries);
diff --git a/libibumad/src/umad.c b/libibumad/src/umad.c
old mode 100644
new mode 100755
index 5f9b36b..c750fe0
--- a/libibumad/src/umad.c
+++ b/libibumad/src/umad.c
@@ -69,6 +69,7 @@ int umaddebug = 0;
 #define UMAD_DEV_NAME_SZ	32
 #define UMAD_DEV_FILE_SZ	256
 
+static uint abi_version;
 static char *def_ca_name = "mthca0";
 static int def_ca_port = 1;
 
@@ -82,6 +83,31 @@ typedef struct Port {
 
 static Port ports[UMAD_MAX_PORTS];
 
+typedef struct ib_mad_addr_abi_5 {
+	uint32_t qpn;
+	uint32_t qkey;
+	uint16_t lid;
+	uint8_t	 sl;
+	uint8_t	 path_bits;
+	uint8_t	 grh_present;
+	uint8_t	 gid_index;
+	uint8_t	 hop_limit;
+	uint8_t	 traffic_class;
+	uint8_t	 gid[16];
+	uint32_t flow_label;
+} ib_mad_addr_abi_5_t;
+
+typedef struct ib_user_mad_abi_5 {
+	uint32_t agent_id;
+	uint32_t status;
+	uint32_t timeout_ms;
+	uint32_t retries;
+	uint32_t length;
+	ib_mad_addr_abi_5_t addr;
+	uint8_t  data[0];
+} ib_user_mad_abi_5_t;
+
+
 /*************************************
  * Port
  */
@@ -463,6 +489,101 @@ dev_to_umad_id(char *dev, uint port)
 	return -1;	/* not found */
 }
 
+static int
+write_data(int fd, void *data, int size)
+{
+	int n;
+
+	n = write(fd, data, size);
+	if (n != size) {
+		DEBUG("write returned %d != sizeof mad data %d (%m)", n, size);
+		if (!errno)
+			errno = EIO;
+		return -EIO;
+	}
+
+	return 0;
+}
+
+static int
+write_abi_5(int fd, struct ib_user_mad *mad, int length)
+{
+	struct ib_user_mad_abi_5 *umad_5;
+	int n;
+
+	n = sizeof *umad_5 + length;
+	umad_5 = malloc(n);
+	if (!umad_5) {
+		errno = ENOMEM;
+		return -ENOMEM;
+	}
+
+	memcpy(umad_5, mad, sizeof *umad_5);
+	memcpy(umad_5->data, mad->data, length);
+
+	n = write_data(fd, umad_5, n);
+	free(umad_5);
+	return n;
+}
+
+static int
+read_data(int fd, void *data, int size, int *length)
+{
+	struct ib_user_mad *mad = data;
+	int n, umad_size;
+
+	umad_size = size - *length;
+
+	n = read(fd, data, size);
+	if ((n >= 0) && (n <= size)) {
+		DEBUG("mad received by agent %d length %d", mad->agent_id, n);
+		if (n > umad_size)
+			*length = n - umad_size;
+		else
+			*length = 0;
+		return mad->agent_id;
+	}
+
+	if (n == -EWOULDBLOCK) {
+		if (!errno)
+			errno = EWOULDBLOCK;
+		return n;
+	}
+
+	DEBUG("read returned %zu > sizeof mad %zu (%m)",
+	      mad->length - umad_size, *length);
+
+	*length = mad->length - umad_size;
+	if (!errno)
+		errno = EIO;
+	return -errno;
+}
+
+static int
+read_abi_5(int fd, void *umad, int *length)
+{
+	struct ib_user_mad *mad = umad;
+	struct ib_user_mad_abi_5 *umad_5;
+	int n;
+
+	n = sizeof *umad_5 + *length;
+	umad_5 = malloc(n);
+	if (!umad_5) {
+		errno = ENOMEM;
+		return -ENOMEM;
+	}
+
+	n = read_data(fd, umad_5, n, length);
+	if (n >= 0) {
+		memcpy(mad, umad_5, sizeof *umad_5);
+		mad->addr.pkey_index = 0;
+		memcpy(mad->data, umad_5->data, *length);
+	}
+
+	free(umad_5);
+	return n;
+}
+
 /*******************************
  * Public interface
  */
@@ -470,17 +591,19 @@ dev_to_umad_id(char *dev, uint port)
 int
 umad_init(void)
 {
-	uint abi_version;
-
 	TRACE("umad_init");
 	if (sys_read_uint(IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE, &abi_version) < 0) {
 		IBWARN("can't read ABI version from %s/%s (%m): is ib_umad module loaded?",
 			IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE);
 		return -1;
 	}
-	if (abi_version != IB_UMAD_ABI_VERSION) {
-		IBWARN("wrong ABI version: %s/%s is %d but library ABI is %d",
-			IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE, abi_version, IB_UMAD_ABI_VERSION);
+
+	if (abi_version < IB_UMAD_MIN_ABI_VERSION ||
+	    abi_version > IB_UMAD_MAX_ABI_VERSION) {
+		IBWARN("wrong ABI version: %s/%s is %d but library ABI "
+			"supports %d through %d",
+			IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE, abi_version,
+			IB_UMAD_MIN_ABI_VERSION, IB_UMAD_MAX_ABI_VERSION);
 		return -1;
 	}
 	return 0;
@@ -699,11 +822,16 @@ umad_set_grh(void *umad, void *mad_addr)
 }
 
 int
-umad_set_pkey(void *umad, int pkey)
+umad_set_pkey(void *umad, int pkey_index)
 {
-#if 0
-	mad->addr.pkey = 0;		/* FIXME - PKEY support */
-#endif
+	struct ib_user_mad *mad = umad;
+
+	if (abi_version == 5 && pkey_index != 0) {
+		IBWARN("umad_set_pkey: ABI 5 only supports pkey_index 0\n");
+		return -EINVAL;
+	}
+
+	mad->addr.pkey_index = pkey_index;
 	return 0;
 }
 
@@ -761,15 +889,12 @@ umad_send(int portid, int agentid, void *umad, int length,
 	if (umaddebug > 1)
 		umad_dump(mad);
 
-	n = write(port->dev_fd, mad, length + sizeof *mad);
-	if (n == length + sizeof *mad)
-		return 0;
+	if (abi_version == 5)
+		n = write_abi_5(port->dev_fd, mad, length);
+	else
+		n = write_data(port->dev_fd, mad, sizeof *mad + length);
 
-	DEBUG("write returned %d != sizeof umad %zu + length %d (%m)",
-	      n, sizeof *mad, length);
-	if (!errno)
-		errno = EIO;
-	return -EIO;
+	return n;
 }
 
 static int
@@ -793,7 +918,6 @@ dev_poll(int fd, int timeout_ms)
 int
 umad_recv(int portid, void *umad, int *length, int timeout_ms)
 {
-	struct ib_user_mad *mad = umad;
 	Port *port;
 	int n;
 
@@ -817,29 +941,13 @@ umad_recv(int portid, void *umad, int *length, int timeout_ms)
 		return n;
 	}
 
-	n = read(port->dev_fd, umad, sizeof *mad + *length);
-	if ((n >= 0) && (n <= sizeof *mad + *length)) {
-		DEBUG("mad received by agent %d length %d", mad->agent_id, n);
-		if (n > sizeof *mad)
-			*length = n - sizeof *mad;
-		else
-			*length = 0;
-		return mad->agent_id;
-	}
-
-	if (n == -EWOULDBLOCK) {
-		if (!errno)
-			errno = EWOULDBLOCK;
-		return n;
-	}
-
-	DEBUG("read returned %zu > sizeof umad %zu + length %d (%m)",
-	      mad->length - sizeof *mad, sizeof *mad, *length);
+	if (abi_version == 5)
+		n = read_abi_5(port->dev_fd, umad, length);
+	else
+		n = read_data(port->dev_fd, umad,
+			      sizeof(struct ib_user_mad) + *length, length);
 
-	*length = mad->length - sizeof *mad;
-	if (!errno)
-		errno = EIO;
-	return -errno;
+	return n;
 }
 
 int
@@ -996,10 +1104,10 @@ umad_addr_dump(ib_mad_addr_t *addr)
 	gid_str[i*2] = 0;
 	IBWARN("qpn %d qkey 0x%x lid 0x%x sl %d\n"
 		"grh_present %d gid_index %d hop_limit %d traffic_class %d flow_label 0x%x\n"
-		"Gid 0x%s",
+		"Gid 0x%s pkey_index %d",
 		ntohl(addr->qpn), ntohl(addr->qkey), ntohs(addr->lid), addr->sl,
 		addr->grh_present, (int)addr->gid_index, (int)addr->hop_limit,
-		(int)addr->traffic_class, addr->flow_label, gid_str);
+		(int)addr->traffic_class, addr->flow_label, gid_str, addr->pkey_index);
 }
 
 void


From sean.hefty at intel.com  Fri Jun 15 10:01:05 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 15 Jun 2007 10:01:05 -0700
Subject: [ofa-general] [PATCH 2/2] opensm: use pkey index,
	rather than pkey with libibumad
In-Reply-To: <000801c7af6e$7ae0ba80$ff0da8c0@amr.corp.intel.com>
Message-ID: <000901c7af6e$c2d79480$ff0da8c0@amr.corp.intel.com>

The call to umad_set_pkey expects an index, not a pkey.  Use index 0
for now.

Signed-off-by: Sean Hefty <sean.hefty at intel.com>
---
This was the one place I found where the pkey was being passed into
umad_set_pkey(). 

 opensm/libvendor/osm_vendor_ibumad.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/opensm/libvendor/osm_vendor_ibumad.c b/opensm/libvendor/osm_vendor_ibumad.c
index ee94203..a10388c 100644
--- a/opensm/libvendor/osm_vendor_ibumad.c
+++ b/opensm/libvendor/osm_vendor_ibumad.c
@@ -1086,7 +1086,8 @@ osm_vendor_send(
 			  p_mad_addr->addr_type.gsi.service_level,
 			  IB_QP1_WELL_KNOWN_Q_KEY);
 	umad_set_grh(p_vw->umad, 0);	/* FIXME: GRH support */
-	umad_set_pkey(p_vw->umad, p_mad_addr->addr_type.gsi.pkey);
+	umad_set_pkey(p_vw->umad, 0);
+		/* FIXME: p_mad_addr->addr_type.gsi.pkey to index */
 	if (ib_class_is_rmpp(p_mad->mgmt_class)) {	/* RMPP GSI classes	FIXME: no GRH */
 		if (!ib_rmpp_is_flag_set((ib_rmpp_mad_t *)p_sa,
 					 IB_RMPP_FLAG_ACTIVE)) {


From halr at voltaire.com  Fri Jun 15 10:29:27 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 15 Jun 2007 13:29:27 -0400
Subject: [ofa-general] Re: [PATCH] osm: bugfix - if fat-tree failed,
	osm should fall back to default routing
In-Reply-To: <4670FB71.5090406@dev.mellanox.co.il>
References: <4670FB71.5090406@dev.mellanox.co.il>
Message-ID: <1181928567.5681.367677.camel@hal.voltaire.com>

Hi Yevgeny,

On Thu, 2007-06-14 at 04:25, Yevgeny Kliteynik wrote:
> Hi Hal,
> 
> When fat-tree fails to populate all the data structures,
> it should return error and let osm fall back to default routing.
> 
> Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

Thanks. Applied.

-- Hal


From mshefty at ichips.intel.com  Fri Jun 15 11:24:19 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 15 Jun 2007 11:24:19 -0700
Subject: [ofa-general] Re: [PATCH draft,	untested] ehca srq emulation
	(for IPoIB CM)
In-Reply-To: <20070615160709.GK2207@mellanox.co.il>
References: <466F36C8.5010507@linux.vnet.ibm.com>	<20070613163821.GB12277@mellanox.co.il>
	<adafy4v69ig.fsf@cisco.com>	<20070613174930.GE12277@mellanox.co.il>	<46716F3D.7050206@ichips.intel.com>
	<ada1wge4h4l.fsf@cisco.com>	<20070614175030.GB29561@mellanox.co.il>	<4671C541.4040503@linux.vnet.ibm.com>	<20070615051846.GG2207@mellanox.co.il>	<4672B523.50502@ichips.intel.com>
	<20070615160709.GK2207@mellanox.co.il>
Message-ID: <4672D953.3050506@ichips.intel.com>

> Basically, I think that because of lack of SW level flow control,
> generally IPoIB CM without SRQ does not make sense because of
> the scalability problems.

Most clusters are only 16-32 nodes.  If IPoIB CM without SRQ can support 
this number of systems and outperforms IPoIB UD mode, then I do believe 
that it makes sense.  IPoIB CM support, with or without SRQ, is less 
scalable than IPoIB UD mode, but it was still added because it provided 
a benefit under most conditions.

- Sean


From friedman at ucla.edu  Fri Jun 15 12:01:04 2007
From: friedman at ucla.edu (Scott A. Friedman)
Date: Fri, 15 Jun 2007 12:01:04 -0700
Subject: [ofa-general] Re: iWarp cxgb3 firmware
In-Reply-To: <20070615162126.5D7D4E60886@openfabrics.org>
References: <20070615162126.5D7D4E60886@openfabrics.org>
Message-ID: <4672E1F0.9010406@ucla.edu>

> Scott A. Friedman wrote:
>> > Hi
>> > 
>> > Is anyone using the cxgb3 module in rc4 or rc5? If so, where are you 
>> > getting the correct firmware that it seems to want (4.2)? Chelsio is 
>> > only distributing v4.1 on their web site. I would like to know since my 
>> > iWarp nodes are currently stuck at rc3, whose cxgb3 needs version 4.0
>> > 
>> > Do these firmware versions make significant changes?
>> > 
> 
> Unfortunately, yes, they do.  -rc4 and beyond require firmware version
> 4.2 to fix some streaming mode->rdma mode connection transition bugs.
> The interface between the driver and firmware also changed, which is why
> the requirement is there.  -rc4 and beyond _will not_ work with anything
> less than 4.2.  I pushed the changes into -rc4 to get them in before
> ofed-1.2 ships as this was a critical bug.
> 
> The 4.2 firmware will be available this week from Chelsio.  Contact your 
> chelsio rep to get it.  Perhaps you can get a pre-release version today...
> 

Thanks Steve, this explains a lot - all of my troubles have been
connection related. The connection stage would either work, or not, or
hang. I will contact them and try again to get the firmware - or wait...

Scott


From panda at cse.ohio-state.edu  Fri Jun 15 12:01:54 2007
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Fri, 15 Jun 2007 15:01:54 -0400 (EDT)
Subject: [ofa-general] Announcing the availability of MVAPICH support for
	QLogic InfiniPath adapters
Message-ID: <200706151901.l5FJ1sMk008862@xi.cse.ohio-state.edu>

The MVAPICH team is pleased to announce the availability of MVAPICH
native support for QLogic InfiniPath adapters.

Sample performance numbers include:

  - Opteron single-core with HT and InfiniPath-SDR:
        - 1.26 microsec one-way latency (4 bytes)
        - 953 MB/sec unidirectional bandwidth
        - 1889 MB/sec bidirectional bandwidth
 
  - EM64T quad-core with PCIe and InfiniPath-SDR:
        - 1.91 microsec one-way latency (4 bytes)
        - 957 MB/sec unidirectional bandwidth 
        - 1565 MB/sec bidirectional bandwidth  

More detailed performance numbers can be viewed by visiting the
`Performance' section of the project's web page.

For downloading this new support and accessing the anonymous SVN,
please visit the following URL:

http://mvapich.cse.ohio-state.edu/

Please post your feedback to the mvapich-discuss mailing list.

Thanks, 

MVAPICH Team 

======================================================================
MVAPICH/MVAPICH2 project is currently supported with funding from
U.S. National Science Foundation, U.S. DOE Office of Science,
Mellanox, Intel, Cisco Systems, QLogic, Sun Microsystems and Linux
Networx; and with equipment support from Advanced Clustering, AMD,
Apple, Appro, Chelsio, Dell, Fujitsu, Fulcrum, IBM, Intel, Mellanox,
Microway, NetEffect, QLogic and Sun Microsystems. Other technology
partners include Etnus.
======================================================================


From halr at voltaire.com  Fri Jun 15 13:01:37 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 15 Jun 2007 16:01:37 -0400
Subject: [ofa-general] Re: [PATCH 1/2] libibumad: fix partition support
In-Reply-To: <000801c7af6e$7ae0ba80$ff0da8c0@amr.corp.intel.com>
References: <000801c7af6e$7ae0ba80$ff0da8c0@amr.corp.intel.com>
Message-ID: <1181937695.5681.377979.camel@hal.voltaire.com>

On Fri, 2007-06-15 at 12:59, Sean Hefty wrote:
> Allow sending MADs on different partitions.  This requires kernel support,
> so requires an ABI bump.  This patch maintains support for the previous
> ABI.

Looks good. A few minor questions/comments embedded below.

> Clarify that umad_set_pkey() takes a pkey index, and not the pkey itself.
> (Unfortunately, the call is used both ways in the management tree.)

Sigh... and opensm (actually libvendor) is the one which uses this
incorrectly. I'm worried about existing OpenSM compatibility with the
new libibumad when ABI 6 is in effect. I think the long standing ABI 5
should be fine, right ?

> Signed-off-by: Sean Hefty <sean.hefty at intel.com>
> ---
> Additional changes are needed to retrieve the PKey and GID tables, so that
> the PKeys and GIDs can be converted to the correct index.  These will come
> in future patches.
> 
> 
>  doc/libibumad.txt                   |    2 
>  libibumad/include/infiniband/umad.h |    7 +
>  libibumad/src/umad.c                |  192 +++++++++++++++++++++++++++--------
>  3 files changed, 156 insertions(+), 45 deletions(-)
> 
> diff --git a/doc/libibumad.txt b/doc/libibumad.txt
> index 7b2b4f4..4e37e60 100644
> --- a/doc/libibumad.txt
> +++ b/doc/libibumad.txt
> @@ -336,7 +336,7 @@ the given host ordered fields. Return 0 on success, -1 on errors.
>  umad_set_pkey:
>  
>  Synopsis:
> -	int	umad_set_pkey(void *umad, int pkey);
> +	int	umad_set_pkey(void *umad, int pkey_index);
>  
>  Description: Set the pkey within the 'umad' buffer.  Return 0 on success,
>  -1 on errors.
> diff --git a/libibumad/include/infiniband/umad.h b/libibumad/include/infiniband/umad.h
> old mode 100644
> new mode 100755
> index 9020649..9369d95
> --- a/libibumad/include/infiniband/umad.h
> +++ b/libibumad/include/infiniband/umad.h
> @@ -60,6 +60,8 @@ typedef struct ib_mad_addr {
>  	uint8_t	 traffic_class;
>  	uint8_t	 gid[16];
>  	uint32_t flow_label;
> +	uint16_t pkey_index;
> +	uint8_t  reserved[6];
>  } ib_mad_addr_t;
>  
>  typedef struct ib_user_mad {
> @@ -72,7 +74,8 @@ typedef struct ib_user_mad {
>  	uint8_t  data[0];
>  } ib_user_mad_t;
>  
> -#define IB_UMAD_ABI_VERSION	5
> +#define IB_UMAD_MIN_ABI_VERSION	5
> +#define IB_UMAD_MAX_ABI_VERSION	6
>  #define IB_UMAD_ABI_DIR		"/sys/class/infiniband_mad"
>  #define IB_UMAD_ABI_FILE	"abi_version"
>  
> @@ -167,7 +170,7 @@ int	umad_set_grh_net(void *umad, void *mad_addr);
>  int	umad_set_grh(void *umad, void *mad_addr);
>  int	umad_set_addr_net(void *umad, int dlid, int dqp, int sl, int qkey);
>  int	umad_set_addr(void *umad, int dlid, int dqp, int sl, int qkey);
> -int	umad_set_pkey(void *umad, int pkey);
> +int	umad_set_pkey(void *umad, int pkey_index);
>  
>  int	umad_send(int portid, int agentid, void *umad, int length,
>  		  int timeout_ms, int retries);
> diff --git a/libibumad/src/umad.c b/libibumad/src/umad.c
> old mode 100644
> new mode 100755

Why the mode change ?

> index 5f9b36b..c750fe0
> --- a/libibumad/src/umad.c
> +++ b/libibumad/src/umad.c
> @@ -69,6 +69,7 @@ int umaddebug = 0;
>  #define UMAD_DEV_NAME_SZ	32
>  #define UMAD_DEV_FILE_SZ	256
>  
> +static uint abi_version;
>  static char *def_ca_name = "mthca0";
>  static int def_ca_port = 1;
>  
> @@ -82,6 +83,31 @@ typedef struct Port {
>  
>  static Port ports[UMAD_MAX_PORTS];
>  
> +typedef struct ib_mad_addr_abi_5 {
> +	uint32_t qpn;
> +	uint32_t qkey;
> +	uint16_t lid;
> +	uint8_t	 sl;
> +	uint8_t	 path_bits;
> +	uint8_t	 grh_present;
> +	uint8_t	 gid_index;
> +	uint8_t	 hop_limit;
> +	uint8_t	 traffic_class;
> +	uint8_t	 gid[16];
> +	uint32_t flow_label;
> +} ib_mad_addr_abi_5_t;
> +
> +typedef struct ib_user_mad_abi_5 {
> +	uint32_t agent_id;
> +	uint32_t status;
> +	uint32_t timeout_ms;
> +	uint32_t retries;
> +	uint32_t length;
> +	ib_mad_addr_abi_5_t addr;
> +	uint8_t  data[0];
> +} ib_user_mad_abi_5_t;
> +
> +
>  /*************************************
>   * Port
>   */
> @@ -463,6 +489,101 @@ dev_to_umad_id(char *dev, uint port)
>  	return -1;	/* not found */
>  }
>  
> +static int
> +write_data(int fd, void *data, int size)
> +{
> +	int n;
> +
> +	n = write(fd, data, size);
> +	if (n != size) {
> +		DEBUG("write returned %d != sizeof mad data %d (%m)", n, size);

Is this really the sizeof the mad data ?

> +		if (!errno)
> +			errno = EIO;
> +		return -EIO;
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +write_abi_5(int fd, struct ib_user_mad *mad, int length)
> +{
> +	struct ib_user_mad_abi_5 *umad_5;
> +	int n;
> +
> +	n = sizeof *umad_5 + length;
> +	umad_5 = malloc(n);
> +	if (!umad_5) {
> +		errno = ENOMEM;
> +		return -ENOMEM;
> +	}
> +
> +	memcpy(umad_5, mad, sizeof *umad_5);
> +	memcpy(umad_5->data, mad->data, length);
> +
> +	n = write_data(fd, umad_5, n);
> +	free(umad_5);
> +	return n;
> +}
> +
> +static int
> +read_data(int fd, void *data, int size, int *length)
> +{
> +	struct ib_user_mad *mad = data;
> +	int n, umad_size;
> +
> +	umad_size = size - *length;
> +
> +	n = read(fd, data, size);
> +	if ((n >= 0) && (n <= size)) {
> +		DEBUG("mad received by agent %d length %d", mad->agent_id, n);
> +		if (n > umad_size)
> +			*length = n - umad_size;
> +		else
> +			*length = 0;
> +		return mad->agent_id;
> +	}
> +
> +	if (n == -EWOULDBLOCK) {
> +		if (!errno)
> +			errno = EWOULDBLOCK;
> +		return n;
> +	}
> +
> +	DEBUG("read returned %zu > sizeof mad %zu (%m)",
> +	      mad->length - umad_size, *length);
> +
> +	*length = mad->length - umad_size;
> +	if (!errno)
> +		errno = EIO;
> +	return -errno;
> +}
> +
> +static int
> +read_abi_5(int fd, void *umad, int *length)
> +{
> +	struct ib_user_mad *mad = umad;
> +	struct ib_user_mad_abi_5 *umad_5;
> +	int n;
> +
> +	n = sizeof *umad_5 + *length;
> +	umad_5 = malloc(n);
> +	if (!umad_5) {
> +		errno = ENOMEM;
> +		return -ENOMEM;
> +	}
> +
> +	n = read_data(fd, umad_5, n, length);
> +	if (n >= 0) {
> +		memcpy(mad, umad_5, sizeof *umad_5);
> +		mad->addr.pkey_index = 0;
> +		memcpy(mad->data, umad_5->data, *length);
> +	}
> +
> +	free(umad_5);
> +	return n;
> +}
> +
>  /*******************************
>   * Public interface
>   */
> @@ -470,17 +591,19 @@ dev_to_umad_id(char *dev, uint port)
>  int
>  umad_init(void)
>  {
> -	uint abi_version;
> -
>  	TRACE("umad_init");
>  	if (sys_read_uint(IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE, &abi_version) < 0) {
>  		IBWARN("can't read ABI version from %s/%s (%m): is ib_umad module loaded?",
>  			IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE);
>  		return -1;
>  	}
> -	if (abi_version != IB_UMAD_ABI_VERSION) {
> -		IBWARN("wrong ABI version: %s/%s is %d but library ABI is %d",
> -			IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE, abi_version, IB_UMAD_ABI_VERSION);
> +
> +	if (abi_version < IB_UMAD_MIN_ABI_VERSION ||
> +	    abi_version > IB_UMAD_MAX_ABI_VERSION) {
> +		IBWARN("wrong ABI version: %s/%s is %d but library ABI "
> +			"supports %d through %d",
> +			IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE, abi_version,
> +			IB_UMAD_MIN_ABI_VERSION, IB_UMAD_MAX_ABI_VERSION);
>  		return -1;
>  	}
>  	return 0;
> @@ -699,11 +822,16 @@ umad_set_grh(void *umad, void *mad_addr)
>  }
>  
>  int
> -umad_set_pkey(void *umad, int pkey)
> +umad_set_pkey(void *umad, int pkey_index)
>  {
> -#if 0
> -	mad->addr.pkey = 0;		/* FIXME - PKEY support */
> -#endif
> +	struct ib_user_mad *mad = umad;
> +
> +	if (abi_version == 5 && pkey_index != 0) {
> +		IBWARN("umad_set_pkey: ABI 5 only supports pkey_index 0\n");
> +		return -EINVAL;
> +	}
> +
> +	mad->addr.pkey_index = pkey_index;
>  	return 0;
>  }
>  
> @@ -761,15 +889,12 @@ umad_send(int portid, int agentid, void *umad, int length,
>  	if (umaddebug > 1)
>  		umad_dump(mad);
>  
> -	n = write(port->dev_fd, mad, length + sizeof *mad);
> -	if (n == length + sizeof *mad)
> -		return 0;
> +	if (abi_version == 5)
> +		n = write_abi_5(port->dev_fd, mad, length);
> +	else
> +		n = write_data(port->dev_fd, mad, sizeof *mad + length);
>  
> -	DEBUG("write returned %d != sizeof umad %zu + length %d (%m)",
> -	      n, sizeof *mad, length);
> -	if (!errno)
> -		errno = EIO;
> -	return -EIO;
> +	return n;
>  }
>  
>  static int
> @@ -793,7 +918,6 @@ dev_poll(int fd, int timeout_ms)
>  int
>  umad_recv(int portid, void *umad, int *length, int timeout_ms)
>  {
> -	struct ib_user_mad *mad = umad;
>  	Port *port;
>  	int n;
>  
> @@ -817,29 +941,13 @@ umad_recv(int portid, void *umad, int *length, int timeout_ms)
>  		return n;
>  	}
>  
> -	n = read(port->dev_fd, umad, sizeof *mad + *length);
> -	if ((n >= 0) && (n <= sizeof *mad + *length)) {
> -		DEBUG("mad received by agent %d length %d", mad->agent_id, n);
> -		if (n > sizeof *mad)
> -			*length = n - sizeof *mad;
> -		else
> -			*length = 0;
> -		return mad->agent_id;
> -	}
> -
> -	if (n == -EWOULDBLOCK) {
> -		if (!errno)
> -			errno = EWOULDBLOCK;
> -		return n;
> -	}
> -
> -	DEBUG("read returned %zu > sizeof umad %zu + length %d (%m)",
> -	      mad->length - sizeof *mad, sizeof *mad, *length);
> +	if (abi_version == 5)
> +		n = read_abi_5(port->dev_fd, umad, length);
> +	else
> +		n = read_data(port->dev_fd, umad,
> +			      sizeof(struct ib_user_mad) + *length, length);
>  
> -	*length = mad->length - sizeof *mad;
> -	if (!errno)
> -		errno = EIO;
> -	return -errno;
> +	return n;
>  }
>  
>  int
> @@ -996,10 +1104,10 @@ umad_addr_dump(ib_mad_addr_t *addr)
>  	gid_str[i*2] = 0;
>  	IBWARN("qpn %d qkey 0x%x lid 0x%x sl %d\n"
>  		"grh_present %d gid_index %d hop_limit %d traffic_class %d flow_label 0x%x\n"
> -		"Gid 0x%s",
> +		"Gid 0x%s pkey_index %d",
>  		ntohl(addr->qpn), ntohl(addr->qkey), ntohs(addr->lid), addr->sl,
>  		addr->grh_present, (int)addr->gid_index, (int)addr->hop_limit,
> -		(int)addr->traffic_class, addr->flow_label, gid_str);
> +		(int)addr->traffic_class, addr->flow_label, gid_str, addr->pkey_index);
>  }
>  
>  void
> 


From halr at voltaire.com  Fri Jun 15 13:39:20 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 15 Jun 2007 16:39:20 -0400
Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node
	guid files options for fat-tree
In-Reply-To: <20070614134519.GD5908@sashak.voltaire.com>
References: <4670FA2D.7070708@dev.mellanox.co.il>
	<20070614121501.GC5908@sashak.voltaire.com>
	<4671363F.6060600@dev.mellanox.co.il>
	<20070614134519.GD5908@sashak.voltaire.com>
Message-ID: <1181939959.5681.380508.camel@hal.voltaire.com>

On Thu, 2007-06-14 at 09:45, Sasha Khapyorsky wrote:
> On 15:36 Thu 14 Jun     , Yevgeny Kliteynik wrote:
> >  Sasha Khapyorsky wrote:
> > > Hi Yevgeny,
> > > On 11:19 Thu 14 Jun     , Yevgeny Kliteynik wrote:
> > >>  The following three patches are adding root and compute node guid files
> > >>  options for fat-tree routing,
> > > Is there any reason to not share root guids file option with up/down?
> > 
> >  There are two new options for fat-tree: roots and compute nodes (CN).
> >  These two will be very "tightly coupled" and would have more implication
> >  on the routing than in case of up/dn roots. For instance, having root
> >  file but not CN file means that the topology doesn't have to be pure 
> >  fat-tree,
> >  but all the CAs are considered CNs and have to be on the same level of the 
> >  tree.
> >  And there is similar implication of all the combinations of these two 
> >  options.
> > 
> >  Because of this coupling I wanted to differentiate these two options from
> >  the up/dn roots.
> > 
> >  Thoughts?
> 
> I still don't have a strong opinion about two options versus a common one.

Me neither.

> Hypothetically if in some days we will implement routing engine chains
> (so failed algo will fallback to next in chain and not just to default)
> separate options could be useful.

So is this a(nother) reason to keep the roots separate or would that be
dealt with when the routing fallback strategy changes ?

-- Hal

> > > Also the way root guids are handled (in both up/down and ftree)
> > > doesn't look optimal - guids are loaded into a dynamic list, the list
> > > is converted to a map, this map is matched and root nodes are marked as
> > > roots. Wouldn't it be easier just to mark root nodes during file parsing?
> > 
> >  The only thing you can save here is converting list to map:
> 
> I don't think the root guids map is needed - you can just set is_root
> field for sw nodes by guid(s) specified in the file, since you already
> have sw by guid map.
> 
> >  You have to parse the guids file anyway, and you have to build all the
> >  fat-tree data structures anyway. So if you parse the file and fill the
> >  map right away instead of filling the list first, you will save the list2map 
> >  conversion.
> >  But then up/dn and fat-tree can't use the same function to parse the guid 
> >  file,
> >  and since the list2map conversion is not a big deal (we're talking about 
> >  list
> >  of roots, which is couple of hundreds of guids at max), I prefer to leave it
> >  and not to use separate parsing functions for up/dn and fat-tree.
> 
> You can pass custom callback to common parser.
> 
> >  BTW, since we're on this subject, how about removing the list2array 
> >  conversion
> >  in the same place in up/dn routing?
> 
> Sure, similar junk should be cleaned up in up/down too (and my original
> complaint was about both root guids users).
> 
> Sasha
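
The custom-callback idea quoted above can be sketched roughly as follows.
This is only an illustration, not OpenSM code: parse_guids(), mark_root(),
struct sw and struct sw_table are hypothetical names, the input is taken
from a string rather than a file, and a linear search stands in for the
existing switch-by-guid map.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Called once per GUID found; a non-zero return aborts parsing. */
typedef int (*guid_cb_t)(uint64_t guid, void *ctx);

/*
 * Hypothetical common parser: reads whitespace-separated GUIDs from
 * 'text' and hands each one to 'cb'.  Returns the number of GUIDs
 * parsed, or -1 on malformed input or callback failure.
 */
static int parse_guids(const char *text, guid_cb_t cb, void *ctx)
{
	int count = 0;

	assert(text != NULL && cb != NULL);
	for (;;) {
		char *end;
		unsigned long long v;

		errno = 0;
		v = strtoull(text, &end, 0);
		if (end == text)
			break;		/* nothing more to convert */
		if (errno || cb((uint64_t)v, ctx))
			return -1;
		count++;
		text = end;
	}
	/* Only trailing whitespace may remain. */
	while (*text == ' ' || *text == '\t' || *text == '\n' || *text == '\r')
		text++;
	return *text ? -1 : count;
}

/* Stand-ins for the switch objects and the switch-by-guid lookup. */
struct sw {
	uint64_t guid;
	int is_root;
};

struct sw_table {
	struct sw *sw;
	size_t n;
};

/* Routing-specific callback: mark the matching switch as a root. */
static int mark_root(uint64_t guid, void *ctx)
{
	struct sw_table *t = ctx;
	size_t i;

	for (i = 0; i < t->n; i++)
		if (t->sw[i].guid == guid)
			t->sw[i].is_root = 1;
	return 0;	/* GUIDs not present in the fabric are ignored */
}
```

With this shape, up/down and fat-tree could share parse_guids() and differ
only in the callback they pass, so no intermediate list or map of root
guids is built.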


From sashak at voltaire.com  Fri Jun 15 13:59:58 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Fri, 15 Jun 2007 23:59:58 +0300
Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node
	guid files options for fat-tree
In-Reply-To: <1181939959.5681.380508.camel@hal.voltaire.com>
References: <4670FA2D.7070708@dev.mellanox.co.il>
	<20070614121501.GC5908@sashak.voltaire.com>
	<4671363F.6060600@dev.mellanox.co.il>
	<20070614134519.GD5908@sashak.voltaire.com>
	<1181939959.5681.380508.camel@hal.voltaire.com>
Message-ID: <20070615205958.GB10766@sashak.voltaire.com>

On 16:39 Fri 15 Jun     , Hal Rosenstock wrote:
> On Thu, 2007-06-14 at 09:45, Sasha Khapyorsky wrote:
> > On 15:36 Thu 14 Jun     , Yevgeny Kliteynik wrote:
> > >  Sasha Khapyorsky wrote:
> > > > Hi Yevgeny,
> > > > On 11:19 Thu 14 Jun     , Yevgeny Kliteynik wrote:
> > > >>  The following three patches are adding root and compute node guid files
> > > >>  options for fat-tree routing,
> > > > Is there any reason to not share root guids file option with up/down?
> > > 
> > >  There are two new options for fat-tree: roots and compute nodes (CN).
> > >  These two will be very "tightly coupled" and would have more implication
> > >  on the routing than in case of up/dn roots. For instance, having root
> > >  file but not CN file means that the topology doesn't have to be pure 
> > >  fat-tree,
> > >  but all the CAs are considered CNs and have to be on the same level of the 
> > >  tree.
> > >  And there is similar implication of all the combinations of these two 
> > >  options.
> > > 
> > >  Because of this coupling I wanted to differentiate these two options from
> > >  the up/dn roots.
> > > 
> > >  Thoughts?
> > 
> > I still don't have a strong opinion about two options versus a common one.
> 
> Me neither.
> 
> > Hypothetically if in some days we will implement routing engine chains
> > (so failed algo will fallback to next in chain and not just to default)
> > separate options could be useful.
> 
> So is this a(nother) reason to keep the roots separate or would that be
> dealt with when the routing fallback strategy changes ?

It is still hypothetical. Currently I don't see a strong practical reason
to have two separate root guids file options for up/down and fat-tree,
but I guess this is minor and not a showstopper.

Sasha


From halr at voltaire.com  Fri Jun 15 13:57:20 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 15 Jun 2007 16:57:20 -0400
Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node
	guid files options for fat-tree
In-Reply-To: <20070615205958.GB10766@sashak.voltaire.com>
References: <4670FA2D.7070708@dev.mellanox.co.il>
	<20070614121501.GC5908@sashak.voltaire.com>
	<4671363F.6060600@dev.mellanox.co.il>
	<20070614134519.GD5908@sashak.voltaire.com>
	<1181939959.5681.380508.camel@hal.voltaire.com>
	<20070615205958.GB10766@sashak.voltaire.com>
Message-ID: <1181941040.5681.381698.camel@hal.voltaire.com>

On Fri, 2007-06-15 at 16:59, Sasha Khapyorsky wrote:
> On 16:39 Fri 15 Jun     , Hal Rosenstock wrote:
> > On Thu, 2007-06-14 at 09:45, Sasha Khapyorsky wrote:
> > > On 15:36 Thu 14 Jun     , Yevgeny Kliteynik wrote:
> > > >  Sasha Khapyorsky wrote:
> > > > > Hi Yevgeny,
> > > > > On 11:19 Thu 14 Jun     , Yevgeny Kliteynik wrote:
> > > > >>  The following three patches are adding root and compute node guid files
> > > > >>  options for fat-tree routing,
> > > > > Is there any reason to not share root guids file option with up/down?
> > > > 
> > > >  There are two new options for fat-tree: roots and compute nodes (CN).
> > > >  These two will be very "tightly coupled" and would have more implication
> > > >  on the routing than in case of up/dn roots. For instance, having root
> > > >  file but not CN file means that the topology doesn't have to be pure 
> > > >  fat-tree,
> > > >  but all the CAs are considered CNs and have to be on the same level of the 
> > > >  tree.
> > > >  And there is similar implication of all the combinations of these two 
> > > >  options.
> > > > 
> > > >  Because of this coupling I wanted to differentiate these two options from
> > > >  the up/dn roots.
> > > > 
> > > >  Thoughts?
> > > 
> > > I still don't have a strong opinion about two options versus a common one.
> > 
> > Me neither.
> > 
> > > Hypothetically if in some days we will implement routing engine chains
> > > (so failed algo will fallback to next in chain and not just to default)
> > > separate options could be useful.
> > 
> > So is this a(nother) reason to keep the roots separate or would that be
> > dealt with when the routing fallback strategy changes ?
> 
> It is still hypothetical. Currently I don't see a strong practical reason
> to have two separate root guids file options for up/down and fat-tree,
> but I guess this is minor and not a showstopper.

Wouldn't a current practical reason be switching between up/down and fat
tree and they each have different roots ? Is that a real scenario ?

-- Hal

> Sasha


From wombat2 at us.ibm.com  Fri Jun 15 14:04:16 2007
From: wombat2 at us.ibm.com (Bernard King-Smith)
Date: Fri, 15 Jun 2007 17:04:16 -0400
Subject: [ofa-general] Re: [PATCH draft,
	untested] ehca srq emulation (for IPoIB CM)
In-Reply-To: <20070615190004.4BCB8E6086F@openfabrics.org>
Message-ID: <OF5651D8E8.16899A82-ON852572FB.0070D903-852572FB.0073DCB4@us.ibm.com>

"Sean Hefty" <sean.hefty at intel.com> wrote on 06/15/2007 03:00:04 PM:

> 
> > Basically, I think that because of lack of SW level flow control,
> > generally IPoIB CM without SRQ does not make sense because of
> > the scalability problems.
> 
> Most clusters are only 16-32 nodes.  If IPoIB CM without SRQ can support 
> this number of systems and outperforms IPoIB UD mode, then I do believe 
> that it makes sense.  IPoIB CM support, with or without SRQ, is less 
> scalable than IPoIB UD mode, but it was still added because it provided 
> a benefit under most conditions.

I think Pradeep has been making this very clear all along and that scaling 
is a restriction we can make. Since SRQ is not a required part of the 
spec, then having support for non-SRQ in the IPoIB-CM driver supports the 
minimal requirements. I think it is typical that any driver that supports 
enhancements from a basic spec has exception handling for both cases ( 
base and enhanced ) in the layer in question (ipoib). Putting it in the 
device driver splits the non-SRQ IPoIB support to two layers which is not 
a good idea.

We are already running with the non-SRQ patch here and the results are 
very good. Changing to a different approach is not the right thing to do 
at this time. Emulating in the device driver will only increase the amount 
of work everyone will have to do to get this out, and runs the risk of 
uncovering more complex problems.

Can we close on the last few issues and get this lined up for OFED 1.3?

> 
> - Sean

Regards.

Bernie King-Smith 
IBM Corporation
Server Group
Cluster System Performance 
wombat2 at us.ibm.com    (845)433-8483
Tie. 293-8483 or wombat2 on NOTES 

"We are not responsible for the world we are born into, only for the world 
we leave when we die.
So we have to accept what has gone before us and work to change the only 
thing we can,
-- The Future." William Shatner

From mshefty at ichips.intel.com  Fri Jun 15 14:36:21 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 15 Jun 2007 14:36:21 -0700
Subject: [ofa-general] Re: [PATCH 1/2] libibumad: fix partition support
In-Reply-To: <1181937695.5681.377979.camel@hal.voltaire.com>
References: <000801c7af6e$7ae0ba80$ff0da8c0@amr.corp.intel.com>
	<1181937695.5681.377979.camel@hal.voltaire.com>
Message-ID: <46730655.7020808@ichips.intel.com>

> Sigh... and opensm (actually libvendor) is the one which uses this
> incorrectly. I'm worried about existing OpenSM compatibility with the
> new libibumad when ABI 6 is in effect. I think the long standing ABI 5
> should be fine, right ?

ABI 5 should be fine, since the pkey isn't actually passed to the 
kernel.  ABI 6 would pass down the wrong index.  I do print a warning if 
umad_set_pkey() is called with an index != 0, but we can remove that.

>> old mode 100644
>> new mode 100755
> 
> Why the mode change ?

This is just my editor being dumb, and me forgetting to tell git to 
ignore mode changes.

>> +	n = write(fd, data, size);
>> +	if (n != size) {
>> +		DEBUG("write returned %d != sizeof mad data %d (%m)", n, size);
> 
> Is this really the sizeof the mad data ?

This is not the size of the data field in a MAD.  It's the size of the 
write = sizeof ib_user_mad + MAD data length.  I can change the comment 
to clarify.

- Sean
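
To make the size point concrete, here is a minimal sketch of the
arithmetic, not the libibumad code itself. The header struct and the
demo_* names are hypothetical stand-ins; the real struct ib_user_mad
layout is deliberately not reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for the user MAD header (assumption, not the
 * real struct ib_user_mad layout). */
struct demo_user_mad_hdr {
	unsigned int agent_id;
	unsigned int status;
	unsigned int timeout_ms;
	unsigned int retries;
};

/* Total bytes handed to write(): header plus MAD payload length. */
static size_t demo_write_size(size_t mad_len)
{
	return sizeof(struct demo_user_mad_hdr) + mad_len;
}

/* Returns 0 if the whole buffer was written, -1 on error or short write. */
static int demo_send(int fd, const void *buf, size_t mad_len)
{
	size_t size = demo_write_size(mad_len);
	ssize_t n;

	assert(buf != NULL);
	n = write(fd, buf, size);
	if (n < 0 || (size_t)n != size) {
		fprintf(stderr, "write returned %zd != header + MAD length %zu\n",
			n, size);
		return -1;
	}
	return 0;
}
```

The comparison is against the full write size, so a message that
mentions only "mad data" understates what is being checked.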


From jwong at datallegro.com  Fri Jun 15 14:49:38 2007
From: jwong at datallegro.com (Jeffrey Wong)
Date: Fri, 15 Jun 2007 17:49:38 -0400
Subject: [ofa-general] Trouble installing OFED 1.2-rc5,
	kernel SUSE 10.2.6.21-5 default x86_64
Message-ID: <A382D4292574EB47A85B8159A6AED1A18305C2@FPNYEXCBE02.opus-i.corp>

Hello,
I'm getting the following error when trying to install OFED 1.2-rc5:

configure: error: libpci not found.

Failed to execute: cd /var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/tvflash && env ac_cv_lib_ibverbs_ibv_get_device_list=yes ac_cv_he
ader_infiniband_driver_h=yes ac_cv_func_ibv_read_sysfs_file=yes ac_cv_func_ibv_dontfork_range=yes ac_cv_func_ibv_dofork_range=yes ac_cv_f
unc_ibv_register_driver=yes HAVE_IBV_DEVICE_LIBRARY_EXTENSION_TRUE=yes  ./configure --cache-file=/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/conf
igure.cache --disable-libcheck --prefix /usr --libdir /usr/lib64 --mandir=/usr/share/man --sysconfdir=/etc CPPFLAGS="-I../libibverbs/incl
ude"
error: Bad exit status from /var/tmp/rpm-tmp.30970 (%install)


RPM build errors:
    user vlad does not exist - using root
    group vlad does not exist - using root
    user vlad does not exist - using root
    group vlad does not exist - using root
    Bad exit status from /var/tmp/rpm-tmp.30970 (%install)
ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr' --define 'build_root /var/tmp/OFE
D' --define 'configure_options --with-dapl --with-ipoibtools --with-libcxgb3 --with-libibcm --with-libibcommon --with-libibmad --with-lib
ibumad --with-libibverbs --with-libmthca --with-opensm --with-librdmacm --with-libsdp --with-openib-diags --with-sdpnetstat --with-srptoo
ls --with-mstflint --with-perftest --with-tvflash --sysconfdir=/etc --mandir=/usr/share/man' --define 'configure_options32 %{nil} --sysco
nfdir=/etc --mandir=/usr/share/man' --define 'build_32bit 0' --define '_mandir /usr/share/man' /root/OFED-1.2-rc5/SRPMS/ofa_user-1.2-rc5.
src.rpm"


Thanks in advance,

Jeff 

From sweitzen at cisco.com  Fri Jun 15 14:50:54 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Fri, 15 Jun 2007 14:50:54 -0700
Subject: [ofa-general] Trouble installing OFED 1.2-rc5,
	kernel SUSE 10.2.6.21-5 default x86_64
In-Reply-To: <A382D4292574EB47A85B8159A6AED1A18305C2@FPNYEXCBE02.opus-i.corp>
References: <A382D4292574EB47A85B8159A6AED1A18305C2@FPNYEXCBE02.opus-i.corp>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA303B0E79F@xmb-sjc-216.amer.cisco.com>

https://bugs.openfabrics.org/show_bug.cgi?id=558 has been fixed
since rc5.



From halr at voltaire.com  Fri Jun 15 15:00:15 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 15 Jun 2007 18:00:15 -0400
Subject: [ofa-general] Re: [PATCH 1/2] libibumad: fix partition support
In-Reply-To: <46730655.7020808@ichips.intel.com>
References: <000801c7af6e$7ae0ba80$ff0da8c0@amr.corp.intel.com>
	<1181937695.5681.377979.camel@hal.voltaire.com>
	<46730655.7020808@ichips.intel.com>
Message-ID: <1181944814.5681.385918.camel@hal.voltaire.com>

On Fri, 2007-06-15 at 17:36, Sean Hefty wrote:
> > Sigh... and opensm (actually libvendor) is the one which uses this
> > incorrectly. I'm worried about existing OpenSM compatibility with the
> > new libibumad when ABI 6 is in effect. I think the long standing ABI 5
> > should be fine, right ?
> 
> ABI 5 should be fine, since the pkey isn't actually passed to the 
> kernel.  ABI 6 would pass down the wrong index.

Right.

> I do print a warning if 
> umad_set_pkey() is called with an index != 0, but we can remove that.

How about if abi_version == 5, setting pkey_index to 0 regardless of
what is set ? Isn't that all that ABI v5 really supports ?

> >> old mode 100644
> >> new mode 100755
> > 
> > Why the mode change ?
> 
> This is just my editor being dumb, and me forgetting to tell git to 
> ignore mode changes.

OK.

> >> +	n = write(fd, data, size);
> >> +	if (n != size) {
> >> +		DEBUG("write returned %d != sizeof mad data %d (%m)", n, size);
> > 
> > Is this really the sizeof the mad data ?
> 
> This is not the size of the data field in a MAD.  It's the size of the 
> write = sizeof ib_user_mad + MAD data length.  I can change the comment 
> to clarify.

Thanks.

-- Hal

> - Sean


From jwong at datallegro.com  Fri Jun 15 14:59:23 2007
From: jwong at datallegro.com (Jeffrey Wong)
Date: Fri, 15 Jun 2007 17:59:23 -0400
Subject: [ofa-general] Trouble installing OFED 1.2-rc5,
	kernel SUSE 10.2.6.21-5 default x86_64
References: <A382D4292574EB47A85B8159A6AED1A18305C2@FPNYEXCBE02.opus-i.corp>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA303B0E79F@xmb-sjc-216.amer.cisco.com>
Message-ID: <A382D4292574EB47A85B8159A6AED1A18305C3@FPNYEXCBE02.opus-i.corp>

Thanks, I've downloaded the latest build.  Now I get a different error:

Make ipoibtools started
make -C src/userspace/ipoibtools
make[1]: Entering directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/ipoibtools'
gcc -D_GNU_SOURCE -O2 -Wstrict-prototypes -Wall -g -include include-glibc/glibc-bugs.h -I/lib/modules/2.6.21.5-default/build/include     arping.c  -lresolv -o arping
gcc -D_GNU_SOURCE -O2 -Wstrict-prototypes -Wall -g -include include-glibc/glibc-bugs.h -I/lib/modules/2.6.21.5-default/build/include     mcasthandle.c  -lresolv -o mcasthandle
make[1]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/ipoibtools'
make -C src/userspace/ipoibtools/iproute2 ip
make[1]: Entering directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/ipoibtools/iproute2'
make -w -C lib
make[2]: Entering directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/ipoibtools/iproute2/lib'
gcc -D_GNU_SOURCE -O2 -Wstrict-prototypes -Wall -I../include -DRESOLVE_HOSTNAMES   -c -o ll_map.o ll_map.c
gcc -D_GNU_SOURCE -O2 -Wstrict-prototypes -Wall -I../include -DRESOLVE_HOSTNAMES   -c -o libnetlink.o libnetlink.c
ar rcs libnetlink.a ll_map.o libnetlink.o
gcc -D_GNU_SOURCE -O2 -Wstrict-prototypes -Wall -I../include -DRESOLVE_HOSTNAMES   -c -o utils.o utils.c
utils.c: In function inet_addr_match:
utils.c:333: warning: initialization discards qualifiers from pointer target type
utils.c:334: warning: initialization discards qualifiers from pointer target type
utils.c: In function __get_hz:
utils.c:368: error: HZ undeclared (first use in this function)
utils.c:368: error: (Each undeclared identifier is reported only once
utils.c:368: error: for each function it appears in.)
make[2]: *** [utils.o] Error 1
make[2]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/ipoibtools/iproute2/lib'
make[1]: *** [lib] Error 2
make[1]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/ipoibtools/iproute2'
make: *** [ipoibtools] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.18693 (%install)


RPM build errors:
    user vlad does not exist - using root
    group vlad does not exist - using root
    user vlad does not exist - using root
    group vlad does not exist - using root
    Bad exit status from /var/tmp/rpm-tmp.18693 (%install)
ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-dapl --with-ipoibtools --with-libcxgb3 --with-libibcm --with-libibcommon --with-libibmad --with-libibumad --with-libibverbs --with-libmthca --with-opensm --with-librdmacm --with-libsdp --with-openib-diags --with-sdpnetstat --with-srptools --with-mstflint --with-perftest --with-tvflash --sysconfdir=/etc --mandir=/usr/share/man' --define 'configure_options32 %{nil} --sysconfdir=/etc --mandir=/usr/share/man' --define 'build_32bit 0' --define '_mandir /usr/share/man' /root/OFED-1.2-20070615-0600/SRPMS/ofa_user-1.2-rc5.src.rpm"



From jwong at datallegro.com  Fri Jun 15 15:07:07 2007
From: jwong at datallegro.com (Jeffrey Wong)
Date: Fri, 15 Jun 2007 18:07:07 -0400
Subject: [ofa-general] Trouble installing OFED 1.2-rc5,
	kernel SUSE 10.2.6.21-5 default x86_64 - sdpnetstat
Message-ID: <A382D4292574EB47A85B8159A6AED1A18305C4@FPNYEXCBE02.opus-i.corp>

Hello,
I'm getting the following errors when trying to build sdpnetstat:


/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/sdpnetstat/include    -c -o inet_gr.o inet_gr.c
cc -D_GNU_SOURCE -O2 -Wall -g  -I. -idirafter ./include/ -Ilib -I/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/sdpnetstat -idirafter /var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/sdpnetstat/include    -c -o inet_sr.o inet_sr.c
inet_sr.c: In function INET_setroute:
inet_sr.c:201: error: HZ undeclared (first use in this function)
inet_sr.c:201: error: (Each undeclared identifier is reported only once
inet_sr.c:201: error: for each function it appears in.)
make[2]: *** [inet_sr.o] Error 1
make[2]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/sdpnetstat/lib'
make[1]: *** [subdirs] Error 2
make[1]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/sdpnetstat'
make: *** [sdpnetstat] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.34022 (%install)


RPM build errors:
    user vlad does not exist - using root
    group vlad does not exist - using root
    user vlad does not exist - using root
    group vlad does not exist - using root
    Bad exit status from /var/tmp/rpm-tmp.34022 (%install)
ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-dapl --with-libcxgb3 --with-libibcm --with-libibcommon --with-libibmad --with-libibumad --with-libibverbs --with-libmthca --with-opensm --with-librdmacm --with-libsdp --with-openib-diags --with-sdpnetstat --with-srptools --with-mstflint --with-perftest --with-tvflash --sysconfdir=/etc --mandir=/usr/share/man' --define 'configure_options32 %{nil} --sysconfdir=/etc --mandir=/usr/share/man' --define 'build_32bit 0' --define '_mandir /usr/share/man' /root/OFED-1.2-20070615-0600/SRPMS/ofa_user-1.2-rc5.src.rpm"


Thanks in advance,
Jeff

From sean.hefty at intel.com  Fri Jun 15 15:11:32 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 15 Jun 2007 15:11:32 -0700
Subject: [ofa-general] Re: [PATCH 1/2] libibumad: fix partition support
In-Reply-To: <1181944814.5681.385918.camel@hal.voltaire.com>
Message-ID: <000c01c7af9a$2267a270$3ccc180a@amr.corp.intel.com>

>How about if abi_version == 5, setting pkey_index to 0 regardless of
>what is set ? Isn't that all that ABI v5 really supports ?

This is what ends up happening.  The problem is that the pkey_index is set to 0
by the kernel code.  Nothing is passed down from userspace.  I added the warning
to umad_set_pkey() to notify the user that the value that they're trying to set
is ignored.

- Sean
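
A rough sketch of the behaviour described above, under the assumption
that the ABI version is available where the index is set. The struct and
function names are hypothetical; this is not the libibumad
implementation.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-MAD state for this sketch only. */
struct demo_umad {
	int abi_version;       /* 5 or 6 in this sketch */
	unsigned pkey_index;   /* value that would reach the kernel */
};

/* Returns 1 if the requested index was stored, 0 if it was ignored. */
static int demo_set_pkey(struct demo_umad *u, unsigned pkey_index)
{
	assert(u != NULL);
	if (u->abi_version == 5) {
		/* Under ABI 5 nothing is passed down; the kernel uses index 0. */
		if (pkey_index != 0)
			fprintf(stderr,
				"warning: pkey index %u ignored under ABI 5\n",
				pkey_index);
		u->pkey_index = 0;
		return pkey_index == 0;
	}
	u->pkey_index = pkey_index;
	return 1;
}
```

The return value lets a caller notice that its request was dropped
without having to parse the warning.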


From jwong at datallegro.com  Fri Jun 15 15:18:26 2007
From: jwong at datallegro.com (Jeffrey Wong)
Date: Fri, 15 Jun 2007 18:18:26 -0400
Subject: [ofa-general] Latest OFED builds everything except ipoibtools and
	sdpnetstat modules for SLES 10.2.6-21-5
Message-ID: <A382D4292574EB47A85B8159A6AED1A18305C5@FPNYEXCBE02.opus-i.corp>


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070615/1b87595d/attachment.html>

From sashak at voltaire.com  Fri Jun 15 15:29:34 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sat, 16 Jun 2007 01:29:34 +0300
Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node
	guid files options for fat-tree
In-Reply-To: <1181941040.5681.381698.camel@hal.voltaire.com>
References: <4670FA2D.7070708@dev.mellanox.co.il>
	<20070614121501.GC5908@sashak.voltaire.com>
	<4671363F.6060600@dev.mellanox.co.il>
	<20070614134519.GD5908@sashak.voltaire.com>
	<1181939959.5681.380508.camel@hal.voltaire.com>
	<20070615205958.GB10766@sashak.voltaire.com>
	<1181941040.5681.381698.camel@hal.voltaire.com>
Message-ID: <20070615222934.GC10766@sashak.voltaire.com>

On 16:57 Fri 15 Jun     , Hal Rosenstock wrote:
> On Fri, 2007-06-15 at 16:59, Sasha Khapyorsky wrote:
> > On 16:39 Fri 15 Jun     , Hal Rosenstock wrote:
> > > On Thu, 2007-06-14 at 09:45, Sasha Khapyorsky wrote:
> > > > On 15:36 Thu 14 Jun     , Yevgeny Kliteynik wrote:
> > > > >  Sasha Khapyorsky wrote:
> > > > > > Hi Yevgeny,
> > > > > > On 11:19 Thu 14 Jun     , Yevgeny Kliteynik wrote:
> > > > > >>  The following three patches are adding root and compute node guid files
> > > > > >>  options for fat-tree routing,
> > > > > > Is there any reason to not share root guids file option with up/down?
> > > > > 
> > > > >  There are two new options for fat-tree: roots and compute nodes (CN).
> > > > >  These two will be very "tightly coupled" and would have more implication
> > > > >  on the routing than in case of up/dn roots. For instance, having root
> > > > >  file but not CN file means that the topology doesn't have to be pure 
> > > > >  fat-tree,
> > > > >  but all the CAs are considered CNs and have to be on the same level of the 
> > > > >  tree.
> > > > >  And there is similar implication of all the combinations of these two 
> > > > >  options.
> > > > > 
> > > > >  Because of this coupling I wanted to differentiate these two options from
> > > > >  the up/dn roots.
> > > > > 
> > > > >  Thoughts?
> > > > 
> > > > I still don't have a strong opinion on two options versus a common one.
> > > 
> > > Me neither.
> > > 
> > > > Hypothetically, if some day we implement routing engine chains
> > > > (so a failed algo falls back to the next in the chain and not just
> > > > to the default), separate options could be useful.
> > > 
> > > So is this a(nother) reason to keep the roots separate or would that be
> > > dealt with when the routing fallback strategy changes ?
> > 
> > It is still hypothetical. Currently I don't see a strong practical reason
> > to have two separate root guids file options for up/down and fat-tree,
> > but I guess this is minor and not a showstopper.
> 
> Wouldn't a current practical reason be switching between up/down and fat
> tree and they each have different roots ? Is that a real scenario ?

Sure (but I guess in many cases the selected roots will be the same for
both algos). I think this scenario will be handled well with a single
shared option, like:

  opensm -R ftree --roots-file ftree-roots-file

and

  opensm -R updn --roots-file updn-roots-file

Sasha
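
As a sketch of the single shared option: the parser below accepts one
roots file and leaves its interpretation to whichever engine -R selects.
The --roots-file spelling is taken from the example command lines above
and is a proposal in this thread, not a confirmed opensm flag; the
struct and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical option holder for this sketch. */
struct demo_opts {
	const char *routing_engine; /* e.g. "ftree" or "updn" */
	const char *roots_file;     /* shared by all routing engines */
};

/* Parses "-R <engine>" and "--roots-file <path>".
 * Returns 0 on success, -1 on an unknown or incomplete option. */
static int demo_parse(int argc, char *const argv[], struct demo_opts *o)
{
	int i;

	assert(o != NULL);
	o->routing_engine = NULL;
	o->roots_file = NULL;
	for (i = 1; i < argc; i++) {
		if (strcmp(argv[i], "-R") == 0 && i + 1 < argc)
			o->routing_engine = argv[++i];
		else if (strcmp(argv[i], "--roots-file") == 0 && i + 1 < argc)
			o->roots_file = argv[++i];
		else
			return -1;
	}
	return 0;
}
```

Switching engines then means changing both arguments on the command
line, with no engine-specific option name to remember.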


From halr at voltaire.com  Fri Jun 15 15:23:00 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 15 Jun 2007 18:23:00 -0400
Subject: [ofa-general] Re: [PATCH 1/2] libibumad: fix partition support
In-Reply-To: <000c01c7af9a$2267a270$3ccc180a@amr.corp.intel.com>
References: <000c01c7af9a$2267a270$3ccc180a@amr.corp.intel.com>
Message-ID: <1181946179.5681.387443.camel@hal.voltaire.com>

On Fri, 2007-06-15 at 18:11, Sean Hefty wrote:
> >How about if abi_version == 5, setting pkey_index to 0 regardless of
> >what is set ? Isn't that all that ABI v5 really supports ?
> 
> This is what ends up happening.  The problem is that the pkey_index is set to 0
> by the kernel code.  Nothing is passed down from userspace.  I added the warning
> to umad_set_pkey() to notify the user that the value that they're trying to set
> is ignored.

Oh, right. So the question is whether we want that error message over
and over again. Sorry for being slow...

Also, should the index requested be validated (within range) or is that
handled with some error coming back from the lower levels when this is
misspecified ?

-- Hal

> - Sean


From sean.hefty at intel.com  Fri Jun 15 15:33:40 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 15 Jun 2007 15:33:40 -0700
Subject: [ofa-general] Re: [PATCH 1/2] libibumad: fix partition support
In-Reply-To: <1181946179.5681.387443.camel@hal.voltaire.com>
Message-ID: <000d01c7af9d$395e8310$3ccc180a@amr.corp.intel.com>

>Also, should the index requested be validated (within range) or is that
>handled with some error coming back from the lower levels when this is
>misspecified ?

umad_set_pkey() can't fully validate the index, since it doesn't know what port
the mad will be used on.  The send will eventually fail if an invalid pkey index
is used, but checking could be added earlier in the send path.

- Sean


From sweitzen at cisco.com  Fri Jun 15 15:36:36 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Fri, 15 Jun 2007 15:36:36 -0700
Subject: [ofa-general] Latest OFED builds everything except ipoibtools
	and sdpnetstat modules for SLES 10.2.6-21-5
In-Reply-To: <A382D4292574EB47A85B8159A6AED1A18305C5@FPNYEXCBE02.opus-i.corp>
References: <A382D4292574EB47A85B8159A6AED1A18305C5@FPNYEXCBE02.opus-i.corp>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA303B0E7D1@xmb-sjc-216.amer.cisco.com>

Are you on SUSE 10 or SLES 10?  SLES 10 has 2.6.16 kernels AFAIK.
 
Scott



From jwong at datallegro.com  Fri Jun 15 15:39:29 2007
From: jwong at datallegro.com (Jeffrey Wong)
Date: Fri, 15 Jun 2007 18:39:29 -0400
Subject: [ofa-general] Latest OFED builds everything except ipoibtools
	and sdpnetstat modules for SLES 10.2.6-21-5
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA303B0E7D1@xmb-sjc-216.amer.cisco.com>
Message-ID: <A382D4292574EB47A85B8159A6AED1A101750BCE@FPNYEXCBE02.opus-i.corp>

Sorry, I meant SUSE 10.2.6-21-5.

 
Jeff

 

From rdreier at cisco.com  Fri Jun 15 21:06:17 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 15 Jun 2007 21:06:17 -0700
Subject: [ofa-general] [ANNOUNCE] libibverbs 1.1.1 released
Message-ID: <adafy4sh7ba.fsf@cisco.com>

I just tagged the 1.1.1 release of libibverbs and pushed it out to
my git tree on kernel.org:

    git://git.kernel.org/pub/scm/libs/infiniband/libibverbs.git

(the name of the tag is libibverbs-1.1.1).

I've also copied a tarball to openfabrics.org, and it should appear
eventually in <http://www.openfabrics.org//downloads/>.

The sha1sum of the release is:

    eac666bf1080deef6e0d52810c83aa5611683828  libibverbs-1.1.1.tar.gz

The most significant change since libibverbs 1.1 is fixing the
initialization of new QPs' state to RESET.  Without this fix, there
will be problems using libmlx4 and ConnectX HCAs.  I also fixed an
annoying bug in the pingpong example programs that caused a crash at
the end of a run in ibv_free_device_list() if a device name other than
the first device present is specified.
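
For anyone with similar code: the note above does not spell out the
cause, but one way a crash like this can arise is advancing the list
pointer while searching and then freeing the advanced pointer. The
sketch below uses a hypothetical NULL-terminated list (the demo_*
functions are stand-ins, not the libibverbs API) and searches by index
so the original pointer is the one that gets freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical device record for this sketch. */
struct demo_device {
	const char *name;
};

/* Builds a NULL-terminated array of devices; returns NULL on failure. */
static struct demo_device **demo_get_list(const char *const names[], int n)
{
	struct demo_device **list = calloc((size_t)n + 1, sizeof(*list));
	int i;

	if (!list)
		return NULL;
	for (i = 0; i < n; i++) {
		list[i] = malloc(sizeof(**list));
		if (!list[i]) {
			while (i--)
				free(list[i]);
			free(list);
			return NULL;
		}
		list[i]->name = names[i];
	}
	return list;
}

/* Must be given the pointer originally returned by demo_get_list(). */
static void demo_free_list(struct demo_device **list)
{
	struct demo_device **p;

	if (!list)
		return;
	for (p = list; *p; p++)
		free(*p);
	free(list);
}

/* Searches by index, so the caller's list pointer is never advanced. */
static struct demo_device *demo_find(struct demo_device **list,
				     const char *name)
{
	int i;

	assert(list != NULL && name != NULL);
	for (i = 0; list[i]; i++)
		if (strcmp(list[i]->name, name) == 0)
			return list[i];
	return NULL;
}
```

The device names used to exercise this are up to the caller; nothing
here depends on real hardware.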

The git shortlog since libibverbs 1.1 is:

Dotan Barak (1):
      ibv_devinfo: Decode max_vl_num to actual number

Jack Morgenstein (1):
      Initialize QP state to RESET

Michael S. Tsirkin (1):
      Don't warn root if RLIMIT_MEMLOCK is low

Roland Dreier (5):
      Update Debian build
      Trivial whitespace fixes in examples/
      Fix call to ibv_free_device_list() in pingpong examples
      Add wc_wmb()
      Roll libibverbs 1.1.1 release


From rdreier at cisco.com  Fri Jun 15 21:06:58 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 15 Jun 2007 21:06:58 -0700
Subject: [ofa-general] [ANNOUNCE] libibverbs 1.0.5 released
Message-ID: <adabqfgh7a5.fsf@cisco.com>

I just tagged the 1.0.5 release of libibverbs and pushed it out to
my git tree on kernel.org:

    git://git.kernel.org/pub/scm/libs/infiniband/libibverbs.git

(the name of the tag is libibverbs-1.0.5).

I've also copied a tarball to openfabrics.org, and it should appear
eventually in <http://www.openfabrics.org//downloads/>.

The sha1sum of the release is:

    1c3537729774df8c7b7e31128fb28075681694ee  libibverbs-1.0.5.tar.gz

This is a maintenance release to flush out pending fixes for users of
the old 1.0 stable branch.  However, the 1.1 branch of libibverbs is
considered stable and suitable for all users.

The git shortlog since libibverbs 1.0.4 is:

Dotan Barak (2):
      Handle asprintf memory allocation failures
      Check asprintf() return in pingpong examples

Jack Morgenstein (1):
      Initialize QP state to RESET

Roland Dreier (9):
      Add final Debian changelog for libibverbs 1.0.4
      Bump version number
      Remove svn keywords
      Check return of calloc() in ibv_get_device_list()
      Fix checks of asprintf() return value
      The ibv_cmd_* create functions need to set context
      Revert "The ibv_cmd_* create functions need to set context"
      Fix ibv_srq_pingpong option handling
      Roll libibverbs 1.0.5 release


From vlad at lists.openfabrics.org  Sat Jun 16 02:42:08 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Sat, 16 Jun 2007 02:42:08 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070616-0200 daily build status
Message-ID: <20070616094208.32983E60836@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:   --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.12
Passed on ia64 with linux-2.6.13
Passed on powerpc with linux-2.6.19
Passed on ia64 with linux-2.6.16
Passed on x86_64 with linux-2.6.12
Passed on ia64 with linux-2.6.18
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.16
Passed on powerpc with linux-2.6.18
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.13
Passed on ia64 with linux-2.6.12
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.19
Passed on ppc64 with linux-2.6.18
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.19
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.14
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on x86_64 with linux-2.6.20
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.13
Passed on powerpc with linux-2.6.14
Passed on powerpc with linux-2.6.16
Passed on x86_64 with linux-2.6.17
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.12
Passed on powerpc with linux-2.6.12
Passed on ppc64 with linux-2.6.17
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.14
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.13
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.17
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:


From mst at dev.mellanox.co.il  Sat Jun 16 12:27:02 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Sat, 16 Jun 2007 22:27:02 +0300
Subject: [ofa-general] Re: [PATCH draft,
	untested] ehca srq emulation (for IPoIB CM)
In-Reply-To: <4672C0DC.8060308@linux.vnet.ibm.com>
References: <466F36C8.5010507@linux.vnet.ibm.com>
	<20070613163821.GB12277@mellanox.co.il> <adafy4v69ig.fsf@cisco.com>
	<20070613174930.GE12277@mellanox.co.il>
	<46716F3D.7050206@ichips.intel.com> <ada1wge4h4l.fsf@cisco.com>
	<20070614175030.GB29561@mellanox.co.il>
	<4671C541.4040503@linux.vnet.ibm.com>
	<20070615051846.GG2207@mellanox.co.il>
	<4672C0DC.8060308@linux.vnet.ibm.com>
Message-ID: <20070616192702.GM2207@mellanox.co.il>

> We need to make some decisions

Earlier, Roland suggested:
> However it may be a good approach to put an abstraction layer in IPoIB
> so that the CM code can use an SRQ-like interface to both HCAs that
> support SRQ and HCAs that don't.

And I think this might be a good approach, too - and maybe
this layer could be general enough to be reusable in other
ULPs later.

-- 
MST


From swise at opengridcomputing.com  Sat Jun 16 13:52:08 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Sat, 16 Jun 2007 15:52:08 -0500
Subject: [ofa-general] critical fixes for chelsio iwarp driver
Message-ID: <46744D78.9040602@opengridcomputing.com>

Tziporet,

I'll be posting 2 fixes soon that I'd like included in ofed-1.2.

Bugs 663 and 664.  These bugs cause crashes that force a reboot of the 
system and should be considered stop-ship for ofed-1.2.

Thanks,

Steve.


From mst at dev.mellanox.co.il  Sat Jun 16 22:57:23 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 17 Jun 2007 08:57:23 +0300
Subject: [ofa-general] Re: [PATCH draft,
	untested] ehca srq emulation (for IPoIB CM)
In-Reply-To: <4672D953.3050506@ichips.intel.com>
References: <adafy4v69ig.fsf@cisco.com> <20070613174930.GE12277@mellanox.co.il>
	<46716F3D.7050206@ichips.intel.com> <ada1wge4h4l.fsf@cisco.com>
	<20070614175030.GB29561@mellanox.co.il>
	<4671C541.4040503@linux.vnet.ibm.com>
	<20070615051846.GG2207@mellanox.co.il>
	<4672B523.50502@ichips.intel.com>
	<20070615160709.GK2207@mellanox.co.il>
	<4672D953.3050506@ichips.intel.com>
Message-ID: <20070617055649.GN2207@mellanox.co.il>

> Quoting Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation (for IPoIB CM)
> 
> >Basically, I think that because of lack of SW level flow control,
> >generally IPoIB CM without SRQ does not make sense because of
> >the scalability problems.
> 
> Most clusters are only 16-32 nodes.
>
> If IPoIB CM without SRQ can support 
> this number of systems and outperforms IPoIB UD mode, then I do believe 
> that it makes sense.

Note that with mthca, for example, a regular QP has lower overhead than an SRQ
(less locking, etc.). So if your assumption about the number of nodes in IB
clusters is generally correct, we need a generic layer that will start with
regular QPs for a small number of connections, then switch to
SRQ as the number of connections grows (and to datagram mode
if SRQ is not available).
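
To make the policy concrete, here is a minimal sketch of the selection
logic only. Everything in it is hypothetical: the enum, the function name
and the threshold of 32 connections are illustrative assumptions, not
existing IPoIB code, and a real layer would also have to migrate live
connections between modes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical receive modes for the generic layer (illustrative names). */
enum rx_mode {
    RX_MODE_PER_CONN_QP,  /* regular QP with its own receive queue */
    RX_MODE_SRQ,          /* connected QPs sharing one SRQ */
    RX_MODE_DATAGRAM      /* fall back to datagram (UD) mode */
};

/* Assumed threshold; no number has been agreed on in this thread. */
#define RX_QP_CONN_LIMIT 32

/*
 * Pick a receive mode from the current connection count and from
 * whether the HCA supports SRQ.
 */
enum rx_mode rx_mode_select(size_t num_conns, bool hca_has_srq)
{
    if (num_conns <= RX_QP_CONN_LIMIT)
        return RX_MODE_PER_CONN_QP;          /* small cluster: cheapest path */
    return hca_has_srq ? RX_MODE_SRQ         /* many peers, SRQ available */
                       : RX_MODE_DATAGRAM;   /* many peers, no SRQ */
}
```

With these assumptions a 16-node cluster stays on per-connection QPs
whether or not the HCA has SRQ, and only larger fabrics pay the SRQ
locking cost or drop back to datagram mode.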

> IPoIB CM support, with or without SRQ, is less 
> scalable than IPoIB UD mode,

I believe this is incorrect: datagram mode has AH per destination,
connected mode has a QP per destination, so with SRQ, I see no
inherent lack of scalability with connected as compared to datagram mode.

> but it was still added because it provided 
> a benefit under most conditions.

-- 
MST


From ogerlitz at voltaire.com  Sun Jun 17 02:17:24 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Sun, 17 Jun 2007 12:17:24 +0300 (IDT)
Subject: [ofa-general] disconnect implementation for rdma cm unconnected
	datagram service
Message-ID: <Pine.LNX.4.64.0706171158080.4098@zuben>

Hi Sean,

Looking at cm_sidr_rep_handler, we see that the cm id state
is reset to IB_CM_IDLE, while ib_send_cm_dreq
returns -EINVAL if the id state is not IB_CM_ESTABLISHED. I guess
this means that rdma_disconnect on RDMA_PS_UDP would never work?
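
A toy model of the interaction, just to illustrate the point. The types
and function names below are hypothetical stand-ins and not the ib_cm
code; only the two behaviours described above (state goes back to idle
on the SIDR REP, and the DREQ send is refused unless the id is
established) are taken from the real handlers.

```c
#include <errno.h>

/* Hypothetical, simplified CM id states (not the kernel's enum). */
enum toy_cm_state {
    TOY_CM_IDLE,
    TOY_CM_SIDR_REQ_SENT,
    TOY_CM_ESTABLISHED
};

struct toy_cm_id {
    enum toy_cm_state state;
};

/* Models the SIDR REP handler: the id goes straight back to idle. */
void toy_sidr_rep_received(struct toy_cm_id *id)
{
    id->state = TOY_CM_IDLE;
}

/* Models the DREQ send path: only an established id may disconnect. */
int toy_send_dreq(struct toy_cm_id *id)
{
    if (id->state != TOY_CM_ESTABLISHED)
        return -EINVAL;
    return 0;
}
```

Since a SIDR exchange never puts the id into the established state, the
disconnect path in this model always ends in -EINVAL.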

Now, even with that fixed, the disconnect packets can get lost, or the
remote side can reboot etc. before the CM manages to send the DREQ packet(s).

Thinking about a remote qp/lid change, the equivalent I see for UDP based apps
is that the change would be caught by the local stack's
neighbour subsystem, since it sends a few unicast ARP probes and then re-issues
a broadcast ARP from which the new HW address (qpn / gid --> lid) would be learned.

What do you think would be the correct way to solve that for rdmacm based apps?
Is there a way for the RDMA/IB stack to provide the solution? We were
considering a few alternatives, but they are all at the app level (e.g. send probes
to the remote qp/lid, add another RC connection just for the sake of knowing
the remote process is still there, etc.).

I guess that a remote lid change can be emulated as a disconnect if the rdmacm
listened for IN/OUT traps, but the question is what we can do about the
remote process qp, e.g. in the case where the process dies and then comes back again.

thanks,

Or.


From kliteyn at dev.mellanox.co.il  Sun Jun 17 02:28:20 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Sun, 17 Jun 2007 12:28:20 +0300
Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node	guid
	files options for fat-tree
In-Reply-To: <20070615222934.GC10766@sashak.voltaire.com>
References: <4670FA2D.7070708@dev.mellanox.co.il>
	<20070614121501.GC5908@sashak.voltaire.com>
	<4671363F.6060600@dev.mellanox.co.il>
	<20070614134519.GD5908@sashak.voltaire.com>
	<1181939959.5681.380508.camel@hal.voltaire.com>
	<20070615205958.GB10766@sashak.voltaire.com>
	<1181941040.5681.381698.camel@hal.voltaire.com>
	<20070615222934.GC10766@sashak.voltaire.com>
Message-ID: <4674FEB4.4000108@dev.mellanox.co.il>

Sasha Khapyorsky wrote:
> On 16:57 Fri 15 Jun     , Hal Rosenstock wrote:
>> On Fri, 2007-06-15 at 16:59, Sasha Khapyorsky wrote:
>>> On 16:39 Fri 15 Jun     , Hal Rosenstock wrote:
>>>> On Thu, 2007-06-14 at 09:45, Sasha Khapyorsky wrote:
>>>>> On 15:36 Thu 14 Jun     , Yevgeny Kliteynik wrote:
>>>>>>  Sasha Khapyorsky wrote:
>>>>>>> Hi Yevgeny,
>>>>>>> On 11:19 Thu 14 Jun     , Yevgeny Kliteynik wrote:
>>>>>>>>  The following three patches are adding root and compute node guid files
>>>>>>>>  options for fat-tree routing,
>>>>>>> Is there any reason to not share root guids file option with up/down?
>>>>>>  There are two new options for fat-tree: roots and compute nodes (CN).
>>>>>>  These two will be very "tightly coupled" and would have more implication
>>>>>>  on the routing than in case of up/dn roots. For instance, having root
>>>>>>  file but not CN file means that the topology doesn't have to be pure 
>>>>>>  fat-tree,
>>>>>>  but all the CAs are considered CNs and have to be on the same level of the 
>>>>>>  tree.
>>>>>>  And there is similar implication of all the combinations of these two 
>>>>>>  options.
>>>>>>
>>>>>>  Because of this coupling I wanted to differentiate these two options from
>>>>>>  the up/dn roots.
>>>>>>
>>>>>>  Thoughts?
>>>>> I still not have strong option about two options against common one.
>>>> Me neither.
>>>>
>>>>> Hypothetically if in some days we will implement routing engine chains
>>>>> (so failed algo will fallback to next in chain and not just to default)
>>>>> separate options could be useful.
>>>> So is this a(nother) reason to keep the roots separate or would that be
>>>> dealt with when the routing fallback strategy changes ?
>>> It is yet hypothetical. Currently I don't see a strong practical reasons
>>> to have two separate root guids file options for up/down and fat-tree,
>>> but guess this is minor and not showstopper.
>> Wouldn't a current practical reason be switching between up/down and fat
>> tree and they each have different roots ? Is that a real scenario ?
> 
> Sure (but guess in many cases selected roots will be same for both
> algos).

I think that the selected roots will always be the same for both algos.
I can't think of any topology that would require a different set of roots
for two algorithms that both see the fabric as a tree with routes going up and
then down.

> I think this scenario will be handled well with single shared
> option, like:
> 
>   opensm -R ftree --roots-file ftree-roots-file
> 
> , and
> 
>   opensm -R updn --roots-file updn-roots-file

I agree with this.
I will rework the patch and replace the updn_guid_file with root_guid_file,
and add cn_guid_file.

This also means that the OSM command line options -a or --add_guid_file
will be replaced with -O or --root_guid_file, and we will have additional
options for CN file: -C or --cn_guid_file

Sounds OK?

-- Yevgeny
> 
> Sasha
> 


From vlad at lists.openfabrics.org  Sun Jun 17 02:43:19 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Sun, 17 Jun 2007 02:43:19 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070617-0200 daily build status
Message-ID: <20070617094319.4F88DE60839@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:   --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.14
Passed on powerpc with linux-2.6.18
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.12
Passed on ppc64 with linux-2.6.15
Passed on ia64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18
Passed on ia64 with linux-2.6.14
Passed on powerpc with linux-2.6.19
Passed on ia64 with linux-2.6.13
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.16
Passed on powerpc with linux-2.6.17
Passed on ppc64 with linux-2.6.12
Passed on ppc64 with linux-2.6.14
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.21.1
Passed on powerpc with linux-2.6.13
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.17
Passed on x86_64 with linux-2.6.13
Passed on ia64 with linux-2.6.17
Passed on ppc64 with linux-2.6.19
Passed on x86_64 with linux-2.6.17
Passed on ppc64 with linux-2.6.18
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on x86_64 with linux-2.6.19
Passed on ia64 with linux-2.6.16
Passed on powerpc with linux-2.6.16
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.15
Passed on ppc64 with linux-2.6.13
Passed on powerpc with linux-2.6.12
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on powerpc with linux-2.6.15
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on ia64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:


From kliteyn at dev.mellanox.co.il  Sun Jun 17 04:11:54 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Sun, 17 Jun 2007 14:11:54 +0300
Subject: [ofa-general] PATCH [0/3] osm: adding root and compute node	guid
	files options for fat-tree
In-Reply-To: <4674FEB4.4000108@dev.mellanox.co.il>
References: <4670FA2D.7070708@dev.mellanox.co.il>	<20070614121501.GC5908@sashak.voltaire.com>	<4671363F.6060600@dev.mellanox.co.il>	<20070614134519.GD5908@sashak.voltaire.com>	<1181939959.5681.380508.camel@hal.voltaire.com>	<20070615205958.GB10766@sashak.voltaire.com>	<1181941040.5681.381698.camel@hal.voltaire.com>	<20070615222934.GC10766@sashak.voltaire.com>
	<4674FEB4.4000108@dev.mellanox.co.il>
Message-ID: <467516FA.9000605@dev.mellanox.co.il>

Yevgeny Kliteynik wrote:
> Sasha Khapyorsky wrote:
>> On 16:57 Fri 15 Jun     , Hal Rosenstock wrote:
>>> On Fri, 2007-06-15 at 16:59, Sasha Khapyorsky wrote:
>>>> On 16:39 Fri 15 Jun     , Hal Rosenstock wrote:
>>>>> On Thu, 2007-06-14 at 09:45, Sasha Khapyorsky wrote:
>>>>>> On 15:36 Thu 14 Jun     , Yevgeny Kliteynik wrote:
>>>>>>>  Sasha Khapyorsky wrote:
>>>>>>>> Hi Yevgeny,
>>>>>>>> On 11:19 Thu 14 Jun     , Yevgeny Kliteynik wrote:
>>>>>>>>>  The following three patches are adding root and compute node 
>>>>>>>>> guid files
>>>>>>>>>  options for fat-tree routing,
>>>>>>>> Is there any reason to not share root guids file option with 
>>>>>>>> up/down?
>>>>>>>  There are two new options for fat-tree: roots and compute nodes 
>>>>>>> (CN).
>>>>>>>  These two will be very "tightly coupled" and would have more 
>>>>>>> implication
>>>>>>>  on the routing than in case of up/dn roots. For instance, having 
>>>>>>> root
>>>>>>>  file but not CN file means that the topology doesn't have to be 
>>>>>>> pure  fat-tree,
>>>>>>>  but all the CAs are considered CNs and have to be on the same 
>>>>>>> level of the  tree.
>>>>>>>  And there is similar implication of all the combinations of 
>>>>>>> these two  options.
>>>>>>>
>>>>>>>  Because of this coupling I wanted to differentiate these two 
>>>>>>> options from
>>>>>>>  the up/dn roots.
>>>>>>>
>>>>>>>  Thoughts?
>>>>>> I still not have strong option about two options against common one.
>>>>> Me neither.
>>>>>
>>>>>> Hypothetically if in some days we will implement routing engine 
>>>>>> chains
>>>>>> (so failed algo will fallback to next in chain and not just to 
>>>>>> default)
>>>>>> separate options could be useful.
>>>>> So is this a(nother) reason to keep the roots separate or would 
>>>>> that be
>>>>> dealt with when the routing fallback strategy changes ?
>>>> It is yet hypothetical. Currently I don't see a strong practical 
>>>> reasons
>>>> to have two separate root guids file options for up/down and fat-tree,
>>>> but guess this is minor and not showstopper.
>>> Wouldn't a current practical reason be switching between up/down and fat
>>> tree and they each have different roots ? Is that a real scenario ?
>>
>> Sure (but guess in many cases selected roots will be same for both
>> algos).
> 
> I think that selected roots will always be same for both algos.
> I can't think of any topology that will require different set of roots
> for two algorithms that see the fabric as tree with routes going up and
> then down.
> 
>> I think this scenario will be handled well with single shared
>> option, like:
>>
>>   opensm -R ftree --roots-file ftree-roots-file
>>
>> , and
>>
>>   opensm -R updn --roots-file updn-roots-file
> 
> I agree with this.
> I will rework the patch and replace the updn_guid_file with root_guid_file,
> and add cn_guid_file.
> 
> This also means that the OSM command line options -a or --add_guid_file
> will be replaced with -O or --root_guid_file, and we will have additional
> options for CN file: -C or --cn_guid_file

Sorry, -C is already taken. I'm running out of letters here... :)
Suggesting leaving 'a' for roots, and using 'u' for CNs:

  -a or --root_guid_file
  -u or --cn_guid_file

-- Yevgeny

> Sounds OK?
> 
> -- Yevgeny
>>
>> Sasha
>>
> 


From kliteyn at dev.mellanox.co.il  Sun Jun 17 05:26:02 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Sun, 17 Jun 2007 15:26:02 +0300
Subject: [ofa-general] [PATCH] osm: adding root_guid_file and cn_guid_file
	OpenSM options
Message-ID: <4675285A.6060309@dev.mellanox.co.il>

Hi Hal,

This patch replaces updn_guid_file in the Up/Down routing with
root_guid_file for Up/Down and Fat-Tree routing, and adds a new
option - cn_guid_file for Fat-Tree routing.
OpenSM command line options for these two files are:

  '-a' or '--root_guid_file' for roots
  '-u' or '--cn_guid_file' for compute nodes

Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
---
  opensm/include/opensm/osm_subnet.h |   12 +++++++++---
  opensm/opensm/main.c               |   29 ++++++++++++++++++++++-------
  opensm/opensm/osm_subnet.c         |   25 ++++++++++++++++++-------
  opensm/opensm/osm_ucast_updn.c     |    6 +++---
  4 files changed, 52 insertions(+), 20 deletions(-)

diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index c62128b..a38fc49 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -278,7 +278,8 @@ typedef struct _osm_subn_opt
    char *                   routing_engine_name;
    char *                   lid_matrix_dump_file;
    char *                   ucast_dump_file;
-  char *                   updn_guid_file;
+  char *                   root_guid_file;
+  char *                   cn_guid_file;
    char *                   sa_db_file;
    boolean_t                exit_on_fatal;
    boolean_t                honor_guid2lid_file;
@@ -452,8 +453,13 @@ typedef struct _osm_subn_opt
  *		Name of the unicast routing dump file from where switch
  *		forwarding tables will be loaded
  *
-*	updn_guid_file
-*		Pointer to name of the UPDN guid file given by User
+*	root_guid_file
+*		Name of the file that contains list of root guids that
+*		will be used by fat-tree or up/dn routing (provided by User)
+*
+*	cn_guid_file
+*		Name of the file that contains list of compute node guids that
+*		will be used by fat-tree routing (provided by User)
  *
  *	sa_db_file
  *		Name of the SA database file.
diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
index 6b4cb4f..d17a994 100644
--- a/opensm/opensm/main.c
+++ b/opensm/opensm/main.c
@@ -189,8 +189,14 @@ show_usage(void)
            "          This option specifies the name of the SA DB dump file\n"
            "          from where SA database will be loaded.\n\n");
    printf ("-a\n"
-          "--add_guid_file <path to file>\n"
-          "          Set the root nodes for the Up/Down routing algorithm\n"
+          "--root_guid_file <path to file>\n"
+          "          Set the root nodes for the Up/Down or Fat-Tree routing\n"
+          "          algorithm to the guids provided in the given file (one\n"
+          "          to a line)\n"
+          "\n");
+  printf ("-u\n"
+          "--cn_guid_file <path to file>\n"
+          "          Set the compute nodes for the Fat-Tree routing algorithm\n"
            "          to the guids provided in the given file (one to a line)\n"
            "\n");
    printf( "-o\n"
@@ -585,7 +591,7 @@ main(
    char                 *ignore_guids_file_name = NULL;
    uint32_t              val;
    const char * const    short_option =
-	  "i:f:ed:g:l:L:s:t:a:R:M:U:S:P:NBIQvVhorcyxp:n:q:k:C:";
+	  "i:f:ed:g:l:L:s:t:a:u:R:M:U:S:P:NBIQvVhorcyxp:n:q:k:C:";

    /*
      In the array below, the 2nd parameter specifies the number
@@ -622,7 +628,8 @@ main(
        {  "lid_matrix_file",1, NULL, 'M'},
        {  "ucast_file",    1, NULL, 'U'},
        {  "sadb_file",     1, NULL, 'S'},
-      {  "add_guid_file", 1, NULL, 'a'},
+      {  "root_guid_file",1, NULL, 'a'},
+      {  "cn_guid_file",  1, NULL, 'u'},
        {  "cache-options", 0, NULL, 'c'},
        {  "stay_on_fatal", 0, NULL, 'y'},
        {  "honor_guid2lid",0, NULL, 'x'},
@@ -886,10 +893,18 @@ main(

      case 'a':
        /*
-        Specifies port guids file
+        Specifies root guids file
+      */
+      opt.root_guid_file = optarg;
+      printf (" Root Guid File: %s\n", opt.root_guid_file );
+      break;
+
+    case 'u':
+      /*
+        Specifies compute node guids file
        */
-      opt.updn_guid_file = optarg;
-      printf (" UPDN Guid File: %s\n", opt.updn_guid_file );
+      opt.cn_guid_file = optarg;
+      printf (" Compute Node Guid File: %s\n", opt.cn_guid_file );
        break;

      case 'c':
diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index 736f49a..4e080ba 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -500,7 +500,8 @@ osm_subn_set_default_opt(
    p_opt->routing_engine_name = NULL;
    p_opt->lid_matrix_dump_file = NULL;
    p_opt->ucast_dump_file = NULL;
-  p_opt->updn_guid_file = NULL;
+  p_opt->root_guid_file = NULL;
+  p_opt->cn_guid_file = NULL;
    p_opt->sa_db_file = NULL;
    p_opt->exit_on_fatal = TRUE;
    p_opt->enable_quirks = FALSE;
@@ -1323,8 +1324,12 @@ osm_subn_parse_conf_file(
          p_key, p_val, &p_opts->ucast_dump_file);

        __osm_subn_opts_unpack_charp(
-        "updn_guid_file",
-        p_key, p_val, &p_opts->updn_guid_file);
+        "root_guid_file",
+        p_key, p_val, &p_opts->root_guid_file);
+
+      __osm_subn_opts_unpack_charp(
+        "cn_guid_file",
+        p_key, p_val, &p_opts->cn_guid_file);

        __osm_subn_opts_unpack_charp(
          "sa_db_file",
@@ -1548,12 +1553,18 @@ osm_subn_write_conf_file(
               "# Ucast dump file name\n"
               "ucast_dump_file %s\n\n",
               p_opts->ucast_dump_file);
-  if (p_opts->updn_guid_file)
+  if (p_opts->root_guid_file)
+    fprintf( opts_file,
+             "# The file holding the root node guids (for fat-tree or Up/Down)\n"
+             "# One guid in each line\n"
+             "root_guid_file %s\n\n",
+             p_opts->root_guid_file);
+  if (p_opts->cn_guid_file)
      fprintf( opts_file,
-             "# The file holding the Up/Down root node guids\n"
+             "# The file holding the fat-tree compute node guids\n"
               "# One guid in each line\n"
-             "updn_guid_file %s\n\n",
-             p_opts->updn_guid_file);
+             "cn_guid_file %s\n\n",
+             p_opts->cn_guid_file);
    if (p_opts->sa_db_file)
      fprintf( opts_file,
               "# SA database file name\n"
diff --git a/opensm/opensm/osm_ucast_updn.c b/opensm/opensm/osm_ucast_updn.c
index 2448246..af5ee4e 100644
--- a/opensm/opensm/osm_ucast_updn.c
+++ b/opensm/opensm/osm_ucast_updn.c
@@ -311,10 +311,10 @@ updn_init(
       Check the source for root node list, if file parse it, otherwise
       wait for a callback to activate auto detection
    */
-  if (p_osm->subn.opt.updn_guid_file)
+  if (p_osm->subn.opt.root_guid_file)
    {
      status = osm_ucast_mgr_read_guid_file( &p_osm->sm.ucast_mgr,
-                                           p_osm->subn.opt.updn_guid_file,
+                                           p_osm->subn.opt.root_guid_file,
                                             p_updn->p_root_nodes );
      if (status != IB_SUCCESS)
         goto Exit;
@@ -323,7 +323,7 @@ updn_init(
      osm_log( &p_osm->log, OSM_LOG_DEBUG,
               "updn_init: "
               "UPDN - Fetching root nodes from file %s\n",
-             p_osm->subn.opt.updn_guid_file );
+             p_osm->subn.opt.root_guid_file );
      guid_iterator = cl_list_head(p_updn->p_root_nodes);
      while( guid_iterator != cl_list_end(p_updn->p_root_nodes) )
      {
-- 
1.5.1.4
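
For reference, a minimal sketch of reading a guid file with one guid per
line, as described in the usage text above. This is not the code behind
osm_ucast_mgr_read_guid_file; the function names are hypothetical, and
the accepted number format (hex with a 0x prefix, via strtoull with
base 0) is an assumption, since the patch only says "one to a line".

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse one line of a guid file (hypothetical helper).
 * Returns 0 and stores the guid on success, -1 for a blank or
 * malformed line.
 */
int parse_guid_line(const char *line, uint64_t *guid)
{
    char *end;
    unsigned long long v;

    errno = 0;
    v = strtoull(line, &end, 0);   /* base 0 accepts a 0x prefix */
    if (end == line || errno == ERANGE)
        return -1;                 /* no digits, or out of range */

    /* Allow only trailing whitespace after the number. */
    while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
        end++;
    if (*end != '\0')
        return -1;

    *guid = (uint64_t)v;
    return 0;
}

/* Read up to max guids from f; blank or malformed lines are skipped. */
size_t read_guid_file(FILE *f, uint64_t *guids, size_t max)
{
    char line[128];
    size_t n = 0;

    while (n < max && fgets(line, sizeof(line), f) != NULL) {
        if (parse_guid_line(line, &guids[n]) == 0)
            n++;
    }
    return n;
}
```

Under these assumptions the same reader would serve both the file given
with -a (roots) and the one given with -u (compute nodes), because both
are plain lists of guids.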


From sashak at voltaire.com  Sun Jun 17 05:22:29 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 17 Jun 2007 15:22:29 +0300
Subject: [ofa-general] PATCH [0/3] osm: adding root and compute
	node	guid files options for fat-tree
In-Reply-To: <467516FA.9000605@dev.mellanox.co.il>
References: <4670FA2D.7070708@dev.mellanox.co.il>
	<20070614121501.GC5908@sashak.voltaire.com>
	<4671363F.6060600@dev.mellanox.co.il>
	<20070614134519.GD5908@sashak.voltaire.com>
	<1181939959.5681.380508.camel@hal.voltaire.com>
	<20070615205958.GB10766@sashak.voltaire.com>
	<1181941040.5681.381698.camel@hal.voltaire.com>
	<20070615222934.GC10766@sashak.voltaire.com>
	<4674FEB4.4000108@dev.mellanox.co.il>
	<467516FA.9000605@dev.mellanox.co.il>
Message-ID: <1182082950.4517.9.camel@localhost>

On Sun, 2007-06-17 at 14:11 +0300, Yevgeny Kliteynik wrote:
> Yevgeny Kliteynik wrote:
> > Sasha Khapyorsky wrote:
> >> On 16:57 Fri 15 Jun     , Hal Rosenstock wrote:
> >>> On Fri, 2007-06-15 at 16:59, Sasha Khapyorsky wrote:
> >>>> On 16:39 Fri 15 Jun     , Hal Rosenstock wrote:
> >>>>> On Thu, 2007-06-14 at 09:45, Sasha Khapyorsky wrote:
> >>>>>> On 15:36 Thu 14 Jun     , Yevgeny Kliteynik wrote:
> >>>>>>>  Sasha Khapyorsky wrote:
> >>>>>>>> Hi Yevgeny,
> >>>>>>>> On 11:19 Thu 14 Jun     , Yevgeny Kliteynik wrote:
> >>>>>>>>>  The following three patches are adding root and compute node 
> >>>>>>>>> guid files
> >>>>>>>>>  options for fat-tree routing,
> >>>>>>>> Is there any reason to not share root guids file option with 
> >>>>>>>> up/down?
> >>>>>>>  There are two new options for fat-tree: roots and compute nodes 
> >>>>>>> (CN).
> >>>>>>>  These two will be very "tightly coupled" and would have more 
> >>>>>>> implication
> >>>>>>>  on the routing than in case of up/dn roots. For instance, having 
> >>>>>>> root
> >>>>>>>  file but not CN file means that the topology doesn't have to be 
> >>>>>>> pure  fat-tree,
> >>>>>>>  but all the CAs are considered CNs and have to be on the same 
> >>>>>>> level of the  tree.
> >>>>>>>  And there is similar implication of all the combinations of 
> >>>>>>> these two  options.
> >>>>>>>
> >>>>>>>  Because of this coupling I wanted to differentiate these two 
> >>>>>>> options from
> >>>>>>>  the up/dn roots.
> >>>>>>>
> >>>>>>>  Thoughts?
> >>>>>> I still not have strong option about two options against common one.
> >>>>> Me neither.
> >>>>>
> >>>>>> Hypothetically if in some days we will implement routing engine 
> >>>>>> chains
> >>>>>> (so failed algo will fallback to next in chain and not just to 
> >>>>>> default)
> >>>>>> separate options could be useful.
> >>>>> So is this a(nother) reason to keep the roots separate or would 
> >>>>> that be
> >>>>> dealt with when the routing fallback strategy changes ?
> >>>> It is yet hypothetical. Currently I don't see a strong practical 
> >>>> reasons
> >>>> to have two separate root guids file options for up/down and fat-tree,
> >>>> but guess this is minor and not showstopper.
> >>> Wouldn't a current practical reason be switching between up/down and fat
> >>> tree and they each have different roots ? Is that a real scenario ?
> >>
> >> Sure (but guess in many cases selected roots will be same for both
> >> algos).
> > 
> > I think that selected roots will always be same for both algos.
> > I can't think of any topology that will require different set of roots
> > for two algorithms that see the fabric as tree with routes going up and
> > then down.
> > 
> >> I think this scenario will be handled well with single shared
> >> option, like:
> >>
> >>   opensm -R ftree --roots-file ftree-roots-file
> >>
> >> , and
> >>
> >>   opensm -R updn --roots-file updn-roots-file
> > 
> > I agree with this.
> > I will rework the patch and replace the updn_guid_file with root_guid_file,
> > and add cn_guid_file.
> > 
> > This also means that the OSM command line options -a or --add_guid_file
> > will be replaced with -O or --root_guid_file, and we will have additional
> > options for CN file: -C or --cn_guid_file
> 
> Sorry, -C is already taken. I'm running out of letters here... :)
> Suggesting leaving 'a' for roots, and using 'u' for CNs:
> 
>   -a or --root_guid_file
>   -u or --cn_guid_file

Looks perfect for me.

Sasha


From tziporet at mellanox.co.il  Sun Jun 17 06:51:28 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Sun, 17 Jun 2007 16:51:28 +0300
Subject: [ofa-general] Re: [ewg] critical fixes for chelsio iwarp driver
In-Reply-To: <46744D78.9040602@opengridcomputing.com>
References: <46744D78.9040602@opengridcomputing.com>
Message-ID: <46753C60.90008@mellanox.co.il>

Steve Wise wrote:
> Tziporet,
>
> I'll be posting 2 fixes soon that I'd like included in ofed-1.2.
>
> Bugs 663 and 664.  These bugs cause crashes that force a reboot of the 
> system and should be considered stop-ship for ofed-1.2.
>
> Thanks,
>
> Steve.
>
OK - but make sure the patches are ready on Monday since we wish to do 
the GA release this week

Tziporet


From tziporet at mellanox.co.il  Sun Jun 17 07:50:07 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Sun, 17 Jun 2007 17:50:07 +0300
Subject: [ofa-general] crash in ipoib
In-Reply-To: <BAE9DCEF64577A439B3A37F36F9B691C0285B8C1@orsmsx418.amr.corp.intel.com>
References: <BAE9DCEF64577A439B3A37F36F9B691C0285B8C1@orsmsx418.amr.corp.intel.com>
Message-ID: <46754A1F.9060106@mellanox.co.il>

Woodruff, Robert J wrote:
> This looks like it fixed the panic. 
>
> Should we try to put out a new RC with this latest ipoib fix ?
> I really think we need it in the release. If we could get another RC out
> today,
> that would only delay the release by a couple of more days and we could
> release on next Friday rather than wed. and still give people a week to 
> test the final RC.
>
> woody
>
>   
I agree we need this fix. I suggest we create RC6 once this fix and Steve's 
fixes for 663 and 664 are in.

Let's close all the details in the meeting tomorrow.

Tziporet


From vlad at dev.mellanox.co.il  Sun Jun 17 07:58:24 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Sun, 17 Jun 2007 17:58:24 +0300
Subject: [ofa-general] quick IPoIB config question
In-Reply-To: <467009FC.3070402@scalableinformatics.com>
References: <467009FC.3070402@scalableinformatics.com>
Message-ID: <46754C10.4010801@dev.mellanox.co.il>

Joe Landman wrote:
> Hi folks:
> 
>   Built OFED-1.2-rc4 on OpenSuSE 10.2, works fine as long as I turn off 
> 32-bit build, and update to a 2.6.20 kernel.  Installed the RPMs after 
> build, and the system appears to be fine/well behaved.  Is there a 
> OFED-specific technique to have the ib0 interface configure at boot 
> time, after drivers load?   This might be distribution specific.
> 
> I created a file named /etc/sysconfig/network/ifcfg-ib0 which contained
> 
> BOOTPROTO='static'
> MTU=''
> REMOTE_IPADDR=''
> STARTMODE='onboot'
> USERCONTROL='no'
> NETMASK='255.255.0.0'
> IPADDR='10.1.32.2'
> DEVICE='ib0'
> 
> Bringing the interface up with an 'ifconfig ib0 up' doesn't seem to 
> assign the IP address and netmask to it.
> 
> Hence my question.  Is there an OFED specific method of configuring this 
> (e.g. a config file I need to edit/create), or is it distribution 
> dependent?
> 
> If I force the issue with an ifconfig, it looks like it works fine. This 
> is ok as a work around, and I can create an /etc/init.d/ib or similar to 
> force the issue.  I would prefer to do this "the right way", and if 
> there is someone with guidance/pointers as to what that is, I would 
> prefer to follow that.
> 
> Thanks.
> 
> Joe
> 

Hi Joe,
You can do one of the following to set ib0 configuration from ifcfg-ib0:

* ifup ib0

* /etc/init.d/openibd restart

Regards,
Vladimir


From tziporet at mellanox.co.il  Sun Jun 17 08:05:58 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Sun, 17 Jun 2007 18:05:58 +0300
Subject: [ofa-general] Re: ipoib / bonding and OFED
In-Reply-To: <15ddcffd0706081420r79984701u4e385e28857cb68b@mail.gmail.com>
References: <3857BB049D83424D9DB82753D37CEA55459C41@taurus.voltaire.com>	<4657373E.2030903@hp.com>
	<465BDC90.5080305@voltaire.com>	<A15335FBE9BD2449AF2C9EF3D1EB8EA303951259@xmb-sjc-216.amer.cisco.com>	<466702A8.5080302@hp.com>
	<4667B5FD.4070600@voltaire.com>	<A15335FBE9BD2449AF2C9EF3D1EB8EA303A4544E@xmb-sjc-216.amer.cisco.com>
	<15ddcffd0706081420r79984701u4e385e28857cb68b@mail.gmail.com>
Message-ID: <46754DD6.2080807@mellanox.co.il>


> On 6/7/07, *Scott Weitzenkamp (sweitzen)* <sweitzen at cisco.com 
> <mailto:sweitzen at cisco.com>> wrote:
>
>     I don't know if I've said this in public, but I've stopped testing
>     ipoibtools HA as of OFED 1.2 rc2 and Cisco is only going to support
>     ib-bonding HA for our OFED 1.2 customers, as our testing has revealed
>     ib-bonding is more robust than ipoibtools.  I know I said this to
>     Tziporet at Sonoma, and she seemed to agree we could eventually
>     remove
>     ipoibtools from OFED.
>
>
Scott,

ipoibtools will be removed from OFED 1.3.

Tziporet


From jackm at dev.mellanox.co.il  Sun Jun 17 08:18:34 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Sun, 17 Jun 2007 18:18:34 +0300
Subject: [ofa-general] Re: [PATCH/RFC] IB/mlx4: Handle new FW requirement for
	send request prefetching
In-Reply-To: <adaodjj69wc.fsf@cisco.com>
References: <200706051602.14182.jackm@dev.mellanox.co.il>
	<adaodjj69wc.fsf@cisco.com>
Message-ID: <200706171818.34690.jackm@dev.mellanox.co.il>

On Wednesday 13 June 2007 20:29, Roland Dreier wrote:
> I just queued this patch to handle new FW up.  Please let me know if
> it looks OK to you, and I will ask Linus to pull it.
> 
> Thanks.
> 

Looks good!

- Jack

> commit f22332295cb218ad12db2b521a34553ff5790c34
> Author: Roland Dreier <rolandd at cisco.com>
> Date:   Wed Jun 13 10:26:43 2007 -0700
> 
>     IB/mlx4: Handle new FW requirement for send request prefetching
>     
>     New ConnectX firmware introduces FW command interface revision 2,
>     which requires that for each QP, a chunk of send queue entries (the
>     "headroom") is kept marked as invalid, so that the HCA doesn't get
>     confused if it prefetches entries that haven't been posted yet.  Add
>     code to the driver to do this, and also update the user ABI so that
>     userspace can request that the prefetcher be turned off for userspace
>     QPs (we just leave the prefetcher on for all kernel QPs).
>     
>     Marking send queue entries this way is OK for older firmware too, so
>     we change the driver to allow FW command interface revisions 1 and 2.
>     
>     Based on a patch from Jack Morgenstein <jackm at dev.mellanox.co.il>.
>     
>     Signed-off-by: Roland Dreier <rolandd at cisco.com>
> 
>
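
For anyone trying to picture the "headroom" idea from the commit message above, here is a minimal user-space sketch in C. It is a toy model only, not the mlx4 driver code: the ring size, the headroom count, the toy_sq structure and the TOY_WQE_INVALID stamp are all names and values made up for illustration. It assumes a ring where the producer always keeps the slots just past the head stamped as invalid, so that anything reading ahead of the head never sees a stale entry.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SQ_SIZE     16u          /* hypothetical ring size (power of two) */
#define TOY_HEADROOM    4u           /* hypothetical number of slots kept invalid */
#define TOY_WQE_INVALID 0xffffffffu  /* hypothetical "not posted" stamp */

struct toy_sq {
	uint32_t entry[TOY_SQ_SIZE]; /* TOY_WQE_INVALID or a posted value */
	uint32_t head;               /* number of entries posted so far */
	uint32_t tail;               /* number of entries completed so far */
};

static void toy_sq_init(struct toy_sq *sq)
{
	for (uint32_t i = 0; i < TOY_SQ_SIZE; ++i)
		sq->entry[i] = TOY_WQE_INVALID;
	sq->head = 0;
	sq->tail = 0;
}

/* Post one entry; returns -1 if that would eat into the headroom. */
static int toy_sq_post(struct toy_sq *sq, uint32_t value)
{
	if (sq->head - sq->tail >= TOY_SQ_SIZE - TOY_HEADROOM)
		return -1;

	sq->entry[sq->head & (TOY_SQ_SIZE - 1)] = value;
	sq->head++;
	/* Re-stamp the slot that just became the far end of the headroom. */
	sq->entry[(sq->head + TOY_HEADROOM - 1) & (TOY_SQ_SIZE - 1)] =
		TOY_WQE_INVALID;
	return 0;
}

/* Retire the oldest outstanding entry, if there is one. */
static void toy_sq_complete(struct toy_sq *sq)
{
	if (sq->tail != sq->head)
		sq->tail++;
}

/* Returns 1 if every slot in the headroom past the head is invalid. */
static int toy_sq_headroom_ok(const struct toy_sq *sq)
{
	for (uint32_t i = 0; i < TOY_HEADROOM; ++i)
		if (sq->entry[(sq->head + i) & (TOY_SQ_SIZE - 1)] !=
		    TOY_WQE_INVALID)
			return 0;
	return 1;
}
```

In this model the usable depth is TOY_SQ_SIZE - TOY_HEADROOM entries, which is the price paid for keeping the invalid slots in place. Posted values must differ from TOY_WQE_INVALID.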


From jackm at dev.mellanox.co.il  Sun Jun 17 08:24:47 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Sun, 17 Jun 2007 18:24:47 +0300
Subject: [ofa-general] Re: [PATCH/RFC] libmlx4: Handle new FW requirement for
	send request prefetching
In-Reply-To: <adak5u769n4.fsf_-_@cisco.com>
References: <200706051602.14182.jackm@dev.mellanox.co.il>
	<adaodjj69wc.fsf@cisco.com> <adak5u769n4.fsf_-_@cisco.com>
Message-ID: <200706171824.47371.jackm@dev.mellanox.co.il>

On Wednesday 13 June 2007 20:34, Roland Dreier wrote:
> Similarly I just added this to libmlx4.  The change to handle alignment
> for inline send segments will be a separate patch, and I'm still
> cleaning it up.  Anyway, let me know if you see any problems with
> this.
> 
Looks good!
(I like how you handled the 2K + 1-WQE headroom.)

- Jack


From landman at scalableinformatics.com  Sun Jun 17 08:54:25 2007
From: landman at scalableinformatics.com (Joe Landman)
Date: Sun, 17 Jun 2007 11:54:25 -0400
Subject: [ofa-general] quick IPoIB config question
In-Reply-To: <46754C10.4010801@dev.mellanox.co.il>
References: <467009FC.3070402@scalableinformatics.com>
	<46754C10.4010801@dev.mellanox.co.il>
Message-ID: <46755931.10802@scalableinformatics.com>

Hi Vladimir:

Vladimir Sokolovsky wrote:

> Hi Joe,
> You can do one of the following to set ib0 configuration from ifcfg-ib0:
> 
> * ifup ib0
> 
> * /etc/init.d/openibd restart

I had tried those, to no effect.  When I rebooted, after chkconfig'ing 
openibd on, it didn't come up properly either.  I had to force the issue 
in a special /etc/init.d/ipoib I created to come up after the openibd, 
where I sourced the /etc/sysconfig/network/ifcfg-ib0 file, and then did 
a simple "ifconfig" of ib0 after it.

This was/is strange.  The work-around is fine for the moment for this 
customer.  I will see if I can dig in and file a better report on what 
happened.

> 
> Regards,
> Vladimir


-- 

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


From swise at opengridcomputing.com  Sun Jun 17 08:58:53 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Sun, 17 Jun 2007 10:58:53 -0500
Subject: [ofa-general] [GIT PULL ofed_1_2] iw_cxgb3 fixes for bugs 663/664
Message-ID: <46755A3D.1030300@opengridcomputing.com>

Vlad,

Please pull in these fixes for bugs 663/664 from

git://git.openfabrics.org/~swise/ofed_1_2 ofed_1_2

Thanks,

Steve.

git-log

commit bd3a007a1432ded7d5d538d2125249d111c2644f
Author: Steve Wise <swise at opengridcomputing.com>
Date:   Sat Jun 16 15:48:28 2007 -0500

     Don't count neg_adv abort_req_rss messages as real aborts.

     negative advice messages should _not_ count toward the 2 abort requests
     needed to indicate an abort request.

     Signed-off-by: Steve Wise <swise at opengridcomputing.com>

diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
index ed56d55..a654bd5 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
@@ -1464,6 +1464,13 @@ static int peer_abort(struct t3cdev *tde
         int ret;
         int state;

+       if (is_neg_adv_abort(req->status)) {
+               PDBG("%s neg_adv_abort ep %p tid %d\n", __FUNCTION__, ep,
+                    ep->hwtid);
+               t3_l2t_send_event(ep->com.tdev, ep->l2t);
+               return CPL_RET_BUF_DONE;
+       }
+
         /*
          * We get 2 peer aborts from the HW.  The first one must
          * be ignored except for scribbling that we need one more.
@@ -1473,13 +1480,6 @@ static int peer_abort(struct t3cdev *tde
                 return CPL_RET_BUF_DONE;
         }

-       if (is_neg_adv_abort(req->status)) {
-               PDBG("%s neg_adv_abort ep %p tid %d\n", __FUNCTION__, ep,
-                    ep->hwtid);
-               t3_l2t_send_event(ep->com.tdev, ep->l2t);
-               return CPL_RET_BUF_DONE;
-       }
-
         state = state_read(&ep->com);
         PDBG("%s ep %p state %u\n", __FUNCTION__, ep, state);
         switch (state) {
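
The effect of moving the negative-advice check ahead of the "first abort" bookkeeping can be shown with a small user-space sketch. This is a toy model under stated assumptions, not the iw_cxgb3 code: the toy_ep structure, the toy_peer_abort() function and its return codes are hypothetical names invented for the example. It only models the rule from the commit message that two real aborts are needed and that negative advice must not count toward them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical return codes for the toy model. */
enum toy_abort_ret {
	TOY_NEG_ADV_IGNORED, /* negative advice: not counted at all */
	TOY_FIRST_ABORT,     /* first real abort: remember it, wait for one more */
	TOY_ABORT_NOW        /* second real abort: act on it */
};

struct toy_ep {
	bool abort_seen; /* set once the first real abort has arrived */
};

static enum toy_abort_ret toy_peer_abort(struct toy_ep *ep, bool is_neg_adv)
{
	/* Check for negative advice first so it never touches the count. */
	if (is_neg_adv)
		return TOY_NEG_ADV_IGNORED;

	if (!ep->abort_seen) {
		ep->abort_seen = true;
		return TOY_FIRST_ABORT;
	}
	return TOY_ABORT_NOW;
}
```

If the two checks were in the opposite order, a negative-advice message arriving first would set abort_seen, and the next real abort would be treated as the second one.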

commit 70549f1d9b8d0420bd97111d3c73973723a88a9c
Author: Steve Wise <swise at opengridcomputing.com>
Date:   Fri Jun 15 11:44:40 2007 -0500

     TERMINATE WRs can hang the tx ofld queue.

     Don't set the gen bits nor length bits in the terminate wr.  This is
     done by the LLD driver.

     Signed-off-by: Steve Wise <swise at opengridcomputing.com>

diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
index 162d1fa..431a7e8 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -630,9 +630,9 @@ int iwch_post_terminate(struct iwch_qp *
         /* immediate data starts here. */
         term = (struct terminate_message *)wqe->send.sgl;
         build_term_codes(rsp_msg, &term->layer_etype, &term->ecode);
-       build_fw_riwrh((void *)wqe, T3_WR_SEND,
-                      T3_COMPLETION_FLAG | T3_NOTIFY_FLAG, 1,
-                      qhp->ep->hwtid, 5);
+       wqe->send.wrh.op_seop_flags = cpu_to_be32(V_FW_RIWR_OP(T3_WR_SEND) |
+                        V_FW_RIWR_FLAGS(T3_COMPLETION_FLAG | T3_NOTIFY_FLAG));
+       wqe->send.wrh.gen_tid_len = cpu_to_be32(V_FW_RIWR_TID(qhp->ep->hwtid));
         skb->priority = CPL_PRIORITY_DATA;
         return (cxgb3_ofld_send(qhp->rhp->rdev.t3cdev_p, skb));
  }


From mst at dev.mellanox.co.il  Sun Jun 17 12:10:56 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 17 Jun 2007 22:10:56 +0300
Subject: [ofa-general] [PATCH] IB/iser: make all fixes patches apply on full
	kernel source
Message-ID: <20070617191056.GC27233@mellanox.co.il>

Since we'll have RC6, I'd like to add the following cosmetic change:
move iscsi_scsi_makefile from kernel_patches/fixes to ofed_scripts,
and use a softlink to put it in place.

This solves the following problem: if I do "git clone" on ofed
tree without -n, and try to apply the fixes patchset,
I get a conflict on iscsi_scsi_makefile.patch simply because
the makefile that this patch attempts to create is part
of upstream kernel already.

I think this is 0-risk and carries real benefit for developers
who'll need to support OFED 1.2.

Erez, do you agree? If yes, I'll ask Tziporet to approve, too.

Signed-off-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>

diff --git a/kernel_patches/fixes/iscsi_scsi_makefile.patch b/kernel_patches/fixes/iscsi_scsi_makefile.patch
deleted file mode 100644
index 9c4fd01..0000000
--- a/kernel_patches/fixes/iscsi_scsi_makefile.patch
+++ /dev/null
@@ -1,10 +0,0 @@
-Add a Makefile based on the kernel's drivers/scsi/Makefile in order to build open-iscsi.
-
-Signed-off-by: Erez Zilber <erezz at voltaire.com>
-
-diff -ruN ofa_1_2_kernel-20061228-0200/drivers/scsi/Makefile ofa_1_2_kernel-20061228-0200-open-iscsi/drivers/scsi/Makefile
---- ofa_1_2_kernel-20061228-0200/drivers/scsi/Makefile  1970-01-01 02:00:00.000000000 +0200
-+++ ofa_1_2_kernel-20061228-0200-open-iscsi/drivers/scsi/Makefile       2006-12-28 17:01:22.000000000 +0200
-@@ -0,0 +1,2 @@
-+obj-$(CONFIG_SCSI_ISCSI_ATTRS) += scsi_transport_iscsi.o
-+obj-$(CONFIG_ISCSI_TCP)        += libiscsi.o   iscsi_tcp.o
diff --git a/ofed_scripts/iscsi_scsi_makefile b/ofed_scripts/iscsi_scsi_makefile
new file mode 100644
index 0000000..cfdf3e0
--- /dev/null
+++ b/ofed_scripts/iscsi_scsi_makefile
@@ -0,0 +1,4 @@
+# Makefile based on the kernel's drivers/scsi/Makefile
+# to build open-iscsi.
+obj-$(CONFIG_SCSI_ISCSI_ATTRS) += scsi_transport_iscsi.o
+obj-$(CONFIG_ISCSI_TCP)        += libiscsi.o   iscsi_tcp.o
diff --git a/ofed_scripts/ofed_checkout.sh b/ofed_scripts/ofed_checkout.sh
index 037b391..86fc8b8 100755
--- a/ofed_scripts/ofed_checkout.sh
+++ b/ofed_scripts/ofed_checkout.sh
@@ -43,3 +43,4 @@ ex git update-ref HEAD $1
 ln -snf ofed_scripts/configure
 ln -snf ofed_scripts/Makefile
 ln -snf ofed_scripts/makefile
+(cd drivers/scsi/; ln -snf ../../ofed_scripts/iscsi_scsi_makefile Makefile)
-- 
MST


From mst at dev.mellanox.co.il  Sun Jun 17 14:02:14 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 18 Jun 2007 00:02:14 +0300
Subject: [ofa-general] ~mst/ofed_kernel.git updated to 2.6.22-rc5
Message-ID: <20070617210154.GD27233@mellanox.co.il>

FYI,

git://git.openfabrics.org/~mst/ofed_kernel.git

I've merged in 2.6.22-rc5 which will pull in multiple bug fixes.

I also added the local sa patch back in (not sure how,
but it went missing).


-- 
Michael S. Tsirkin - Staff Engineer, Mellanox Technologies Ltd.
Eternity is a very long time, especially towards the end.


From pradeeps at linux.vnet.ibm.com  Sun Jun 17 19:36:20 2007
From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana)
Date: Sun, 17 Jun 2007 19:36:20 -0700
Subject: [ofa-general] Re: [PATCH draft, untested] ehca srq emulation
	(for IPoIB CM)
In-Reply-To: <20070616192702.GM2207@mellanox.co.il>
References: <466F36C8.5010507@linux.vnet.ibm.com>
	<20070613163821.GB12277@mellanox.co.il> <adafy4v69ig.fsf@cisco.com>
	<20070613174930.GE12277@mellanox.co.il>
	<46716F3D.7050206@ichips.intel.com> <ada1wge4h4l.fsf@cisco.com>
	<20070614175030.GB29561@mellanox.co.il>
	<4671C541.4040503@linux.vnet.ibm.com>
	<20070615051846.GG2207@mellanox.co.il>
	<4672C0DC.8060308@linux.vnet.ibm.com>
	<20070616192702.GM2207@mellanox.co.il>
Message-ID: <4675EFA4.5050209@linux.vnet.ibm.com>

Michael S. Tsirkin wrote:
>> We need to make some decisions
> 
> Earlier, Roland suggested:
>> However it may be a good approach to put an abstraction layer in IPoIB
>> so that the CM code can use an SRQ-like interface to both HCAs that
>> support SRQ and HCAs that don't.

This approach would be a regression, and there is no guarantee that
anything else would be better.

As Bernard King-Smith said, changing to a different approach midstream
is not the right thing to do.

> 
> And I think this might be a good approach, too - and maybe
> this layer could be general enough to be reusable in other
> ULPs later.
> 

Pradeep


From shani.moideen at wipro.com  Sun Jun 17 20:16:41 2007
From: shani.moideen at wipro.com (Shani Moideen)
Date: Mon, 18 Jun 2007 08:46:41 +0530
Subject: [ofa-general] [KJ PATCH] Replacing memset(<addr>,0,PAGE_SIZE) with
	clear_page(<addr>) in drivers/infiniband/hw/mthca/mthca_allocator.c
Message-ID: <1182136601.9020.7.camel@shani-win>


Replacing memset(<addr>,0,PAGE_SIZE) with clear_page(<addr>) 
in drivers/infiniband/hw/mthca/mthca_allocator.c

Signed-off-by: Shani Moideen <shani.moideen at wipro.com>
----

diff --git a/drivers/infiniband/hw/mthca/mthca_allocator.c b/drivers/infiniband/hw/mthca/mthca_allocator.c
index f930e55..a763067 100644
--- a/drivers/infiniband/hw/mthca/mthca_allocator.c
+++ b/drivers/infiniband/hw/mthca/mthca_allocator.c
@@ -255,7 +255,7 @@ int mthca_buf_alloc(struct mthca_dev *dev, int size, int max_direct,
 			dma_list[i] = t;
 			pci_unmap_addr_set(&buf->page_list[i], mapping, t);
 
-			memset(buf->page_list[i].buf, 0, PAGE_SIZE);
+			clear_page(buf->page_list[i].buf);
 		}
 	}
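
As a rough illustration of why the replacement is expected to be equivalent, here is a user-space sketch in C. The kernel's clear_page() zeroes exactly one page and expects a page-aligned pointer, so the swap is only valid where the buffer is a whole, aligned page. The names toy_clear_page(), toy_page_is_zero() and TOY_PAGE_SIZE are made up for this example, and 4096 is an assumed page size, not something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u /* assumed page size for the example */

/* Stand-in for clear_page(): zero one whole, page-aligned page. */
static void toy_clear_page(void *page)
{
	/* The caller must pass a page-aligned address. */
	assert(((uintptr_t)page % TOY_PAGE_SIZE) == 0);
	memset(page, 0, TOY_PAGE_SIZE);
}

/* Returns 1 if all TOY_PAGE_SIZE bytes are zero, 0 otherwise. */
static int toy_page_is_zero(const void *page)
{
	const unsigned char *p = page;

	for (size_t i = 0; i < TOY_PAGE_SIZE; ++i)
		if (p[i] != 0)
			return 0;
	return 1;
}
```

A caller could obtain a suitable buffer with aligned_alloc(TOY_PAGE_SIZE, TOY_PAGE_SIZE), fill it with a pattern, call toy_clear_page() and then check the result with toy_page_is_zero().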


-- 
Shani 


The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments. 

WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email.
 
www.wipro.com


From shani.moideen at wipro.com  Sun Jun 17 20:23:00 2007
From: shani.moideen at wipro.com (Shani Moideen)
Date: Mon, 18 Jun 2007 08:53:00 +0530
Subject: [ofa-general] [KJ PATCH] Replacing memset(<addr>,0,PAGE_SIZE) with
	clear_page(<addr>) in drivers/infiniband/hw/mthca/mthca_eq.c
Message-ID: <1182136980.9020.13.camel@shani-win>


Replacing memset(<addr>,0,PAGE_SIZE) with clear_page(<addr>) 
in drivers/infiniband/hw/mthca/mthca_eq.c

Signed-off-by: Shani Moideen <shani.moideen at wipro.com>
----

diff --git a/drivers/infiniband/hw/mthca/mthca_eq.c b/drivers/infiniband/hw/mthca/mthca_eq.c
index 8ec9fa1..8592b26 100644
--- a/drivers/infiniband/hw/mthca/mthca_eq.c
+++ b/drivers/infiniband/hw/mthca/mthca_eq.c
@@ -522,7 +522,7 @@ static int mthca_create_eq(struct mthca_dev *dev,
 		dma_list[i] = t;
 		pci_unmap_addr_set(&eq->page_list[i], mapping, t);
 
-		memset(eq->page_list[i].buf, 0, PAGE_SIZE);
+		clear_page(eq->page_list[i].buf);
 	}
 
 	for (i = 0; i < eq->nent; ++i)


-- 
Shani 




From shani.moideen at wipro.com  Sun Jun 17 20:33:56 2007
From: shani.moideen at wipro.com (Shani Moideen)
Date: Mon, 18 Jun 2007 09:03:56 +0530
Subject: [ofa-general] [KJ PATCH] Replacing memset(<addr>,0,PAGE_SIZE) with
	clear_page(<addr>) in drivers/infiniband/hw/ipath/ipath_driver.c
Message-ID: <1182137636.9020.17.camel@shani-win>


Replacing memset(<addr>,0,PAGE_SIZE) with clear_page(<addr>) 
in drivers/infiniband/hw/ipath/ipath_driver.c

Signed-off-by: Shani Moideen <shani.moideen at wipro.com>
----

diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
index e3a2232..417e3ca 100644
--- a/drivers/infiniband/hw/ipath/ipath_driver.c
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -1509,7 +1509,7 @@ int ipath_create_rcvhdrq(struct ipath_devdata *dd,
 
 	/* clear for security and sanity on each use */
 	memset(pd->port_rcvhdrq, 0, pd->port_rcvhdrq_size);
-	memset(pd->port_rcvhdrtail_kvaddr, 0, PAGE_SIZE);
+	clear_page(pd->port_rcvhdrtail_kvaddr);
 
 	/*
 	 * tell chip each time we init it, even if we are re-using previous

-- 
Shani 




From vlad at dev.mellanox.co.il  Sun Jun 17 23:07:29 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Mon, 18 Jun 2007 09:07:29 +0300
Subject: [ofa-general] Re: [ewg] [GIT PULL ofed_1_2] iw_cxgb3 fixes for bugs
	663/664
In-Reply-To: <46755A3D.1030300@opengridcomputing.com>
References: <46755A3D.1030300@opengridcomputing.com>
Message-ID: <46762121.70404@dev.mellanox.co.il>

Steve Wise wrote:
> Vlad,
> 
> Please pull in these fixes for bugs 663/664 from
> 
> git://git.openfabrics.org/~swise/ofed_1_2 ofed_1_2
> 
> Thanks,
> 
> Steve.
> 

Done,

Regards,
Vladimir


From erezz at voltaire.com  Sun Jun 17 23:50:40 2007
From: erezz at voltaire.com (Erez Zilber)
Date: Mon, 18 Jun 2007 09:50:40 +0300
Subject: [ofa-general] Re: [PATCH] IB/iser: make all fixes patches apply on
 full kernel source
In-Reply-To: <20070617191056.GC27233@mellanox.co.il>
References: <20070617191056.GC27233@mellanox.co.il>
Message-ID: <46762B40.1010500@voltaire.com>

Michael S. Tsirkin wrote:

> Since we'll have RC6, I'd like to add the following cosmetic change:
> move iscsi_scsi_makefile from kernel_patches/fixes to ofed_scripts,
> and use a softlink to put it in place.
>
> This solves the following problem: if I do "git clone" on ofed
> tree without -n, and try to apply the fixes patchset,
> I get a conflict on iscsi_scsi_makefile.patch simply because
> the makefile that this patch attempts to create is part
> of upstream kernel already.
>
> I think this is 0-risk and carries real benefit for developers
> who'll need to support OFED 1.2.
>
> Erez, do you agree? If yes, I'll ask Tziporet to approve, too.
>
I'm ok with it.


Thanks,

Erez


From mst at dev.mellanox.co.il  Mon Jun 18 01:32:40 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 18 Jun 2007 11:32:40 +0300
Subject: [ofa-general] [PATCH for-2.6.22] ipoib/cm: initialize RX before
	moving QP to RTR
In-Reply-To: <BAE9DCEF64577A439B3A37F36F9B691C0285B8C1@orsmsx418.amr.corp.intel.com>
References: <4672BE23.3050809@ichips.intel.com>
	<BAE9DCEF64577A439B3A37F36F9B691C0285B8C1@orsmsx418.amr.corp.intel.com>
Message-ID: <20070618083240.GK14335@mellanox.co.il>

Fix a crasher bug in IPoIB CM: once QP is in RTR, an RX completion (and even an
asynchronous error) might be observed on this QP, so we have to initialize all
RX fields beforehand.

This fixes bug <https://bugs.openfabrics.org/show_bug.cgi?id=662>

Signed-off-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>

---

> Quoting Woodruff, Robert J <robert.j.woodruff at intel.com>:
> Subject: RE: [ofa-general] crash in ipoib
> 
> Sean wrote,
> >> And here's a version with error handling fixed.
> >> Sean, does this solve your crash?
> 
> >We've been running this patch since yesterday and haven't seen any 
> >crashes.  We'll continue testing this over the week-end.
> 
> >- Sean
> 
> This looks like it fixed the panic. 
> 
> Should we try to put out a new RC with this latest ipoib fix?
> I really think we need it in the release. If we could get another RC out
> today, that would only delay the release by a couple more days, and we
> could release next Friday rather than Wednesday and still give people a
> week to test the final RC.
> 
> woody

OK, the following patch has been added to OFED 1.2.
Roland, please consider this bugfix for 2.6.22.

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 076a0bb..c64249f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -309,6 +309,11 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 		return -ENOMEM;
 	p->dev = dev;
 	p->id = cm_id;
+	cm_id->context = p;
+	p->state = IPOIB_CM_RX_LIVE;
+	p->jiffies = jiffies;
+	INIT_LIST_HEAD(&p->list);
+
 	p->qp = ipoib_cm_create_rx_qp(dev, p);
 	if (IS_ERR(p->qp)) {
 		ret = PTR_ERR(p->qp);
@@ -320,24 +325,24 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 	if (ret)
 		goto err_modify;
 
+	spin_lock_irq(&priv->lock);
+	queue_delayed_work(ipoib_workqueue,
+			   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
+	/* Add this entry to passive ids list head, but do not re-add it
+	 * if IB_EVENT_QP_LAST_WQE_REACHED has moved it to flush list. */
+	p->jiffies = jiffies;
+	if (p->state == IPOIB_CM_RX_LIVE)
+		list_move(&p->list, &priv->cm.passive_ids);
+	spin_unlock_irq(&priv->lock);
+
 	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
 	if (ret) {
 		ipoib_warn(priv, "failed to send REP: %d\n", ret);
-		goto err_rep;
+		if (ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE))
+			ipoib_warn(priv, "unable to move qp to error state\n");
 	}
-
-	cm_id->context = p;
-	p->jiffies = jiffies;
-	p->state = IPOIB_CM_RX_LIVE;
-	spin_lock_irq(&priv->lock);
-	if (list_empty(&priv->cm.passive_ids))
-		queue_delayed_work(ipoib_workqueue,
-				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
-	list_add(&p->list, &priv->cm.passive_ids);
-	spin_unlock_irq(&priv->lock);
 	return 0;
 
-err_rep:
 err_modify:
 	ib_destroy_qp(p->qp);
 err_qp:
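
The ordering issue this patch addresses can be sketched in plain user-space C. This is a single-threaded toy model, not the IPoIB CM code: toy_rx, its "armed" flag (standing in for the QP having reached RTR) and the toy_* functions are hypothetical names for illustration. It assumes an event handler that may run as soon as the object is armed and that relies on the state and list fields already being set up.

```c
#include <assert.h>
#include <stddef.h>

enum toy_rx_state { TOY_RX_UNINIT = 0, TOY_RX_LIVE, TOY_RX_ERROR };

struct toy_rx {
	enum toy_rx_state state;
	struct toy_rx *next; /* stand-in for the list linkage */
	int armed;           /* 1 once events may be delivered */
};

/*
 * Event handler: may run at any point after the object is armed.
 * Returns 0 if it found consistent state, -1 if fields were not set up.
 */
static int toy_rx_event(struct toy_rx *p)
{
	if (!p->armed)
		return 0; /* nothing can be delivered yet */
	if (p->state != TOY_RX_LIVE || p->next == NULL)
		return -1; /* uninitialized fields: a crash in real code */
	p->state = TOY_RX_ERROR;
	return 0;
}

static void toy_rx_init_fields(struct toy_rx *p)
{
	p->state = TOY_RX_LIVE;
	p->next = p; /* empty list: points at itself */
}

/* Initialize everything first, then arm; an event right after is safe. */
static int toy_setup_init_first(struct toy_rx *p)
{
	toy_rx_init_fields(p);
	p->armed = 1;
	return toy_rx_event(p);
}

/* Arm first, initialize later; an event in between sees garbage. */
static int toy_setup_arm_first(struct toy_rx *p)
{
	int ret;

	p->armed = 1;
	ret = toy_rx_event(p);
	toy_rx_init_fields(p);
	return ret;
}
```

With a zero-initialized toy_rx, toy_setup_init_first() returns 0 while toy_setup_arm_first() returns -1, which mirrors the window the patch closes by moving the field initialization ahead of the QP setup.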

-- 
MST


From vlad at lists.openfabrics.org  Mon Jun 18 02:45:10 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Mon, 18 Jun 2007 02:45:10 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070618-0200 daily build status
Message-ID: <20070618094510.B9DFAE6082B@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:   --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.12
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.12
Passed on powerpc with linux-2.6.19
Passed on ia64 with linux-2.6.18
Passed on x86_64 with linux-2.6.20
Passed on ppc64 with linux-2.6.12
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.15
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.15
Passed on ia64 with linux-2.6.19
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.14
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.14
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on x86_64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.16
Passed on powerpc with linux-2.6.13
Passed on powerpc with linux-2.6.15
Passed on x86_64 with linux-2.6.17
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.16
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.14
Passed on ia64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:


From mst at dev.mellanox.co.il  Mon Jun 18 04:48:43 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 18 Jun 2007 14:48:43 +0300
Subject: [ofa-general] hang at module removal with local sa patches applied
Message-ID: <20070618114843.GA25428@mellanox.co.il>

Hi!
I tried applying the latest local sa patches to 2.6.22-rc5, and applied the
patch at the bottom to disable sa cache by default. After this, "openib stop"
hangs forever.

You can see the exact patches I applied here:
http://git.openfabrics.org/git/?p=~mst/ofed_kernel.git;a=tree;f=kernel_patches/attic;hb=ofed_kernel

Here's sysrq trace of threads that look IB-related.


[14897.168101] mthca_catas   S 0000000000000001     0  8330      2 (L-TLB)
[14897.168104]  ffff8100764bded0 0000000000000046 0000000000000000 0000000000000000
[14897.168107]  ffff81007ebea950 0000000000000006 ffff81007ebea920 ffff81007ff1f4a0
[14897.168111]  00000d83a434d314 00000000000004b6 ffff81007ebeaad0 0000000000000046
[14897.168113] Call Trace:
[14897.168116]  [<ffffffff80241f94>] worker_thread+0x0/0xe7
[14897.168119]  [<ffffffff80242036>] worker_thread+0xa2/0xe7
[14897.168122]  [<ffffffff802450af>] autoremove_wake_function+0x0/0x38
[14897.168125]  [<ffffffff80244f8b>] kthread+0x49/0x76
[14897.168127]  [<ffffffff8020aaa8>] child_rip+0xa/0x12
[14897.168130]  [<ffffffff80244f42>] kthread+0x0/0x76
[14897.168133]  [<ffffffff8020aa9e>] child_rip+0x0/0x12
[14897.168134]
[14897.168136] ib_mad1       S 0000000000000003     0  8333      2 (L-TLB)
[14897.168139]  ffff81007ce53ed0 0000000000000046 0000000000000000 ffff81007fcdc400
[14897.168142]  000000007ebf4990 000000000000000a ffff81007ebf4960 ffff81007fe0b520
[14897.168146]  00000d853dc5974d 00000000000012c8 ffff81007ebf4b10 ffff81007fe0b520
[14897.168149] Call Trace:
[14897.168152]  [<ffffffff80241f94>] worker_thread+0x0/0xe7
[14897.168154]  [<ffffffff80242036>] worker_thread+0xa2/0xe7
[14897.168157]  [<ffffffff802450af>] autoremove_wake_function+0x0/0x38
[14897.168160]  [<ffffffff80244f8b>] kthread+0x49/0x76
[14897.168162]  [<ffffffff8020aaa8>] child_rip+0xa/0x12
[14897.168165]  [<ffffffff80244f42>] kthread+0x0/0x76
[14897.168168]  [<ffffffff8020aa9e>] child_rip+0x0/0x12
[14897.168169]
[14897.168171] ib_mad2       S 0000000000000000     0  8334      2 (L-TLB)
[14897.168174]  ffff81007ce51ed0 0000000000000046 0000000000000000 ffff81007edcdc00
[14897.168177]  000000007e86f710 000000000000000a ffff81007e86f6e0 ffffffff8070d4c0
[14897.168181]  00000d853dc7ba8f 00000000000012aa ffff81007e86f890 ffffffff8070d4c0
[14897.168184] Call Trace:
[14897.168187]  [<ffffffff80241f94>] worker_thread+0x0/0xe7
[14897.168189]  [<ffffffff80242036>] worker_thread+0xa2/0xe7
[14897.168192]  [<ffffffff802450af>] autoremove_wake_function+0x0/0x38
[14897.168195]  [<ffffffff80244f8b>] kthread+0x49/0x76
[14897.168198]  [<ffffffff8020aaa8>] child_rip+0xa/0x12
[14897.168201]  [<ffffffff80244f42>] kthread+0x0/0x76
[14897.168203]  [<ffffffff8020aa9e>] child_rip+0x0/0x12
[14897.168205]
[14897.168206] ib_mcast      S 0000000000000000     0  8359      2 (L-TLB)
[14897.168210]  ffff81007d3a3ed0 0000000000000046 0000000000000000 0000000000000000
[14897.168213]  0000ffff1b4012ff 000000000000000a ffff81007e8830c0 ffffffff8070d4c0
[14897.168216]  00000d84fe84fafa 0000000000001105 ffff81007e883270 0000000000010000
[14897.168219] Call Trace:
[14897.168222]  [<ffffffff80241f94>] worker_thread+0x0/0xe7
[14897.168225]  [<ffffffff80242036>] worker_thread+0xa2/0xe7
[14897.168228]  [<ffffffff802450af>] autoremove_wake_function+0x0/0x38
[14897.168230]  [<ffffffff80244f8b>] kthread+0x49/0x76
[14897.168233]  [<ffffffff8020aaa8>] child_rip+0xa/0x12
[14897.168236]  [<ffffffff80244f42>] kthread+0x0/0x76
[14897.168239]  [<ffffffff8020aa9e>] child_rip+0x0/0x12
[14897.168240]
[14897.168242] ib_inform     S ffff81007e4d1740     0  8360      2 (L-TLB)
[14897.168245]  ffff81007d0d1ed0 0000000000000046 0000000024000000 0000000000000000
[14897.168248]  ffff810076c60130 0000000000000006 ffff810076c60100 ffff81007d1c7560
[14897.168252]  00000d83ee2e6167 000000000000035a ffff810076c602b0 0000000000000046
[14897.168254] Call Trace:
[14897.168257]  [<ffffffff80241f94>] worker_thread+0x0/0xe7
[14897.168260]  [<ffffffff80242036>] worker_thread+0xa2/0xe7
[14897.168263]  [<ffffffff802450af>] autoremove_wake_function+0x0/0x38
[14897.168266]  [<ffffffff80244f8b>] kthread+0x49/0x76
[14897.168268]  [<ffffffff8020aaa8>] child_rip+0xa/0x12
[14897.168271]  [<ffffffff80244f42>] kthread+0x0/0x76
[14897.168274]  [<ffffffff8020aa9e>] child_rip+0x0/0x12
[14897.168275]
[14897.168277] local_sa      D 0000000000000001     0  8361      2 (L-TLB)
[14897.168280]  ffff81007d0d3c10 0000000000000046 0000000000000000 800000ce00000000
[14897.168283]  84000b0000000000 000000000000000a ffff81007e8f3420 ffff81007ff1f4a0
[14897.168287]  00000d8431895ed4 0000000000000d33 ffff81007e8f35d0 800000ce00000000
[14897.168290] Call Trace:
[14897.168294]  [<ffffffff80582e4a>] __mutex_lock_slowpath+0x69/0xaa
[14897.168303]  [<ffffffff8806369a>] :ib_sa:port_work_handler+0x0/0x34
[14897.168306]  [<ffffffff80582c87>] mutex_lock+0xe/0x10
[14897.168311]  [<ffffffff880636b6>] :ib_sa:port_work_handler+0x1c/0x34
[14897.168314]  [<ffffffff80241669>] run_workqueue+0x85/0x10f
[14897.168317]  [<ffffffff80241851>] flush_cpu_workqueue+0x28/0x7b
[14897.168320]  [<ffffffff80241ad0>] flush_workqueue+0x43/0x5d
[14897.168326]  [<ffffffff88063250>] :ib_sa:cleanup_port+0x25/0x7b
[14897.168331]  [<ffffffff88063307>] :ib_sa:process_updates+0x61/0x336
[14897.168335]  [<ffffffff8058212b>] thread_return+0x0/0xea
[14897.168341]  [<ffffffff88063656>] :ib_sa:add_update+0x7a/0x83
[14897.168347]  [<ffffffff8806369a>] :ib_sa:port_work_handler+0x0/0x34
[14897.168352]  [<ffffffff88063695>] :ib_sa:refresh_port_db+0x36/0x3b
[14897.168358]  [<ffffffff880636be>] :ib_sa:port_work_handler+0x24/0x34
[14897.168361]  [<ffffffff80241669>] run_workqueue+0x85/0x10f
[14897.168363]  [<ffffffff80241f94>] worker_thread+0x0/0xe7
[14897.168366]  [<ffffffff80242070>] worker_thread+0xdc/0xe7
[14897.168368]  [<ffffffff802450af>] autoremove_wake_function+0x0/0x38
[14897.168371]  [<ffffffff80244f8b>] kthread+0x49/0x76
[14897.168374]  [<ffffffff8020aaa8>] child_rip+0xa/0x12
[14897.168377]  [<ffffffff80244f42>] kthread+0x0/0x76
[14897.168379]  [<ffffffff8020aa9e>] child_rip+0x0/0x12
[14897.168381]
[14897.168382] openibd       S 0000000000000002     0  8598   6178 (NOTLB)
[14897.168386]  ffff81007fadbeb8 0000000000000082 0000000000000000 ffff81007d4b2678
[14897.168389]  00000000005a5640 0000000000000001 ffff81007f7e60c0 ffff81007ff574e0
[14897.168392]  00000d84e88f6e97 0000000000007060 ffff81007f7e6270 ffff81007c309600
[14897.168396] Call Trace:
[14897.168399]  [<ffffffff80235807>] do_wait+0xa0a/0xb1f
[14897.168402]  [<ffffffff8022d6ce>] default_wake_function+0x0/0xf
[14897.168405]  [<ffffffff80235944>] sys_wait4+0x28/0x2a
[14897.168408]  [<ffffffff80209c8e>] system_call+0x7e/0x83
[14897.168410]
[14897.168411] modprobe      D 0000000000000000     0  8640   8598 (NOTLB)
[14897.168415]  ffff81007c90bd78 0000000000000086 0000000000000000 ffffffff807186a0
[14897.168418]  ffff81007c90be68 0000000000000007 ffff81007730edc0 ffffffff8070d4c0
[14897.168422]  00000d852f6be2aa 0000000000000b50 ffff81007730ef70 0000000000000001
[14897.168424] Call Trace:
[14897.168428]  [<ffffffff805822f9>] wait_for_completion+0x82/0xc1
[14897.168431]  [<ffffffff8022d6ce>] default_wake_function+0x0/0xf
[14897.168434]  [<ffffffff80241898>] flush_cpu_workqueue+0x6f/0x7b
[14897.168436]  [<ffffffff802419d6>] wq_barrier_func+0x0/0xf
[14897.168439]  [<ffffffff80241ad0>] flush_workqueue+0x43/0x5d
[14897.168445]  [<ffffffff8806388b>] :ib_sa:sa_db_remove_dev+0x3d/0x9c
[14897.168448]  [<ffffffff8022d6ce>] default_wake_function+0x0/0xf
[14897.168458]  [<ffffffff8801069e>] :ib_core:ib_unregister_client+0x37/0xf0
[14897.168465]  [<ffffffff880637f4>] :ib_sa:sa_db_cleanup+0x10/0x2a
[14897.168470]  [<ffffffff8806459d>] :ib_sa:ib_sa_cleanup+0x9/0x2d
[14897.168474]  [<ffffffff8025110e>] sys_delete_module+0x1b5/0x1e6
[14897.168477]  [<ffffffff80209c8e>] system_call+0x7e/0x83
[14897.168479]


---

Disable SA cache by default.

Signed-off-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>

---

Index: connectx/drivers/infiniband/core/local_sa.c
===================================================================
--- connectx.orig/drivers/infiniband/core/local_sa.c	2007-05-31 09:32:50.000000000 +0300
+++ connectx/drivers/infiniband/core/local_sa.c	2007-05-31 09:33:55.000000000 +0300
@@ -55,7 +55,7 @@ enum {
 };
 
 static int set_paths_per_dest(const char *val, struct kernel_param *kp);
-static unsigned long paths_per_dest = SA_DB_MAX_PATHS_PER_DEST;
+static unsigned long paths_per_dest = 0;
 module_param_call(paths_per_dest, set_paths_per_dest, param_get_ulong,
 		  &paths_per_dest, 0644);
 MODULE_PARM_DESC(paths_per_dest, "Maximum number of paths to retrieve "

-- 
MST


From support at qlogic.com  Mon Jun 18 05:51:05 2007
From: support at qlogic.com (QLogic Support)
Date: Mon, 18 Jun 2007 05:51:05 -0700 (PDT)
Subject: [ofa-general] Re: [KJ PATCH] Replacing memset(<addr>, 0,
	PAGE_SIZE) with	clear_page(<addr>) in
	drivers/infiniband/hw/ipath/ipath_driver.c [REF:7963312062]
In-Reply-To: <1182137636.9020.17.camel@shani-win>
Message-ID: <4658214.1182171064319.JavaMail.support@qlogic.com>

Regards,
 
Steve Newberger
QLogic Corporation
Support at QLogic.com

Please visit our web @
http://support.qlogic.com/  


---- Original Message ----
From: shani.moideen at wipro.com
Sent: 17-Jun-2007 22:33:56
To: support at pathscale.com
Cc: openib-general at openib.org; kernel-janitors at lists.osdl.org
Subject: [KJ PATCH] Replacing memset(<addr>,0,PAGE_SIZE) with clear_page(<addr>) in drivers/infiniband/hw/ipath/ipath_driver.c


Replacing memset(<addr>,0,PAGE_SIZE) with clear_page(<addr>) 
in drivers/infiniband/hw/ipath/ipath_driver.c

Signed-off-by: Shani Moideen <shani.moideen at wipro.com>
----

diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
index e3a2232..417e3ca 100644
--- a/drivers/infiniband/hw/ipath/ipath_driver.c
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -1509,7 +1509,7 @@ int ipath_create_rcvhdrq(struct ipath_devdata *dd,
 
 	/* clear for security and sanity on each use */
 	memset(pd->port_rcvhdrq, 0, pd->port_rcvhdrq_size);
-	memset(pd->port_rcvhdrtail_kvaddr, 0, PAGE_SIZE);
+	clear_page(pd->port_rcvhdrtail_kvaddr);
 
 	/*
 	 * tell chip each time we init it, even if we are re-using previous

-- 
Shani 
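
For readers outside the kernel tree, the substitution in this patch can be sketched in userspace. This is only an illustration under stated assumptions: clear_page() is a kernel-internal helper, so the hypothetical clear_page_sketch() below stands in for it, and SKETCH_PAGE_SIZE is an assumed 4096-byte page size rather than the kernel's PAGE_SIZE. It shows that the replacement is equivalent to the memset() only when the buffer is exactly one page long and page-aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, not taken from the patch */

/* Hypothetical userspace stand-in for the kernel's clear_page(). */
static void clear_page_sketch(void *page)
{
	/* A page-clearing helper expects a page-aligned, page-sized buffer. */
	assert(((uintptr_t) page & (SKETCH_PAGE_SIZE - 1)) == 0);
	memset(page, 0, SKETCH_PAGE_SIZE);
}

/* Returns 1 if every byte of the page is zero, 0 otherwise. */
static int page_is_zero(const unsigned char *page)
{
	size_t i;

	for (i = 0; i < SKETCH_PAGE_SIZE; ++i)
		if (page[i])
			return 0;
	return 1;
}

/* Allocate one aligned page filled with a non-zero pattern; caller frees. */
static unsigned char *alloc_dirty_page(void)
{
	unsigned char *page = aligned_alloc(SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);

	if (page)
		memset(page, 0xa5, SKETCH_PAGE_SIZE);
	return page;
}
```

A caller would obtain a buffer from alloc_dirty_page(), pass it to clear_page_sketch(), and could then confirm the result with page_is_zero(). The alignment assertion is the part worth noticing: a buffer that is smaller than a page, or not page-aligned, should keep using memset() with an explicit length.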




From xma at us.ibm.com  Mon Jun 18 08:41:08 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Mon, 18 Jun 2007 08:41:08 -0700
Subject: [ofa-general] Re: [PATCH draft,
	untested] ehca srq emulation (for IPoIB CM)
In-Reply-To: <20070617055649.GN2207@mellanox.co.il>
Message-ID: <OF0978E653.DBC32AED-ON872572FE.0055CDD0-882572FE.005B7C48@us.ibm.com>


Hello Michael,

>> IPoIB CM support, with or without SRQ, is less
>> scalable than IPoIB UD mode,

>I believe this is incorrect: datagram mode has AH per destination,
>connected mode has a QP per destination, so with SRQ, I see no
>inherent lack of scalability with connected as compared to datagram mode.

On how many cluster nodes have you tested IPoIB-CM mode, and with what
kinds of tests? Do you have any data to share?

Thanks
Shirley Ma

From rdreier at cisco.com  Mon Jun 18 08:49:20 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 18 Jun 2007 08:49:20 -0700
Subject: [ofa-general] [GIT PULL] please pull infiniband.git
Message-ID: <adad4ztfekf.fsf@cisco.com>

Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This will get a bunch of fixes to the new mlx4 driver.  This pull is
bigger than I would have liked after -rc5, but Mellanox discovered a
problem that required a firmware change and also some driver help to
fix.  Since this is a new driver for 2.6.22, which is for new hardware
that no one has in production yet, I think it's better to merge this
early even if it risks introducing a bug, rather than have a driver
in 2.6.22 that doesn't work at all with current adapter firmware.

Jack Morgenstein (1):
      IB/mlx4: Handle buffer wraparound in __mlx4_ib_cq_clean()

Roland Dreier (6):
      IB/mlx4: Fix handling of wq->tail for send completions
      IB/mlx4: Fix warning in rounding up queue sizes
      IB/mlx4: Handle new FW requirement for send request prefetching
      IB/mlx4: Get rid of max_inline_data calculation
      IB/mlx4: Handle FW command interface rev 3
      IB/mlx4: Make sure inline data segments don't cross a 64 byte boundary
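
The wq->tail fix in the list above comes down to C integer promotion, which can be shown in isolation. The sketch below is not driver code: advance_tail_old() and advance_tail_new() are hypothetical helpers that copy only the two arithmetic forms from the cq.c hunk, and they assume the tail counter is a 32-bit unsigned value while the hardware reports a 16-bit WQE index.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pre-fix form: both operands are promoted to int, so the difference
 * is negative whenever the 16-bit index has wrapped past the low 16
 * bits of tail.  Adding that negative value moves tail backwards.
 */
static uint32_t advance_tail_old(uint32_t tail, uint16_t wqe_ctr)
{
	tail += wqe_ctr - (uint16_t) tail;
	return tail;
}

/*
 * Post-fix form: the difference is reduced modulo 2^16 before it is
 * added, so tail always moves forward by the distance to wqe_ctr.
 */
static uint32_t advance_tail_new(uint32_t tail, uint16_t wqe_ctr)
{
	tail += (uint16_t) (wqe_ctr - (uint16_t) tail);
	return tail;
}
```

With tail = 0xfffe and wqe_ctr = 2, the old form yields 0x00000002 while the new form yields 0x00010002. Both values select the same ring slot once masked with a power-of-two queue size of at most 65536, but only the new one keeps a head - tail occupancy calculation meaningful. When no wrap is involved (for example tail = 5, wqe_ctr = 9) the two forms agree.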

 drivers/infiniband/hw/mlx4/cq.c      |   19 ++--
 drivers/infiniband/hw/mlx4/main.c    |   16 ++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h |    5 +-
 drivers/infiniband/hw/mlx4/qp.c      |  196 ++++++++++++++++++++++------------
 drivers/infiniband/hw/mlx4/user.h    |    9 +-
 drivers/net/mlx4/fw.c                |  110 +++++++++++++-------
 drivers/net/mlx4/fw.h                |   10 +-
 drivers/net/mlx4/main.c              |   14 ++-
 include/linux/mlx4/cmd.h             |    1 +
 include/linux/mlx4/device.h          |   13 ++-
 include/linux/mlx4/qp.h              |    4 +
 11 files changed, 259 insertions(+), 138 deletions(-)
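
Most of the qp.c churn counted above follows from the new send queue sizing rule for kernel QPs: reserve 2 KB plus one WQE of headroom for the hardware prefetcher, round the total up to a power of two, and advertise only the remainder to the caller. The sketch below reproduces just that arithmetic from the set_kernel_sq_size() hunk. The struct sq_sizes type and the size_send_queue() and roundup_pow2_sketch() helpers are hypothetical names introduced for illustration; the last one stands in for the kernel's roundup_pow_of_two().

```c
#include <assert.h>

/* Smallest power of two >= n, for 1 <= n <= 2^31 (illustrative stand-in). */
static unsigned roundup_pow2_sketch(unsigned n)
{
	unsigned p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Hypothetical container for the three values the patch derives. */
struct sq_sizes {
	unsigned spare_wqes;	/* headroom left for HW prefetch */
	unsigned wqe_cnt;	/* entries actually allocated in the ring */
	unsigned max_post;	/* entries the caller may have outstanding */
};

static struct sq_sizes size_send_queue(unsigned max_send_wr, unsigned wqe_shift)
{
	struct sq_sizes s;

	/* 2 KB worth of WQEs of size (1 << wqe_shift), plus one more. */
	s.spare_wqes = (2048u >> wqe_shift) + 1;
	s.wqe_cnt    = roundup_pow2_sketch(max_send_wr + s.spare_wqes);
	s.max_post   = s.wqe_cnt - s.spare_wqes;
	return s;
}
```

For a request of 100 send WRs with 64-byte WQEs (wqe_shift = 6) this gives 33 spare WQEs, a 256-entry ring and a posting limit of 223. This is why the patch splits the old single max field into wqe_cnt, used for ring indexing and buffer sizing, and max_post, used for the overflow check.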


diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index b2a290c..660b27a 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -354,8 +354,8 @@ static int mlx4_ib_poll_one(struct mlx4_ib_cq *cq,
 	if (is_send) {
 		wq = &(*cur_qp)->sq;
 		wqe_ctr = be16_to_cpu(cqe->wqe_index);
-		wq->tail += wqe_ctr - (u16) wq->tail;
-		wc->wr_id = wq->wrid[wq->tail & (wq->max - 1)];
+		wq->tail += (u16) (wqe_ctr - (u16) wq->tail);
+		wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
 		++wq->tail;
 	} else if ((*cur_qp)->ibqp.srq) {
 		srq = to_msrq((*cur_qp)->ibqp.srq);
@@ -364,7 +364,7 @@ static int mlx4_ib_poll_one(struct mlx4_ib_cq *cq,
 		mlx4_ib_free_srq_wqe(srq, wqe_ctr);
 	} else {
 		wq	  = &(*cur_qp)->rq;
-		wc->wr_id = wq->wrid[wq->tail & (wq->max - 1)];
+		wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
 		++wq->tail;
 	}
 
@@ -478,7 +478,8 @@ void __mlx4_ib_cq_clean(struct mlx4_ib_cq *cq, u32 qpn, struct mlx4_ib_srq *srq)
 {
 	u32 prod_index;
 	int nfreed = 0;
-	struct mlx4_cqe *cqe;
+	struct mlx4_cqe *cqe, *dest;
+	u8 owner_bit;
 
 	/*
 	 * First we need to find the current producer index, so we
@@ -501,9 +502,13 @@ void __mlx4_ib_cq_clean(struct mlx4_ib_cq *cq, u32 qpn, struct mlx4_ib_srq *srq)
 			if (srq && !(cqe->owner_sr_opcode & MLX4_CQE_IS_SEND_MASK))
 				mlx4_ib_free_srq_wqe(srq, be16_to_cpu(cqe->wqe_index));
 			++nfreed;
-		} else if (nfreed)
-			memcpy(get_cqe(cq, (prod_index + nfreed) & cq->ibcq.cqe),
-			       cqe, sizeof *cqe);
+		} else if (nfreed) {
+			dest = get_cqe(cq, (prod_index + nfreed) & cq->ibcq.cqe);
+			owner_bit = dest->owner_sr_opcode & MLX4_CQE_OWNER_MASK;
+			memcpy(dest, cqe, sizeof *cqe);
+			dest->owner_sr_opcode = owner_bit |
+				(dest->owner_sr_opcode & ~MLX4_CQE_OWNER_MASK);
+		}
 	}
 
 	if (nfreed) {
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 402f3a2..1095c82 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -125,7 +125,7 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 	props->local_ca_ack_delay  = dev->dev->caps.local_ca_ack_delay;
 	props->atomic_cap	   = dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_ATOMIC ?
 		IB_ATOMIC_HCA : IB_ATOMIC_NONE;
-	props->max_pkeys	   = dev->dev->caps.pkey_table_len;
+	props->max_pkeys	   = dev->dev->caps.pkey_table_len[1];
 	props->max_mcast_grp	   = dev->dev->caps.num_mgms + dev->dev->caps.num_amgms;
 	props->max_mcast_qp_attach = dev->dev->caps.num_qp_per_mgm;
 	props->max_total_mcast_qp_attach = props->max_mcast_qp_attach *
@@ -168,9 +168,9 @@ static int mlx4_ib_query_port(struct ib_device *ibdev, u8 port,
 	props->state		= out_mad->data[32] & 0xf;
 	props->phys_state	= out_mad->data[33] >> 4;
 	props->port_cap_flags	= be32_to_cpup((__be32 *) (out_mad->data + 20));
-	props->gid_tbl_len	= to_mdev(ibdev)->dev->caps.gid_table_len;
+	props->gid_tbl_len	= to_mdev(ibdev)->dev->caps.gid_table_len[port];
 	props->max_msg_sz	= 0x80000000;
-	props->pkey_tbl_len	= to_mdev(ibdev)->dev->caps.pkey_table_len;
+	props->pkey_tbl_len	= to_mdev(ibdev)->dev->caps.pkey_table_len[port];
 	props->bad_pkey_cntr	= be16_to_cpup((__be16 *) (out_mad->data + 46));
 	props->qkey_viol_cntr	= be16_to_cpup((__be16 *) (out_mad->data + 48));
 	props->active_width	= out_mad->data[31] & 0xf;
@@ -280,8 +280,14 @@ static int mlx4_SET_PORT(struct mlx4_ib_dev *dev, u8 port, int reset_qkey_viols,
 		return PTR_ERR(mailbox);
 
 	memset(mailbox->buf, 0, 256);
-	*(u8 *) mailbox->buf	     = !!reset_qkey_viols << 6;
-	((__be32 *) mailbox->buf)[2] = cpu_to_be32(cap_mask);
+
+	if (dev->dev->flags & MLX4_FLAG_OLD_PORT_CMDS) {
+		*(u8 *) mailbox->buf	     = !!reset_qkey_viols << 6;
+		((__be32 *) mailbox->buf)[2] = cpu_to_be32(cap_mask);
+	} else {
+		((u8 *) mailbox->buf)[3]     = !!reset_qkey_viols;
+		((__be32 *) mailbox->buf)[1] = cpu_to_be32(cap_mask);
+	}
 
 	err = mlx4_cmd(dev->dev, mailbox->dma, port, 0, MLX4_CMD_SET_PORT,
 		       MLX4_CMD_TIME_CLASS_B);
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 93dac71..24ccadd 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -95,7 +95,8 @@ struct mlx4_ib_mr {
 struct mlx4_ib_wq {
 	u64		       *wrid;
 	spinlock_t		lock;
-	int			max;
+	int			wqe_cnt;
+	int			max_post;
 	int			max_gs;
 	int			offset;
 	int			wqe_shift;
@@ -113,6 +114,7 @@ struct mlx4_ib_qp {
 
 	u32			doorbell_qpn;
 	__be32			sq_signal_bits;
+	int			sq_spare_wqes;
 	struct mlx4_ib_wq	sq;
 
 	struct ib_umem	       *umem;
@@ -123,6 +125,7 @@ struct mlx4_ib_qp {
 	u8			alt_port;
 	u8			atomic_rd_en;
 	u8			resp_depth;
+	u8			sq_no_prefetch;
 	u8			state;
 };
 
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 5c6d054..f8a1a08 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -109,6 +109,20 @@ static void *get_send_wqe(struct mlx4_ib_qp *qp, int n)
 	return get_wqe(qp, qp->sq.offset + (n << qp->sq.wqe_shift));
 }
 
+/*
+ * Stamp a SQ WQE so that it is invalid if prefetched by marking the
+ * first four bytes of every 64 byte chunk with 0xffffffff, except for
+ * the very first chunk of the WQE.
+ */
+static void stamp_send_wqe(struct mlx4_ib_qp *qp, int n)
+{
+	u32 *wqe = get_send_wqe(qp, n);
+	int i;
+
+	for (i = 16; i < 1 << (qp->sq.wqe_shift - 2); i += 16)
+		wqe[i] = 0xffffffff;
+}
+
 static void mlx4_ib_qp_event(struct mlx4_qp *qp, enum mlx4_event type)
 {
 	struct ib_event event;
@@ -178,6 +192,8 @@ static int send_wqe_overhead(enum ib_qp_type type)
 	case IB_QPT_GSI:
 		return sizeof (struct mlx4_wqe_ctrl_seg) +
 			ALIGN(MLX4_IB_UD_HEADER_SIZE +
+			      DIV_ROUND_UP(MLX4_IB_UD_HEADER_SIZE,
+					   MLX4_INLINE_ALIGN) *
 			      sizeof (struct mlx4_wqe_inline_seg),
 			      sizeof (struct mlx4_wqe_data_seg)) +
 			ALIGN(4 +
@@ -201,18 +217,18 @@ static int set_rq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
 		if (cap->max_recv_wr)
 			return -EINVAL;
 
-		qp->rq.max = qp->rq.max_gs = 0;
+		qp->rq.wqe_cnt = qp->rq.max_gs = 0;
 	} else {
 		/* HW requires >= 1 RQ entry with >= 1 gather entry */
 		if (is_user && (!cap->max_recv_wr || !cap->max_recv_sge))
 			return -EINVAL;
 
-		qp->rq.max	 = roundup_pow_of_two(max(1, cap->max_recv_wr));
-		qp->rq.max_gs	 = roundup_pow_of_two(max(1, cap->max_recv_sge));
+		qp->rq.wqe_cnt	 = roundup_pow_of_two(max(1U, cap->max_recv_wr));
+		qp->rq.max_gs	 = roundup_pow_of_two(max(1U, cap->max_recv_sge));
 		qp->rq.wqe_shift = ilog2(qp->rq.max_gs * sizeof (struct mlx4_wqe_data_seg));
 	}
 
-	cap->max_recv_wr  = qp->rq.max;
+	cap->max_recv_wr  = qp->rq.max_post = qp->rq.wqe_cnt;
 	cap->max_recv_sge = qp->rq.max_gs;
 
 	return 0;
@@ -236,8 +252,6 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
 	    cap->max_send_sge + 2 > dev->dev->caps.max_sq_sg)
 		return -EINVAL;
 
-	qp->sq.max = cap->max_send_wr ? roundup_pow_of_two(cap->max_send_wr) : 1;
-
 	qp->sq.wqe_shift = ilog2(roundup_pow_of_two(max(cap->max_send_sge *
 							sizeof (struct mlx4_wqe_data_seg),
 							cap->max_inline_data +
@@ -246,20 +260,27 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
 	qp->sq.max_gs    = ((1 << qp->sq.wqe_shift) - send_wqe_overhead(type)) /
 		sizeof (struct mlx4_wqe_data_seg);
 
-	qp->buf_size = (qp->rq.max << qp->rq.wqe_shift) +
-		(qp->sq.max << qp->sq.wqe_shift);
+	/*
+	 * We need to leave 2 KB + 1 WQE of headroom in the SQ to
+	 * allow HW to prefetch.
+	 */
+	qp->sq_spare_wqes = (2048 >> qp->sq.wqe_shift) + 1;
+	qp->sq.wqe_cnt = roundup_pow_of_two(cap->max_send_wr + qp->sq_spare_wqes);
+
+	qp->buf_size = (qp->rq.wqe_cnt << qp->rq.wqe_shift) +
+		(qp->sq.wqe_cnt << qp->sq.wqe_shift);
 	if (qp->rq.wqe_shift > qp->sq.wqe_shift) {
 		qp->rq.offset = 0;
-		qp->sq.offset = qp->rq.max << qp->rq.wqe_shift;
+		qp->sq.offset = qp->rq.wqe_cnt << qp->rq.wqe_shift;
 	} else {
-		qp->rq.offset = qp->sq.max << qp->sq.wqe_shift;
+		qp->rq.offset = qp->sq.wqe_cnt << qp->sq.wqe_shift;
 		qp->sq.offset = 0;
 	}
 
-	cap->max_send_wr     = qp->sq.max;
-	cap->max_send_sge    = qp->sq.max_gs;
-	cap->max_inline_data = (1 << qp->sq.wqe_shift) - send_wqe_overhead(type) -
-		sizeof (struct mlx4_wqe_inline_seg);
+	cap->max_send_wr  = qp->sq.max_post = qp->sq.wqe_cnt - qp->sq_spare_wqes;
+	cap->max_send_sge = qp->sq.max_gs;
+	/* We don't support inline sends for kernel QPs (yet) */
+	cap->max_inline_data = 0;
 
 	return 0;
 }
@@ -267,11 +288,11 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
 static int set_user_sq_size(struct mlx4_ib_qp *qp,
 			    struct mlx4_ib_create_qp *ucmd)
 {
-	qp->sq.max       = 1 << ucmd->log_sq_bb_count;
+	qp->sq.wqe_cnt   = 1 << ucmd->log_sq_bb_count;
 	qp->sq.wqe_shift = ucmd->log_sq_stride;
 
-	qp->buf_size = (qp->rq.max << qp->rq.wqe_shift) +
-		(qp->sq.max << qp->sq.wqe_shift);
+	qp->buf_size = (qp->rq.wqe_cnt << qp->rq.wqe_shift) +
+		(qp->sq.wqe_cnt << qp->sq.wqe_shift);
 
 	return 0;
 }
@@ -307,6 +328,8 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
 			goto err;
 		}
 
+		qp->sq_no_prefetch = ucmd.sq_no_prefetch;
+
 		err = set_user_sq_size(qp, &ucmd);
 		if (err)
 			goto err;
@@ -334,6 +357,8 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
 				goto err_mtt;
 		}
 	} else {
+		qp->sq_no_prefetch = 0;
+
 		err = set_kernel_sq_size(dev, &init_attr->cap, init_attr->qp_type, qp);
 		if (err)
 			goto err;
@@ -360,16 +385,13 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
 		if (err)
 			goto err_mtt;
 
-		qp->sq.wrid  = kmalloc(qp->sq.max * sizeof (u64), GFP_KERNEL);
-		qp->rq.wrid  = kmalloc(qp->rq.max * sizeof (u64), GFP_KERNEL);
+		qp->sq.wrid  = kmalloc(qp->sq.wqe_cnt * sizeof (u64), GFP_KERNEL);
+		qp->rq.wrid  = kmalloc(qp->rq.wqe_cnt * sizeof (u64), GFP_KERNEL);
 
 		if (!qp->sq.wrid || !qp->rq.wrid) {
 			err = -ENOMEM;
 			goto err_wrid;
 		}
-
-		/* We don't support inline sends for kernel QPs (yet) */
-		init_attr->cap.max_inline_data = 0;
 	}
 
 	err = mlx4_qp_alloc(dev->dev, sqpn, &qp->mqp);
@@ -583,24 +605,6 @@ int mlx4_ib_destroy_qp(struct ib_qp *qp)
 	return 0;
 }
 
-static void init_port(struct mlx4_ib_dev *dev, int port)
-{
-	struct mlx4_init_port_param param;
-	int err;
-
-	memset(&param, 0, sizeof param);
-
-	param.port_width_cap = dev->dev->caps.port_width_cap;
-	param.vl_cap	     = dev->dev->caps.vl_cap;
-	param.mtu	     = ib_mtu_enum_to_int(dev->dev->caps.mtu_cap);
-	param.max_gid	     = dev->dev->caps.gid_table_len;
-	param.max_pkey	     = dev->dev->caps.pkey_table_len;
-
-	err = mlx4_INIT_PORT(dev->dev, &param, port);
-	if (err)
-		printk(KERN_WARNING "INIT_PORT failed, return code %d.\n", err);
-}
-
 static int to_mlx4_st(enum ib_qp_type type)
 {
 	switch (type) {
@@ -674,9 +678,9 @@ static int mlx4_set_path(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah,
 	path->counter_index = 0xff;
 
 	if (ah->ah_flags & IB_AH_GRH) {
-		if (ah->grh.sgid_index >= dev->dev->caps.gid_table_len) {
+		if (ah->grh.sgid_index >= dev->dev->caps.gid_table_len[port]) {
 			printk(KERN_ERR "sgid_index (%u) too large. max is %d\n",
-			       ah->grh.sgid_index, dev->dev->caps.gid_table_len - 1);
+			       ah->grh.sgid_index, dev->dev->caps.gid_table_len[port] - 1);
 			return -1;
 		}
 
@@ -743,14 +747,17 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 		context->mtu_msgmax = (attr->path_mtu << 5) | 31;
 	}
 
-	if (qp->rq.max)
-		context->rq_size_stride = ilog2(qp->rq.max) << 3;
+	if (qp->rq.wqe_cnt)
+		context->rq_size_stride = ilog2(qp->rq.wqe_cnt) << 3;
 	context->rq_size_stride |= qp->rq.wqe_shift - 4;
 
-	if (qp->sq.max)
-		context->sq_size_stride = ilog2(qp->sq.max) << 3;
+	if (qp->sq.wqe_cnt)
+		context->sq_size_stride = ilog2(qp->sq.wqe_cnt) << 3;
 	context->sq_size_stride |= qp->sq.wqe_shift - 4;
 
+	if (cur_state == IB_QPS_RESET && new_state == IB_QPS_INIT)
+		context->sq_size_stride |= !!qp->sq_no_prefetch << 7;
+
 	if (qp->ibqp.uobject)
 		context->usr_page = cpu_to_be32(to_mucontext(ibqp->uobject->context)->uar.index);
 	else
@@ -789,13 +796,14 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 	}
 
 	if (attr_mask & IB_QP_ALT_PATH) {
-		if (attr->alt_pkey_index >= dev->dev->caps.pkey_table_len)
-			return -EINVAL;
-
 		if (attr->alt_port_num == 0 ||
 		    attr->alt_port_num > dev->dev->caps.num_ports)
 			return -EINVAL;
 
+		if (attr->alt_pkey_index >=
+		    dev->dev->caps.pkey_table_len[attr->alt_port_num])
+			return -EINVAL;
+
 		if (mlx4_set_path(dev, &attr->alt_ah_attr, &context->alt_path,
 				  attr->alt_port_num))
 			return -EINVAL;
@@ -884,16 +892,19 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 
 	/*
 	 * Before passing a kernel QP to the HW, make sure that the
-	 * ownership bits of the send queue are set so that the
-	 * hardware doesn't start processing stale work requests.
+	 * ownership bits of the send queue are set and the SQ
+	 * headroom is stamped so that the hardware doesn't start
+	 * processing stale work requests.
 	 */
 	if (!ibqp->uobject && cur_state == IB_QPS_RESET && new_state == IB_QPS_INIT) {
 		struct mlx4_wqe_ctrl_seg *ctrl;
 		int i;
 
-		for (i = 0; i < qp->sq.max; ++i) {
+		for (i = 0; i < qp->sq.wqe_cnt; ++i) {
 			ctrl = get_send_wqe(qp, i);
 			ctrl->owner_opcode = cpu_to_be32(1 << 31);
+
+			stamp_send_wqe(qp, i);
 		}
 	}
 
@@ -923,7 +934,9 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 	 */
 	if (is_qp0(dev, qp)) {
 		if (cur_state != IB_QPS_RTR && new_state == IB_QPS_RTR)
-			init_port(dev, qp->port);
+			if (mlx4_INIT_PORT(dev->dev, qp->port))
+				printk(KERN_WARNING "INIT_PORT failed for port %d\n",
+				       qp->port);
 
 		if (cur_state != IB_QPS_RESET && cur_state != IB_QPS_ERR &&
 		    (new_state == IB_QPS_RESET || new_state == IB_QPS_ERR))
@@ -986,16 +999,17 @@ int mlx4_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 	if (!ib_modify_qp_is_ok(cur_state, new_state, ibqp->qp_type, attr_mask))
 		goto out;
 
-	if ((attr_mask & IB_QP_PKEY_INDEX) &&
-	     attr->pkey_index >= dev->dev->caps.pkey_table_len) {
-		goto out;
-	}
-
 	if ((attr_mask & IB_QP_PORT) &&
 	    (attr->port_num == 0 || attr->port_num > dev->dev->caps.num_ports)) {
 		goto out;
 	}
 
+	if (attr_mask & IB_QP_PKEY_INDEX) {
+		int p = attr_mask & IB_QP_PORT ? attr->port_num : qp->port;
+		if (attr->pkey_index >= dev->dev->caps.pkey_table_len[p])
+			goto out;
+	}
+
 	if (attr_mask & IB_QP_MAX_QP_RD_ATOMIC &&
 	    attr->max_rd_atomic > dev->dev->caps.max_qp_init_rdma) {
 		goto out;
@@ -1037,6 +1051,7 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr,
 	u16 pkey;
 	int send_size;
 	int header_size;
+	int spc;
 	int i;
 
 	send_size = 0;
@@ -1112,10 +1127,43 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr,
 		printk("\n");
 	}
 
-	inl->byte_count = cpu_to_be32(1 << 31 | header_size);
-	memcpy(inl + 1, sqp->header_buf, header_size);
+	/*
+	 * Inline data segments may not cross a 64 byte boundary.  If
+	 * our UD header is bigger than the space available up to the
+	 * next 64 byte boundary in the WQE, use two inline data
+	 * segments to hold the UD header.
+	 */
+	spc = MLX4_INLINE_ALIGN -
+		((unsigned long) (inl + 1) & (MLX4_INLINE_ALIGN - 1));
+	if (header_size <= spc) {
+		inl->byte_count = cpu_to_be32(1 << 31 | header_size);
+		memcpy(inl + 1, sqp->header_buf, header_size);
+		i = 1;
+	} else {
+		inl->byte_count = cpu_to_be32(1 << 31 | spc);
+		memcpy(inl + 1, sqp->header_buf, spc);
 
-	return ALIGN(sizeof (struct mlx4_wqe_inline_seg) + header_size, 16);
+		inl = (void *) (inl + 1) + spc;
+		memcpy(inl + 1, sqp->header_buf + spc, header_size - spc);
+		/*
+		 * Need a barrier here to make sure all the data is
+		 * visible before the byte_count field is set.
+		 * Otherwise the HCA prefetcher could grab the 64-byte
+		 * chunk with this inline segment and get a valid (!=
+		 * 0xffffffff) byte count but stale data, and end up
+		 * generating a packet with bad headers.
+		 *
+		 * The first inline segment's byte_count field doesn't
+		 * need a barrier, because it comes after a
+		 * control/MLX segment and therefore is at an offset
+		 * of 16 mod 64.
+		 */
+		wmb();
+		inl->byte_count = cpu_to_be32(1 << 31 | (header_size - spc));
+		i = 2;
+	}
+
+	return ALIGN(i * sizeof (struct mlx4_wqe_inline_seg) + header_size, 16);
 }
 
 static int mlx4_wq_overflow(struct mlx4_ib_wq *wq, int nreq, struct ib_cq *ib_cq)
@@ -1124,7 +1172,7 @@ static int mlx4_wq_overflow(struct mlx4_ib_wq *wq, int nreq, struct ib_cq *ib_cq
 	struct mlx4_ib_cq *cq;
 
 	cur = wq->head - wq->tail;
-	if (likely(cur + nreq < wq->max))
+	if (likely(cur + nreq < wq->max_post))
 		return 0;
 
 	cq = to_mcq(ib_cq);
@@ -1132,7 +1180,7 @@ static int mlx4_wq_overflow(struct mlx4_ib_wq *wq, int nreq, struct ib_cq *ib_cq
 	cur = wq->head - wq->tail;
 	spin_unlock(&cq->lock);
 
-	return cur + nreq >= wq->max;
+	return cur + nreq >= wq->max_post;
 }
 
 int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
@@ -1165,8 +1213,8 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 			goto out;
 		}
 
-		ctrl = wqe = get_send_wqe(qp, ind & (qp->sq.max - 1));
-		qp->sq.wrid[ind & (qp->sq.max - 1)] = wr->wr_id;
+		ctrl = wqe = get_send_wqe(qp, ind & (qp->sq.wqe_cnt - 1));
+		qp->sq.wrid[ind & (qp->sq.wqe_cnt - 1)] = wr->wr_id;
 
 		ctrl->srcrb_flags =
 			(wr->send_flags & IB_SEND_SIGNALED ?
@@ -1301,7 +1349,16 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 		}
 
 		ctrl->owner_opcode = mlx4_ib_opcode[wr->opcode] |
-			(ind & qp->sq.max ? cpu_to_be32(1 << 31) : 0);
+			(ind & qp->sq.wqe_cnt ? cpu_to_be32(1 << 31) : 0);
+
+		/*
+		 * We can improve latency by not stamping the last
+		 * send queue WQE until after ringing the doorbell, so
+		 * only stamp here if there are still more WQEs to post.
+		 */
+		if (wr->next)
+			stamp_send_wqe(qp, (ind + qp->sq_spare_wqes) &
+				       (qp->sq.wqe_cnt - 1));
 
 		++ind;
 	}
@@ -1324,6 +1381,9 @@ out:
 		 * and reach the HCA out of order.
 		 */
 		mmiowb();
+
+		stamp_send_wqe(qp, (ind + qp->sq_spare_wqes - 1) &
+			       (qp->sq.wqe_cnt - 1));
 	}
 
 	spin_unlock_irqrestore(&qp->rq.lock, flags);
@@ -1344,7 +1404,7 @@ int mlx4_ib_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr,
 
 	spin_lock_irqsave(&qp->rq.lock, flags);
 
-	ind = qp->rq.head & (qp->rq.max - 1);
+	ind = qp->rq.head & (qp->rq.wqe_cnt - 1);
 
 	for (nreq = 0; wr; ++nreq, wr = wr->next) {
 		if (mlx4_wq_overflow(&qp->rq, nreq, qp->ibqp.send_cq)) {
@@ -1375,7 +1435,7 @@ int mlx4_ib_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr,
 
 		qp->rq.wrid[ind] = wr->wr_id;
 
-		ind = (ind + 1) & (qp->rq.max - 1);
+		ind = (ind + 1) & (qp->rq.wqe_cnt - 1);
 	}
 
 out:
diff --git a/drivers/infiniband/hw/mlx4/user.h b/drivers/infiniband/hw/mlx4/user.h
index 88c72d5..e2d11be 100644
--- a/drivers/infiniband/hw/mlx4/user.h
+++ b/drivers/infiniband/hw/mlx4/user.h
@@ -39,7 +39,7 @@
  * Increment this value if any changes that break userspace ABI
  * compatibility are made.
  */
-#define MLX4_IB_UVERBS_ABI_VERSION	2
+#define MLX4_IB_UVERBS_ABI_VERSION	3
 
 /*
  * Make sure that all structs defined in this file remain laid out so
@@ -87,9 +87,10 @@ struct mlx4_ib_create_srq_resp {
 struct mlx4_ib_create_qp {
 	__u64	buf_addr;
 	__u64	db_addr;
-        __u8	log_sq_bb_count;
-        __u8	log_sq_stride;
-        __u8	reserved[6];
+	__u8	log_sq_bb_count;
+	__u8	log_sq_stride;
+	__u8	sq_no_prefetch;
+	__u8	reserved[5];
 };
 
 #endif /* MLX4_IB_USER_H */
diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c
index e7ca118..d2b0653 100644
--- a/drivers/net/mlx4/fw.c
+++ b/drivers/net/mlx4/fw.c
@@ -38,7 +38,9 @@
 #include "icm.h"
 
 enum {
-	MLX4_COMMAND_INTERFACE_REV	= 1
+	MLX4_COMMAND_INTERFACE_MIN_REV		= 2,
+	MLX4_COMMAND_INTERFACE_MAX_REV		= 3,
+	MLX4_COMMAND_INTERFACE_NEW_PORT_CMDS	= 3,
 };
 
 extern void __buggy_use_of_MLX4_GET(void);
@@ -107,6 +109,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	u16 size;
 	u16 stat_rate;
 	int err;
+	int i;
 
 #define QUERY_DEV_CAP_OUT_SIZE		       0x100
 #define QUERY_DEV_CAP_MAX_SRQ_SZ_OFFSET		0x10
@@ -176,7 +179,6 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 
 	err = mlx4_cmd_box(dev, 0, mailbox->dma, 0, 0, MLX4_CMD_QUERY_DEV_CAP,
 			   MLX4_CMD_TIME_CLASS_A);
-
 	if (err)
 		goto out;
 
@@ -216,18 +218,10 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	dev_cap->max_rdma_global = 1 << (field & 0x3f);
 	MLX4_GET(field, outbox, QUERY_DEV_CAP_ACK_DELAY_OFFSET);
 	dev_cap->local_ca_ack_delay = field & 0x1f;
-	MLX4_GET(field, outbox, QUERY_DEV_CAP_MTU_WIDTH_OFFSET);
-	dev_cap->max_mtu	= field >> 4;
-	dev_cap->max_port_width = field & 0xf;
 	MLX4_GET(field, outbox, QUERY_DEV_CAP_VL_PORT_OFFSET);
-	dev_cap->max_vl    = field >> 4;
 	dev_cap->num_ports = field & 0xf;
-	MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_GID_OFFSET);
-	dev_cap->max_gids = 1 << (field & 0xf);
 	MLX4_GET(stat_rate, outbox, QUERY_DEV_CAP_RATE_SUPPORT_OFFSET);
 	dev_cap->stat_rate_support = stat_rate;
-	MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_PKEY_OFFSET);
-	dev_cap->max_pkeys = 1 << (field & 0xf);
 	MLX4_GET(dev_cap->flags, outbox, QUERY_DEV_CAP_FLAGS_OFFSET);
 	MLX4_GET(field, outbox, QUERY_DEV_CAP_RSVD_UAR_OFFSET);
 	dev_cap->reserved_uars = field >> 4;
@@ -304,6 +298,42 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	MLX4_GET(dev_cap->max_icm_sz, outbox,
 		 QUERY_DEV_CAP_MAX_ICM_SZ_OFFSET);
 
+	if (dev->flags & MLX4_FLAG_OLD_PORT_CMDS) {
+		for (i = 1; i <= dev_cap->num_ports; ++i) {
+			MLX4_GET(field, outbox, QUERY_DEV_CAP_VL_PORT_OFFSET);
+			dev_cap->max_vl[i]	   = field >> 4;
+			MLX4_GET(field, outbox, QUERY_DEV_CAP_MTU_WIDTH_OFFSET);
+			dev_cap->max_mtu[i]	   = field >> 4;
+			dev_cap->max_port_width[i] = field & 0xf;
+			MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_GID_OFFSET);
+			dev_cap->max_gids[i]	   = 1 << (field & 0xf);
+			MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_PKEY_OFFSET);
+			dev_cap->max_pkeys[i]	   = 1 << (field & 0xf);
+		}
+	} else {
+#define QUERY_PORT_MTU_OFFSET			0x01
+#define QUERY_PORT_WIDTH_OFFSET			0x06
+#define QUERY_PORT_MAX_GID_PKEY_OFFSET		0x07
+#define QUERY_PORT_MAX_VL_OFFSET		0x0b
+
+		for (i = 1; i <= dev_cap->num_ports; ++i) {
+			err = mlx4_cmd_box(dev, 0, mailbox->dma, i, 0, MLX4_CMD_QUERY_PORT,
+					   MLX4_CMD_TIME_CLASS_B);
+			if (err)
+				goto out;
+
+			MLX4_GET(field, outbox, QUERY_PORT_MTU_OFFSET);
+			dev_cap->max_mtu[i]	   = field & 0xf;
+			MLX4_GET(field, outbox, QUERY_PORT_WIDTH_OFFSET);
+			dev_cap->max_port_width[i] = field & 0xf;
+			MLX4_GET(field, outbox, QUERY_PORT_MAX_GID_PKEY_OFFSET);
+			dev_cap->max_gids[i]	   = 1 << (field >> 4);
+			dev_cap->max_pkeys[i]	   = 1 << (field & 0xf);
+			MLX4_GET(field, outbox, QUERY_PORT_MAX_VL_OFFSET);
+			dev_cap->max_vl[i]	   = field & 0xf;
+		}
+	}
+
 	if (dev_cap->bmme_flags & 1)
 		mlx4_dbg(dev, "Base MM extensions: yes "
 			 "(flags %d, rsvd L_Key %08x)\n",
@@ -338,8 +368,8 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	mlx4_dbg(dev, "Max CQEs: %d, max WQEs: %d, max SRQ WQEs: %d\n",
 		 dev_cap->max_cq_sz, dev_cap->max_qp_sz, dev_cap->max_srq_sz);
 	mlx4_dbg(dev, "Local CA ACK delay: %d, max MTU: %d, port width cap: %d\n",
-		 dev_cap->local_ca_ack_delay, 128 << dev_cap->max_mtu,
-		 dev_cap->max_port_width);
+		 dev_cap->local_ca_ack_delay, 128 << dev_cap->max_mtu[1],
+		 dev_cap->max_port_width[1]);
 	mlx4_dbg(dev, "Max SQ desc size: %d, max SQ S/G: %d\n",
 		 dev_cap->max_sq_desc_sz, dev_cap->max_sq_sg);
 	mlx4_dbg(dev, "Max RQ desc size: %d, max RQ S/G: %d\n",
@@ -491,7 +521,8 @@ int mlx4_QUERY_FW(struct mlx4_dev *dev)
 		((fw_ver & 0x0000ffffull) << 16);
 
 	MLX4_GET(cmd_if_rev, outbox, QUERY_FW_CMD_IF_REV_OFFSET);
-	if (cmd_if_rev != MLX4_COMMAND_INTERFACE_REV) {
+	if (cmd_if_rev < MLX4_COMMAND_INTERFACE_MIN_REV ||
+	    cmd_if_rev > MLX4_COMMAND_INTERFACE_MAX_REV) {
 		mlx4_err(dev, "Installed FW has unsupported "
 			 "command interface revision %d.\n",
 			 cmd_if_rev);
@@ -499,12 +530,15 @@ int mlx4_QUERY_FW(struct mlx4_dev *dev)
 			 (int) (dev->caps.fw_ver >> 32),
 			 (int) (dev->caps.fw_ver >> 16) & 0xffff,
 			 (int) dev->caps.fw_ver & 0xffff);
-		mlx4_err(dev, "This driver version supports only revision %d.\n",
-			 MLX4_COMMAND_INTERFACE_REV);
+		mlx4_err(dev, "This driver version supports only revisions %d to %d.\n",
+			 MLX4_COMMAND_INTERFACE_MIN_REV, MLX4_COMMAND_INTERFACE_MAX_REV);
 		err = -ENODEV;
 		goto out;
 	}
 
+	if (cmd_if_rev < MLX4_COMMAND_INTERFACE_NEW_PORT_CMDS)
+		dev->flags |= MLX4_FLAG_OLD_PORT_CMDS;
+
 	MLX4_GET(lg, outbox, QUERY_FW_MAX_CMD_OFFSET);
 	cmd->max_cmds = 1 << lg;
 
@@ -708,13 +742,15 @@ int mlx4_INIT_HCA(struct mlx4_dev *dev, struct mlx4_init_hca_param *param)
 	return err;
 }
 
-int mlx4_INIT_PORT(struct mlx4_dev *dev, struct mlx4_init_port_param *param, int port)
+int mlx4_INIT_PORT(struct mlx4_dev *dev, int port)
 {
 	struct mlx4_cmd_mailbox *mailbox;
 	u32 *inbox;
 	int err;
 	u32 flags;
+	u16 field;
 
+	if (dev->flags & MLX4_FLAG_OLD_PORT_CMDS) {
 #define INIT_PORT_IN_SIZE          256
 #define INIT_PORT_FLAGS_OFFSET     0x00
 #define INIT_PORT_FLAG_SIG         (1 << 18)
@@ -729,32 +765,32 @@ int mlx4_INIT_PORT(struct mlx4_dev *dev, struct mlx4_init_port_param *param, int
 #define INIT_PORT_NODE_GUID_OFFSET 0x18
 #define INIT_PORT_SI_GUID_OFFSET   0x20
 
-	mailbox = mlx4_alloc_cmd_mailbox(dev);
-	if (IS_ERR(mailbox))
-		return PTR_ERR(mailbox);
-	inbox = mailbox->buf;
+		mailbox = mlx4_alloc_cmd_mailbox(dev);
+		if (IS_ERR(mailbox))
+			return PTR_ERR(mailbox);
+		inbox = mailbox->buf;
 
-	memset(inbox, 0, INIT_PORT_IN_SIZE);
+		memset(inbox, 0, INIT_PORT_IN_SIZE);
 
-	flags = 0;
-	flags |= param->set_guid0     ? INIT_PORT_FLAG_G0  : 0;
-	flags |= param->set_node_guid ? INIT_PORT_FLAG_NG  : 0;
-	flags |= param->set_si_guid   ? INIT_PORT_FLAG_SIG : 0;
-	flags |= (param->vl_cap & 0xf) << INIT_PORT_VL_SHIFT;
-	flags |= (param->port_width_cap & 0xf) << INIT_PORT_PORT_WIDTH_SHIFT;
-	MLX4_PUT(inbox, flags,            INIT_PORT_FLAGS_OFFSET);
+		flags = 0;
+		flags |= (dev->caps.vl_cap[port] & 0xf) << INIT_PORT_VL_SHIFT;
+		flags |= (dev->caps.port_width_cap[port] & 0xf) << INIT_PORT_PORT_WIDTH_SHIFT;
+		MLX4_PUT(inbox, flags,		  INIT_PORT_FLAGS_OFFSET);
 
-	MLX4_PUT(inbox, param->mtu,       INIT_PORT_MTU_OFFSET);
-	MLX4_PUT(inbox, param->max_gid,   INIT_PORT_MAX_GID_OFFSET);
-	MLX4_PUT(inbox, param->max_pkey,  INIT_PORT_MAX_PKEY_OFFSET);
-	MLX4_PUT(inbox, param->guid0,     INIT_PORT_GUID0_OFFSET);
-	MLX4_PUT(inbox, param->node_guid, INIT_PORT_NODE_GUID_OFFSET);
-	MLX4_PUT(inbox, param->si_guid,   INIT_PORT_SI_GUID_OFFSET);
+		field = 128 << dev->caps.mtu_cap[port];
+		MLX4_PUT(inbox, field, INIT_PORT_MTU_OFFSET);
+		field = dev->caps.gid_table_len[port];
+		MLX4_PUT(inbox, field, INIT_PORT_MAX_GID_OFFSET);
+		field = dev->caps.pkey_table_len[port];
+		MLX4_PUT(inbox, field, INIT_PORT_MAX_PKEY_OFFSET);
 
-	err = mlx4_cmd(dev, mailbox->dma, port, 0, MLX4_CMD_INIT_PORT,
-		       MLX4_CMD_TIME_CLASS_A);
+		err = mlx4_cmd(dev, mailbox->dma, port, 0, MLX4_CMD_INIT_PORT,
+			       MLX4_CMD_TIME_CLASS_A);
 
-	mlx4_free_cmd_mailbox(dev, mailbox);
+		mlx4_free_cmd_mailbox(dev, mailbox);
+	} else
+		err = mlx4_cmd(dev, 0, port, 0, MLX4_CMD_INIT_PORT,
+			       MLX4_CMD_TIME_CLASS_A);
 
 	return err;
 }
diff --git a/drivers/net/mlx4/fw.h b/drivers/net/mlx4/fw.h
index 2616fa5..296254a 100644
--- a/drivers/net/mlx4/fw.h
+++ b/drivers/net/mlx4/fw.h
@@ -59,13 +59,13 @@ struct mlx4_dev_cap {
 	int max_responder_per_qp;
 	int max_rdma_global;
 	int local_ca_ack_delay;
-	int max_mtu;
-	int max_port_width;
-	int max_vl;
 	int num_ports;
-	int max_gids;
+	int max_mtu[MLX4_MAX_PORTS + 1];
+	int max_port_width[MLX4_MAX_PORTS + 1];
+	int max_vl[MLX4_MAX_PORTS + 1];
+	int max_gids[MLX4_MAX_PORTS + 1];
+	int max_pkeys[MLX4_MAX_PORTS + 1];
 	u16 stat_rate_support;
-	int max_pkeys;
 	u32 flags;
 	int reserved_uars;
 	int uar_size;
diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index d417293..41eafeb 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -88,6 +88,7 @@ static struct mlx4_profile default_profile = {
 static int __devinit mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 {
 	int err;
+	int i;
 
 	err = mlx4_QUERY_DEV_CAP(dev, dev_cap);
 	if (err) {
@@ -117,11 +118,15 @@ static int __devinit mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev
 	}
 
 	dev->caps.num_ports	     = dev_cap->num_ports;
+	for (i = 1; i <= dev->caps.num_ports; ++i) {
+		dev->caps.vl_cap[i]	    = dev_cap->max_vl[i];
+		dev->caps.mtu_cap[i]	    = dev_cap->max_mtu[i];
+		dev->caps.gid_table_len[i]  = dev_cap->max_gids[i];
+		dev->caps.pkey_table_len[i] = dev_cap->max_pkeys[i];
+		dev->caps.port_width_cap[i] = dev_cap->max_port_width[i];
+	}
+
 	dev->caps.num_uars	     = dev_cap->uar_size / PAGE_SIZE;
-	dev->caps.vl_cap	     = dev_cap->max_vl;
-	dev->caps.mtu_cap	     = dev_cap->max_mtu;
-	dev->caps.gid_table_len	     = dev_cap->max_gids;
-	dev->caps.pkey_table_len     = dev_cap->max_pkeys;
 	dev->caps.local_ca_ack_delay = dev_cap->local_ca_ack_delay;
 	dev->caps.bf_reg_size	     = dev_cap->bf_reg_size;
 	dev->caps.bf_regs_per_page   = dev_cap->bf_regs_per_page;
@@ -148,7 +153,6 @@ static int __devinit mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev
 	dev->caps.reserved_mrws	     = dev_cap->reserved_mrws;
 	dev->caps.reserved_uars	     = dev_cap->reserved_uars;
 	dev->caps.reserved_pds	     = dev_cap->reserved_pds;
-	dev->caps.port_width_cap     = dev_cap->max_port_width;
 	dev->caps.mtt_entry_sz	     = MLX4_MTT_ENTRY_PER_SEG * dev_cap->mtt_entry_sz;
 	dev->caps.page_size_cap	     = ~(u32) (dev_cap->min_page_sz - 1);
 	dev->caps.flags		     = dev_cap->flags;
diff --git a/include/linux/mlx4/cmd.h b/include/linux/mlx4/cmd.h
index 4fb552d..7d1eaa9 100644
--- a/include/linux/mlx4/cmd.h
+++ b/include/linux/mlx4/cmd.h
@@ -54,6 +54,7 @@ enum {
 	MLX4_CMD_INIT_PORT	 = 0x9,
 	MLX4_CMD_CLOSE_PORT	 = 0xa,
 	MLX4_CMD_QUERY_HCA	 = 0xb,
+	MLX4_CMD_QUERY_PORT	 = 0x43,
 	MLX4_CMD_SET_PORT	 = 0xc,
 	MLX4_CMD_ACCESS_DDR	 = 0x2e,
 	MLX4_CMD_MAP_ICM	 = 0xffa,
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 8c5f8fd..b372f59 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -41,6 +41,7 @@
 
 enum {
 	MLX4_FLAG_MSI_X		= 1 << 0,
+	MLX4_FLAG_OLD_PORT_CMDS	= 1 << 1,
 };
 
 enum {
@@ -131,10 +132,10 @@ enum {
 struct mlx4_caps {
 	u64			fw_ver;
 	int			num_ports;
-	int			vl_cap;
-	int			mtu_cap;
-	int			gid_table_len;
-	int			pkey_table_len;
+	int			vl_cap[MLX4_MAX_PORTS + 1];
+	int			mtu_cap[MLX4_MAX_PORTS + 1];
+	int			gid_table_len[MLX4_MAX_PORTS + 1];
+	int			pkey_table_len[MLX4_MAX_PORTS + 1];
 	int			local_ca_ack_delay;
 	int			num_uars;
 	int			bf_reg_size;
@@ -174,7 +175,7 @@ struct mlx4_caps {
 	u32			page_size_cap;
 	u32			flags;
 	u16			stat_rate_support;
-	u8			port_width_cap;
+	u8			port_width_cap[MLX4_MAX_PORTS + 1];
 };
 
 struct mlx4_buf_list {
@@ -322,7 +323,7 @@ int mlx4_srq_alloc(struct mlx4_dev *dev, u32 pdn, struct mlx4_mtt *mtt,
 void mlx4_srq_free(struct mlx4_dev *dev, struct mlx4_srq *srq);
 int mlx4_srq_arm(struct mlx4_dev *dev, struct mlx4_srq *srq, int limit_watermark);
 
-int mlx4_INIT_PORT(struct mlx4_dev *dev, struct mlx4_init_port_param *param, int port);
+int mlx4_INIT_PORT(struct mlx4_dev *dev, int port);
 int mlx4_CLOSE_PORT(struct mlx4_dev *dev, int port);
 
 int mlx4_multicast_attach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16]);
diff --git a/include/linux/mlx4/qp.h b/include/linux/mlx4/qp.h
index 9eeb61a..10c57d2 100644
--- a/include/linux/mlx4/qp.h
+++ b/include/linux/mlx4/qp.h
@@ -269,6 +269,10 @@ struct mlx4_wqe_data_seg {
 	__be64			addr;
 };
 
+enum {
+	MLX4_INLINE_ALIGN	= 64,
+};
+
 struct mlx4_wqe_inline_seg {
 	__be32			byte_count;
 };
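
A minimal user-space sketch of the field unpacking that the new QUERY_PORT path in the fw.c hunk performs. The shifts and masks are the ones in the patch; the names port_limits, decode_gid_pkey, mtu_code_to_bytes and cmd_if_rev_supported are invented for illustration and are not part of mlx4. The revision bounds are passed in as parameters because the values of the MIN_REV/MAX_REV macros are not shown here.

```c
#include <stdint.h>

/* Hypothetical container for the two table sizes packed into one byte. */
struct port_limits {
	int max_gids;
	int max_pkeys;
};

/*
 * Unpack the byte read at QUERY_PORT_MAX_GID_PKEY_OFFSET: the high
 * nibble is log2 of the GID table size, the low nibble is log2 of the
 * P_Key table size (same shifts and masks as in the patch).
 */
static struct port_limits decode_gid_pkey(uint8_t field)
{
	struct port_limits l;

	l.max_gids  = 1 << (field >> 4);
	l.max_pkeys = 1 << (field & 0xf);
	return l;
}

/* Convert an MTU code to bytes, as the debug print in the patch does. */
static int mtu_code_to_bytes(int mtu_code)
{
	return 128 << mtu_code;
}

/* Range check in the spirit of the new test in mlx4_QUERY_FW. */
static int cmd_if_rev_supported(int rev, int min_rev, int max_rev)
{
	return rev >= min_rev && rev <= max_rev;
}
```

For instance, a field value of 0x75 would decode to a 128-entry GID table and a 32-entry P_Key table.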


From swise at opengridcomputing.com  Mon Jun 18 09:35:51 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 18 Jun 2007 11:35:51 -0500
Subject: [ofa-general] conf call today
Message-ID: <4676B467.2040606@opengridcomputing.com>

What is the info for today's ofed call?

Thanks,

Steve.


From mshefty at ichips.intel.com  Mon Jun 18 09:42:55 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 18 Jun 2007 09:42:55 -0700
Subject: [ofa-general] disconnect implementation for rdma cm unconnected
	datagram service
In-Reply-To: <Pine.LNX.4.64.0706171158080.4098@zuben>
References: <Pine.LNX.4.64.0706171158080.4098@zuben>
Message-ID: <4676B60F.3090002@ichips.intel.com>

Or Gerlitz wrote:
> Looking at cm_sidr_rep_handler, we see that the cm id state
> is reset to IB_CM_IDLE, and on the other hand ib_send_cm_dreq
> returns -EINVAL if the id state is not IB_CM_ESTABLISHED. I guess
> this means that rdma_disconnect on RDMA_PS_UDP would never work?

Correct - there isn't a disconnect for UDP.

> Thinking about a remote qp/lid change, the equivalent I see for UDP based
> apps is that the change would have been caught by the local stack's
> neighbouring subsystem, since it sends a few unicast ARP probes and then
> re-issues a broadcast ARP from which the new HW address (qpn / gid --> lid)
> would be learned.
>
> What do you think would be the correct way to solve that for rdmacm based apps?

I don't know that we can do anything about a QP change.

> is there a way for the RDMA/IB stack level to provide the solution? we were

Once the inform_info patches are in, we might be able to hook into that 
to at least provide notification that the remote address has changed.  I 
don't think there's a LID change notice, though, only GID IN/OUT.  LID 
changes would be difficult to hide from the app anyway, since the app 
must re-create their address vector.

If we ever go as far as adding an rdma_send() call, we might be able to 
hide it better.

> I guess that a remote lid change can be emulated as a disconnect if the rdmacm
> would listen for IN/OUT traps, but the question is what we can do about the
> remote process qp, e.g. in the case the process dies and then comes back again.

I think the current solution is that the app must detect that they are 
no longer getting responses from the remote side and try to 
re-'connect'.  I need to give this more thought to determine if there's 
anything that we can do here.  (This seems hard without the rdma_cm 
controlling the QP and CQs.)  Do you have any ideas?

- Sean


From jsquyres at cisco.com  Mon Jun 18 09:58:54 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Mon, 18 Jun 2007 12:58:54 -0400
Subject: [ofa-general] conf call today
In-Reply-To: <4676B467.2040606@opengridcomputing.com>
References: <4676B467.2040606@opengridcomputing.com>
Message-ID: <A2BE7400-A95E-4D04-B33E-26B0FB3A85E2@cisco.com>

I sent the info earlier this morning.  But regardless, the call was  
over in about 8 minutes.  I assume Tziporet will send out the minutes  
shortly.

On Jun 18, 2007, at 12:35 PM, Steve Wise wrote:

> What is the info for today's ofed call?
>
> Thanks,
>
> Steve.


-- 
Jeff Squyres
Cisco Systems


From or.gerlitz at gmail.com  Mon Jun 18 11:54:10 2007
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Mon, 18 Jun 2007 21:54:10 +0300
Subject: [ofa-general] disconnect implementation for rdma cm unconnected
	datagram service
In-Reply-To: <4676B60F.3090002@ichips.intel.com>
References: <Pine.LNX.4.64.0706171158080.4098@zuben>
	<4676B60F.3090002@ichips.intel.com>
Message-ID: <15ddcffd0706181154m26a61ad5u6fe82ff1df19ff4d@mail.gmail.com>

On 6/18/07, Sean Hefty <mshefty at ichips.intel.com> wrote:
>
> Or Gerlitz wrote:
> > Looking at cm_sidr_rep_handler, we see that the cm id state
> > is reset to IB_CM_IDLE, and on the other hand ib_send_cm_dreq
> > returns -EINVAL if the id state is not IB_CM_ESTABLISHED. I guess
> > this means that rdma_disconnect on RDMA_PS_UDP would never work?
>
> Correct - there isn't a disconnect for UDP.


Was that done on purpose? Is there any problem (e.g. implementation or spec
related) with sending a DREQ through the CM?


> > Thinking about a remote qp/lid change, the equivalent I see for UDP based
> > apps is that the change would have been caught by the local stack's
> > neighbouring subsystem, since it sends a few unicast ARP probes and then
> > re-issues a broadcast ARP from which the new HW address (qpn / gid --> lid)
> > would be learned.
> >
> > What do you think would be the correct way to solve that for rdmacm based
> > apps?
>
> I don't know that we can do anything about a QP change.


Just to emphasize, a typical QP change here is when a remote server process
exits and is then spawned again, so now the client has to reconnect or else
all its packets go nowhere.


> > is there a way for the RDMA/IB stack level to provide the solution? we were
>
> Once the inform_info patches are in, we might be able to hook into that
> to at least provide notification that the remote address has changed.  I
> don't think there's a LID change notice, though, only GID IN/OUT.  LID
> changes would be difficult to hide from the app anyway, since the app
> must re-create their address vector.


I did not mean to totally hide it from the app (e.g. to the extent of not
needing to re-create the address vector); I just wonder whether the mechanics
of realizing that an unconnected rdmacm id is not "connected" any more can be
fully implemented within the rdmacm.


> If we ever go as far as adding an rdma_send() call, we might be able to
> hide it better.


I don't think we want to go there.


> > I guess that a remote lid change can be emulated as a disconnect if the
> > rdmacm would listen for IN/OUT traps, but the question is what we can do
> > about the remote process qp, e.g. in the case the process dies and then
> > comes back again.
>
> I think the current solution is that the app must detect that they are
> no longer getting responses from the remote side and try to
> re-'connect'.  I need to give this more thought to determine if there's
> anything that we can do here.  (This seems hard without the rdma_cm
> controlling the QP and CQs.)  Do you have any ideas?


Indeed, this is not easily possible in all cases for us, as we are not always
allowed to add a wire protocol on --this-- QP, but we are looking into that.
Another solution we are considering is to "invalidate" the app level "address
handle" (IB AH + remote QPN) every ten seconds or so and then re-connect,
but this is not very efficient.

Or.

From mshefty at ichips.intel.com  Mon Jun 18 12:23:39 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 18 Jun 2007 12:23:39 -0700
Subject: [ofa-general] disconnect implementation for rdma cm unconnected
	datagram service
In-Reply-To: <15ddcffd0706181154m26a61ad5u6fe82ff1df19ff4d@mail.gmail.com>
References: <Pine.LNX.4.64.0706171158080.4098@zuben>	
	<4676B60F.3090002@ichips.intel.com>
	<15ddcffd0706181154m26a61ad5u6fe82ff1df19ff4d@mail.gmail.com>
Message-ID: <4676DBBB.1010202@ichips.intel.com>

> Was that done on purpose? Is there any problem (e.g. implementation or spec
> related) with sending a DREQ through the CM?

This is spec related - DREQ doesn't apply to UD QPs - only connected.

> I did not mean to totally hide it from the app (e.g. to the extent of not
> needing to re-create the address vector); I just wonder whether the mechanics
> of realizing that an unconnected rdmacm id is not "connected" any more can be
> fully implemented within the rdmacm.

I don't see a way to do this underneath within the existing spec.  If 
the IB CM tracked SIDR lookups, maintaining state information, then we 
could make use of a DREQ type command to notify the remote side that the 
local QP is going away.  But this is outside of the spec, plus doesn't 
solve all of the issues (like a remote system reboot).

I don't think there's even an existing trap that we can use.

> Indeed, this is not easily possible in all cases for us, as we are not always
> allowed to add a wire protocol on --this-- QP, but we are looking into that.
> Another solution we are considering is to "invalidate" the app level "address
> handle" (IB AH + remote QPN) every ten seconds or so and then re-connect,
> but this is not very efficient.

How does IPoIB handle this?  Does it just time out the ARP entries every 
x minutes, which requires a new lookup?

Is there some way that you could map LIDs to QPNs, and use the 
SLID/src_qp data in the work completion to see if a remote service has 
moved QPs?

- Sean


From or.gerlitz at gmail.com  Mon Jun 18 13:46:33 2007
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Mon, 18 Jun 2007 23:46:33 +0300
Subject: [ofa-general] disconnect implementation for rdma cm unconnected
	datagram service
In-Reply-To: <4676DBBB.1010202@ichips.intel.com>
References: <Pine.LNX.4.64.0706171158080.4098@zuben>
	<4676B60F.3090002@ichips.intel.com>
	<15ddcffd0706181154m26a61ad5u6fe82ff1df19ff4d@mail.gmail.com>
	<4676DBBB.1010202@ichips.intel.com>
Message-ID: <15ddcffd0706181346r6c38fcc2qf8d050f88a9e4ddf@mail.gmail.com>

On 6/18/07, Sean Hefty <mshefty at ichips.intel.com> wrote:
>
> > Was that done on purpose? Is there any problem (e.g. implementation or spec
> > related) with sending a DREQ through the CM?
>
> This is spec related - DREQ doesn't apply to UD QPs - only connected.


I see.

> > I did not mean to totally hide it from the app (e.g. to the extent of not
> > needing to re-create the address vector); I just wonder whether the mechanics
> > of realizing that an unconnected rdmacm id is not "connected" any more can be
> > fully implemented within the rdmacm.
>
> I don't see a way to do this underneath within the existing spec.  If
> the IB CM tracked SIDR lookups, maintaining state information, then we
> could make use of a DREQ type command to notify the remote side that the
> local QP is going away.  But this is outside of the spec, plus doesn't
> solve all of the issues (like a remote system reboot).
>
> I don't think there's even an existing trap that we can use.


I see.

> > Indeed, this is not easily possible in all cases for us, as we are not always
> > allowed to add a wire protocol on --this-- QP, but we are looking into that.
> > Another solution we are considering is to "invalidate" the app level "address
> > handle" (IB AH + remote QPN) every ten seconds or so and then re-connect,
> > but this is not very efficient.
>
> How does IPoIB handle this?  Does it just time out the ARP entries every
> x minutes, which requires a new lookup?


It's not IPoIB but rather the neighbouring subsystem of the IP stack: it
sends unicast ARP probes every n seconds, and if m probes fail, it sends a
broadcast ARP. n and m are tunable parameters; I think the defaults are
n=20sec, m=3.
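
A minimal sketch of that probing policy as described here (not the actual neighbour code): keep sending unicast probes until m consecutive ones go unanswered, then fall back to a broadcast. The type and function names are hypothetical, and the timer that fires every n seconds is left to the caller.

```c
#include <stdbool.h>

enum probe_kind { PROBE_UNICAST, PROBE_BROADCAST };

/* Hypothetical per-neighbour probe state. */
struct neigh_probe_state {
	unsigned int max_unicast_failures;	/* "m" in the description */
	unsigned int failures;			/* consecutive unanswered probes */
};

/* Decide which kind of ARP probe should be sent next. */
static enum probe_kind neigh_next_probe(const struct neigh_probe_state *s)
{
	return s->failures < s->max_unicast_failures ? PROBE_UNICAST
						      : PROBE_BROADCAST;
}

/* Record the outcome of the last probe; any answer resets the count. */
static void neigh_probe_done(struct neigh_probe_state *s, bool answered)
{
	if (answered)
		s->failures = 0;
	else
		s->failures++;
}
```

With m=3, the fourth probe after three silent ones is the broadcast; a single answer returns the state to unicast probing.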

> Is there some way that you could map LIDs to QPNs, and use the
> SLID/src_qp data in the work completion to see if a remote service has
> moved QPs?


If the communication pattern is that both A sends to B and B sends to A,
then there is some path to follow here: for each packet (work completion)
that A gets from B, it checks whether B's QPN has changed, and if so, it
re-connects.
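
A rough sketch of that check, assuming one remote service per LID and that the application feeds the slid and src_qp values from each receive work completion into a small table. The table layout and the names peer_table and peer_observe are hypothetical; nothing here is an existing rdma_cm or verbs API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PEER_TABLE_SIZE	16
#define QPN_MASK	0xffffffu	/* QP numbers are 24 bits wide */

enum peer_status { PEER_NEW, PEER_SAME, PEER_QPN_CHANGED, PEER_TABLE_FULL };

struct peer_entry {
	bool     used;
	uint16_t slid;		/* source LID seen in the completion */
	uint32_t qpn;		/* last source QPN seen from that LID */
};

struct peer_table {
	struct peer_entry entry[PEER_TABLE_SIZE];
};

/*
 * Record the (slid, src_qp) pair of a received packet and report
 * whether the sender's QPN differs from the one seen previously.
 */
static enum peer_status peer_observe(struct peer_table *t, uint16_t slid,
				     uint32_t src_qp)
{
	struct peer_entry *free_slot = NULL;
	size_t i;

	src_qp &= QPN_MASK;
	for (i = 0; i < PEER_TABLE_SIZE; i++) {
		struct peer_entry *e = &t->entry[i];

		if (!e->used) {
			if (!free_slot)
				free_slot = e;
			continue;
		}
		if (e->slid != slid)
			continue;
		if (e->qpn == src_qp)
			return PEER_SAME;
		e->qpn = src_qp;	/* remember the new QPN */
		return PEER_QPN_CHANGED;
	}
	if (!free_slot)
		return PEER_TABLE_FULL;
	free_slot->used = true;
	free_slot->slid = slid;
	free_slot->qpn  = src_qp;
	return PEER_NEW;
}
```

A PEER_QPN_CHANGED result would be the trigger to re-resolve the peer. This only helps when B actually sends to A, and it cannot tell apart two services that share a LID.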

Or

From mshefty at ichips.intel.com  Mon Jun 18 16:40:35 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 18 Jun 2007 16:40:35 -0700
Subject: [ofa-general] hang at module removal with local sa patches applied
In-Reply-To: <20070618114843.GA25428@mellanox.co.il>
References: <20070618114843.GA25428@mellanox.co.il>
Message-ID: <467717F3.9020806@ichips.intel.com>

> [14897.168277] local_sa      D 0000000000000001     0  8361      2 (L-TLB)
> [14897.168280]  ffff81007d0d3c10 0000000000000046 0000000000000000 800000ce00000000
> [14897.168283]  84000b0000000000 000000000000000a ffff81007e8f3420 ffff81007ff1f4a0
> [14897.168287]  00000d8431895ed4 0000000000000d33 ffff81007e8f35d0 800000ce00000000
> [14897.168290] Call Trace:
> [14897.168294]  [<ffffffff80582e4a>] __mutex_lock_slowpath+0x69/0xaa
> [14897.168303]  [<ffffffff8806369a>] :ib_sa:port_work_handler+0x0/0x34
> [14897.168306]  [<ffffffff80582c87>] mutex_lock+0xe/0x10
> [14897.168311]  [<ffffffff880636b6>] :ib_sa:port_work_handler+0x1c/0x34
> [14897.168314]  [<ffffffff80241669>] run_workqueue+0x85/0x10f
> [14897.168317]  [<ffffffff80241851>] flush_cpu_workqueue+0x28/0x7b
> [14897.168320]  [<ffffffff80241ad0>] flush_workqueue+0x43/0x5d
> [14897.168326]  [<ffffffff88063250>] :ib_sa:cleanup_port+0x25/0x7b
> [14897.168331]  [<ffffffff88063307>] :ib_sa:process_updates+0x61/0x336
> [14897.168335]  [<ffffffff8058212b>] thread_return+0x0/0xea
> [14897.168341]  [<ffffffff88063656>] :ib_sa:add_update+0x7a/0x83
> [14897.168347]  [<ffffffff8806369a>] :ib_sa:port_work_handler+0x0/0x34
> [14897.168352]  [<ffffffff88063695>] :ib_sa:refresh_port_db+0x36/0x3b
> [14897.168358]  [<ffffffff880636be>] :ib_sa:port_work_handler+0x24/0x34
> [14897.168361]  [<ffffffff80241669>] run_workqueue+0x85/0x10f
> [14897.168363]  [<ffffffff80241f94>] worker_thread+0x0/0xe7
> [14897.168366]  [<ffffffff80242070>] worker_thread+0xdc/0xe7
> [14897.168368]  [<ffffffff802450af>] autoremove_wake_function+0x0/0x38
> [14897.168371]  [<ffffffff80244f8b>] kthread+0x49/0x76
> [14897.168374]  [<ffffffff8020aaa8>] child_rip+0xa/0x12
> [14897.168377]  [<ffffffff80244f42>] kthread+0x0/0x76
> [14897.168379]  [<ffffffff8020aa9e>] child_rip+0x0/0x12

Reading through the code, I see two potential issues:

* It's possible for flush_workqueue to be called from the workqueue thread.

* We hold a mutex when calling flush_workqueue, and a queued work item 
will try to acquire that same mutex.

I'll need to spend some time studying the thread synchronization to fix 
this.
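
To make the two hazards concrete, here is a toy, single-threaded model (not the kernel workqueue API, and not the ib_sa code): a flush is refused when the caller is itself a work item on the queue, or when the caller holds the lock that queued work needs. In the real code both cases would block instead of returning an error. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_WORK 8

struct toy_wq;
typedef void (*toy_work_fn)(struct toy_wq *wq);

/* Hypothetical stand-in for a workqueue plus the one mutex involved. */
struct toy_wq {
	toy_work_fn pending[TOY_MAX_WORK];
	size_t head, tail;	/* pending items live in [head, tail) */
	bool in_worker;		/* a work item is currently running */
	bool lock_held;		/* models the mutex seen in the trace */
	int handled;		/* work items that ran to completion */
	int refused;		/* flush attempts that were rejected */
};

static int toy_queue_work(struct toy_wq *wq, toy_work_fn fn)
{
	if (wq->tail == TOY_MAX_WORK)
		return -1;
	wq->pending[wq->tail++] = fn;
	return 0;
}

/*
 * Run every pending item. Return -1 without running anything when
 * the caller is a work item on this queue, or when the caller holds
 * the lock that the queued work items take.
 */
static int toy_flush(struct toy_wq *wq)
{
	if (wq->in_worker || wq->lock_held) {
		wq->refused++;
		return -1;
	}
	while (wq->head < wq->tail) {
		toy_work_fn fn = wq->pending[wq->head++];

		wq->in_worker = true;
		fn(wq);
		wq->in_worker = false;
	}
	return 0;
}

/* A work item that takes the lock while it runs. */
static void toy_locked_work(struct toy_wq *wq)
{
	assert(!wq->lock_held);	/* a real mutex would block here */
	wq->lock_held = true;
	wq->handled++;
	wq->lock_held = false;
}

/* A work item that tries to flush its own queue. */
static void toy_self_flush_work(struct toy_wq *wq)
{
	(void)toy_flush(wq);	/* rejected: we are inside a work item */
	wq->handled++;
}
```

The usual way out is to drop the lock before flushing and to move the flush out of work-item context; whether that fits the local SA code is a separate question.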

- Sean


From mst at dev.mellanox.co.il  Mon Jun 18 22:58:41 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 19 Jun 2007 08:58:41 +0300
Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git
In-Reply-To: <adad4ztfekf.fsf@cisco.com>
References: <adad4ztfekf.fsf@cisco.com>
Message-ID: <20070619055841.GC7069@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: [GIT PULL] please pull infiniband.git
> 
> Linus, please pull from
> 
>     master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus
> 
> This tree is also available from kernel.org mirrors at:
> 
>     git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus
> 
> This will get a bunch of fixes to the new mlx4 driver.  This pull is
> bigger than I would have liked after -rc5, but Mellanox discovered a
> problem that required a firmware change and also some driver help to
> fix.  Since this is a new driver for 2.6.22, which is for new hardware
> that no one has in production yet, I think it's better to merge this
> early even if it risks introducing a bug, rather than have a driver
> in 2.6.22 that doesn't work at all with current adapter firmware.
> 
> Jack Morgenstein (1):
>       IB/mlx4: Handle buffer wraparound in __mlx4_ib_cq_clean()
> 
> Roland Dreier (6):
>       IB/mlx4: Fix handling of wq->tail for send completions
>       IB/mlx4: Fix warning in rounding up queue sizes
>       IB/mlx4: Handle new FW requirement for send request prefetching
>       IB/mlx4: Get rid of max_inline_data calculation
>       IB/mlx4: Handle FW command interface rev 3
>       IB/mlx4: Make sure inline data segments don't cross a 64 byte boundary

BTW, have you seen the patch for ipoib cm crasher race?
I think we need it in 2.6.22 too.

-- 
MST


From erezz at voltaire.com  Mon Jun 18 23:20:33 2007
From: erezz at voltaire.com (Erez Zilber)
Date: Tue, 19 Jun 2007 09:20:33 +0300
Subject: [ofa-general] [Fwd: [PATCH 2/2] iscsi_iser: convert to use the data
	buffer accessors]
Message-ID: <467775B1.4000208@voltaire.com>

Roland,

Can you add the patch below to 2.6.23?

Thanks,
Erez

-------- Original Message --------
Subject: 	[PATCH 2/2] iscsi_iser: convert to use the data buffer accessors
Date: 	Fri, 1 Jun 2007 12:56:21 +0300
From: 	FUJITA Tomonori <fujita.tomonori at lab.ntt.co.jp>
Reply-To: 	<open-iscsi at googlegroups.com>
To: 	<linux-scsi at vger.kernel.org>
CC: 	<James.Bottomley at steeleye.com>, <michaelc at cs.wisc.edu>,
<rdreier at cisco.com>, <open-iscsi at googlegroups.com>
References:
<dcbf60d819ad4d62643fc074968699563f25621b.1180539510.git.fujita.tomonori at lab.ntt.co.jp>


iscsi_iser: convert to use the data buffer accessors

- remove the unnecessary map_single path.

- convert to use the new accessors for the sg lists and the
parameters.

TODO: use scsi_for_each_sg().

Signed-off-by: FUJITA Tomonori <fujita.tomonori at lab.ntt.co.jp>
Signed-off-by: Erez Zilber <erezz at voltaire.com>
---
 drivers/infiniband/ulp/iser/iscsi_iser.c     |    4 ++--
 drivers/infiniband/ulp/iser/iser_initiator.c |   14 ++++----------
 2 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c
b/drivers/infiniband/ulp/iser/iscsi_iser.c
index 1bf173d..effdee2 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -210,10 +210,10 @@ iscsi_iser_ctask_xmit(struct iscsi_conn *conn,
        int error = 0;

        if (ctask->sc->sc_data_direction == DMA_TO_DEVICE) {
-               BUG_ON(ctask->sc->request_bufflen == 0);
+               BUG_ON(scsi_bufflen(ctask->sc) == 0);

                debug_scsi("cmd [itt %x total %d imm %d unsol_data %d\n",
-                          ctask->itt, ctask->sc->request_bufflen,
+                          ctask->itt, scsi_bufflen(ctask->sc),
                           ctask->imm_count, ctask->unsol_count);
        }

diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c
b/drivers/infiniband/ulp/iser/iser_initiator.c
index 3651072..9ea5b9a 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -351,18 +351,12 @@ int iser_send_command(struct iscsi_conn     *conn,
        else
                data_buf = &iser_ctask->data[ISER_DIR_OUT];

-       if (sc->use_sg) { /* using a scatter list */
-               data_buf->buf  = sc->request_buffer;
-               data_buf->size = sc->use_sg;
-       } else if (sc->request_bufflen) {
-               /* using a single buffer - convert it into one entry SG */
-               sg_init_one(&data_buf->sg_single,
-                           sc->request_buffer, sc->request_bufflen);
-               data_buf->buf   = &data_buf->sg_single;
-               data_buf->size  = 1;
+       if (scsi_sg_count(sc)) { /* using a scatter list */
+               data_buf->buf  = scsi_sglist(sc);
+               data_buf->size = scsi_sg_count(sc);
        }

-       data_buf->data_len = sc->request_bufflen;
+       data_buf->data_len = scsi_bufflen(sc);

        if (hdr->flags & ISCSI_FLAG_CMD_READ) {
                err = iser_prepare_read_cmd(ctask, edtl);
--
1.4.4.4



From fujita.tomonori at lab.ntt.co.jp  Mon Jun 18 23:32:04 2007
From: fujita.tomonori at lab.ntt.co.jp (FUJITA Tomonori)
Date: Tue, 19 Jun 2007 15:32:04 +0900
Subject: [ofa-general] Re: [Fwd: [PATCH 2/2] iscsi_iser: convert to use the
 data buffer accessors]
In-Reply-To: <467775B1.4000208@voltaire.com>
References: <467775B1.4000208@voltaire.com>
Message-ID: <20070619153204D.fujita.tomonori@lab.ntt.co.jp>

From: Erez Zilber <erezz at voltaire.com>
Subject: [Fwd: [PATCH 2/2] iscsi_iser: convert to use the data buffer accessors]
Date: Tue, 19 Jun 2007 09:20:33 +0300

> 
> Roland,
> 
> Can you add the patch below to 2.6.23?

Thanks, but the patch was already added to James' scsi-misc tree (for
2.6.23). It's easier to add this to his tree since it depends on the
patch to add the accessors in his tree. So you don't need to worry about it.


From erezz at voltaire.com  Mon Jun 18 23:47:15 2007
From: erezz at voltaire.com (Erez Zilber)
Date: Tue, 19 Jun 2007 09:47:15 +0300
Subject: [ofa-general] Re: [Fwd: [PATCH 2/2] iscsi_iser: convert to use
	the data buffer accessors]
In-Reply-To: <20070619153204D.fujita.tomonori@lab.ntt.co.jp>
References: <467775B1.4000208@voltaire.com>
	<20070619153204D.fujita.tomonori@lab.ntt.co.jp>
Message-ID: <46777BF3.7090805@voltaire.com>

FUJITA Tomonori wrote:

> From: Erez Zilber <erezz at voltaire.com>
> Subject: [Fwd: [PATCH 2/2] iscsi_iser: convert to use the data buffer
> accessors]
> Date: Tue, 19 Jun 2007 09:20:33 +0300
>
> >
> > Roland,
> >
> > Can you add the patch below to 2.6.23?
>
> Thanks, but the patch was already added to James' scsi-misc tree (for
> 2.6.23). It's easier to add this to his tree since it depends on the
> patch to add the accessors in his tree. So you don't need to worry about it.

OK. Roland - please ignore this patch.

Erez


From vlad at lists.openfabrics.org  Tue Jun 19 02:42:50 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Tue, 19 Jun 2007 02:42:50 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070619-0200 daily build status
Message-ID: <20070619094250.A068AE6083B@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:   --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.12
Passed on ppc64 with linux-2.6.18
Passed on powerpc with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.19
Passed on ia64 with linux-2.6.18
Passed on x86_64 with linux-2.6.16
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.12
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.13
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on ia64 with linux-2.6.14
Passed on powerpc with linux-2.6.15
Passed on ia64 with linux-2.6.19
Passed on x86_64 with linux-2.6.14
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.17
Passed on x86_64 with linux-2.6.17
Passed on ia64 with linux-2.6.15
Passed on powerpc with linux-2.6.12
Passed on ppc64 with linux-2.6.14
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.12
Passed on powerpc with linux-2.6.13
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.17
Passed on x86_64 with linux-2.6.19
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.16
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on ia64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:


From rdreier at cisco.com  Tue Jun 19 03:09:02 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 19 Jun 2007 03:09:02 -0700
Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git
References: <adad4ztfekf.fsf@cisco.com> <20070619055841.GC7069@mellanox.co.il>
Message-ID: <adafy4odznl.fsf@cisco.com>

 > BTW, have you seen the patch for ipoib cm crasher race?
 > I think we need it in 2.6.22 too.

I saw the discussion and it does seem like something to fix for
2.6.22.  But I didn't see a final conclusion on which patch to use,
and I don't think I ever saw a patch with a good changelog and
signed-off-by line either... please resend if I missed it.

 - R.


From mst at dev.mellanox.co.il  Tue Jun 19 03:36:10 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 19 Jun 2007 13:36:10 +0300
Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git
In-Reply-To: <adafy4odznl.fsf@cisco.com>
References: <adad4ztfekf.fsf@cisco.com> <20070619055841.GC7069@mellanox.co.il>
	<adafy4odznl.fsf@cisco.com>
Message-ID: <20070619103610.GA15224@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [GIT PULL] please pull infiniband.git
> 
>  > BTW, have you seen the patch for ipoib cm crasher race?
>  > I think we need it in 2.6.22 too.
> 
> I saw the discussion and it does seem like something to fix for
> 2.6.22.  But I didn't see a final conclusion on which patch to use,
> and I don't think I ever saw a patch with a good changelog and
> signed-off-by line either... please resend if I missed it.

Resending.

-- 
MST


From mst at dev.mellanox.co.il  Tue Jun 19 03:40:41 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 19 Jun 2007 13:40:41 +0300
Subject: [ofa-general] Re: [PATCH for-2.6.22] ipoib/cm: initialize RX before
	moving QP to RTR
In-Reply-To: <20070618083240.GK14335@mellanox.co.il>
References: <4672BE23.3050809@ichips.intel.com>
	<BAE9DCEF64577A439B3A37F36F9B691C0285B8C1@orsmsx418.amr.corp.intel.com>
	<20070618083240.GK14335@mellanox.co.il>
Message-ID: <20070619104041.GB15224@mellanox.co.il>

Fix a crasher bug in IPoIB CM: once QP is in RTR, an RX completion (and even an
asynchronous error) might be observed on this QP, so we have to initialize all
RX fields beforehand.

As an optimization (since modify_qp might take a long time),
the jiffies update done when moving RX to the passive_ids list is also
left in place to reduce the chance of the RX being mis-detected as stale.

This fixes bug <https://bugs.openfabrics.org/show_bug.cgi?id=662>

Signed-off-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>

---

Resending - Roland, is the changelog OK?
Please consider this bugfix for 2.6.22.

> > Quoting Woodruff, Robert J <robert.j.woodruff at intel.com>:
> > Subject: RE: [ofa-general] crash in ipoib
> > 
> > Sean wrote,
> > >> And here's a version with error handling fixed.
> > >> Sean, does this solve your crash?
> > 
> > >We've been running this patch since yesterday and haven't seen any 
> > >crashes.  We'll continue testing this over the week-end.
> > 
> > >- Sean
> > 
> > This looks like it fixed the panic. 
> > 
> > Should we try to put out a new RC with this latest ipoib fix ?
> > I really think we need it in the release. If we could get another RC out
> > today,
> > that would only delay the release by a couple of more days and we could
> > release on next Friday rather than wed. and still give people a week to 
> > test the final RC.
> > 
> > woody
> 
> OK, the following patch has been added to OFED 1.2.
> Roland, please consider this bugfix for 2.6.22.

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 076a0bb..c64249f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -309,6 +309,11 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 		return -ENOMEM;
 	p->dev = dev;
 	p->id = cm_id;
+	cm_id->context = p;
+	p->state = IPOIB_CM_RX_LIVE;
+	p->jiffies = jiffies;
+	INIT_LIST_HEAD(&p->list);
+
 	p->qp = ipoib_cm_create_rx_qp(dev, p);
 	if (IS_ERR(p->qp)) {
 		ret = PTR_ERR(p->qp);
@@ -320,24 +325,24 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 	if (ret)
 		goto err_modify;
 
+	spin_lock_irq(&priv->lock);
+	queue_delayed_work(ipoib_workqueue,
+			   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
+	/* Add this entry to passive ids list head, but do not re-add it
+	 * if IB_EVENT_QP_LAST_WQE_REACHED has moved it to flush list. */
+	p->jiffies = jiffies;
+	if (p->state == IPOIB_CM_RX_LIVE)
+		list_move(&p->list, &priv->cm.passive_ids);
+	spin_unlock_irq(&priv->lock);
+
 	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
 	if (ret) {
 		ipoib_warn(priv, "failed to send REP: %d\n", ret);
-		goto err_rep;
+		if (ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE))
+			ipoib_warn(priv, "unable to move qp to error state\n");
 	}
-
-	cm_id->context = p;
-	p->jiffies = jiffies;
-	p->state = IPOIB_CM_RX_LIVE;
-	spin_lock_irq(&priv->lock);
-	if (list_empty(&priv->cm.passive_ids))
-		queue_delayed_work(ipoib_workqueue,
-				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
-	list_add(&p->list, &priv->cm.passive_ids);
-	spin_unlock_irq(&priv->lock);
 	return 0;
 
-err_rep:
 err_modify:
 	ib_destroy_qp(p->qp);
 err_qp:

-- 
MST


From hanafim.ctr at asc.hpc.mil  Tue Jun 19 06:31:29 2007
From: hanafim.ctr at asc.hpc.mil (MAHMOUD HANAFI)
Date: Tue, 19 Jun 2007 09:31:29 -0400
Subject: [ofa-general] Build error 1.2rc5
Message-ID: <4677DAB1.2080002@asc.hpc.mil>

Has anyone else seen these build errors with 1.2rc5?


RPM build errors:
     user vlad does not exist - using root
     group vlad does not exist - using root
     user vlad does not exist - using root
     group vlad does not exist - using root
     File listed twice: /usr/lib/libibverbs.so.1
     File listed twice: /usr/lib/libibverbs.so.1.0.0
     File listed twice: /usr/lib/libibverbs.a
     File listed twice: /usr/lib/libibverbs.so
     File listed twice: /usr/lib/libibcm.so.1
     File listed twice: /usr/lib/libibcm.so.1.0
     File listed twice: /usr/lib/libibcm.so.1.0.0
     File listed twice: /usr/lib/libibcm.so
     File listed twice: /usr/lib/libmthca-rdmav2.so
     File listed twice: /usr/lib/libmthca.so
     File listed twice: /usr/lib/libmthca.a
     File listed twice: /usr/lib/libcxgb3-rdmav2.so
     File listed twice: /usr/lib/libcxgb3.so
     File listed twice: /usr/lib/libcxgb3.a
     File listed twice: /usr/lib/libipathverbs-rdmav2.so
     File listed twice: /usr/lib/libipathverbs.so
     File listed twice: /usr/lib/libipathverbs.a
     File listed twice: /usr/lib/libsdp.so
     File listed twice: /usr/lib/libsdp.so.1
     File listed twice: /usr/lib/libsdp.so.1.0.0
     File listed twice: /usr/lib/libibcommon.so.1
     File listed twice: /usr/lib/libibcommon.so.1.0.0
     File listed twice: /usr/lib/libibcommon.a
     File listed twice: /usr/lib/libibcommon.so
     File listed twice: /usr/lib/libibmad.so.1
     File listed twice: /usr/lib/libibmad.so.1.2.0
     File listed twice: /usr/lib/libibmad.a
     File listed twice: /usr/lib/libibmad.so
     File listed twice: /usr/lib/libibumad.so.1
     File listed twice: /usr/lib/libibumad.so.1.0.0
     File listed twice: /usr/lib/libibumad.a
     File listed twice: /usr/lib/libibumad.so
     File listed twice: /usr/lib/libosmcomp.so.1
     File listed twice: /usr/lib/libosmcomp.so.1.0.1
     File listed twice: /usr/lib/libosmcomp-2.1.3.so
     File listed twice: /usr/lib/libosmcomp.a
     File listed twice: /usr/lib/libosmcomp.so
     File listed twice: /usr/lib/libopensm.so.1
     File listed twice: /usr/lib/libopensm.so.1.1.0
     File listed twice: /usr/lib/libopensm-2.1.4.so
     File listed twice: /usr/lib/libopensm.a
     File listed twice: /usr/lib/libopensm.so
     File listed twice: /usr/lib/libosmvendor.so.2
     File listed twice: /usr/lib/libosmvendor.so.2.0.0
     File listed twice: /usr/lib/libosmvendor-2.1.3.so
     File listed twice: /usr/lib/libosmvendor.a
     File listed twice: /usr/lib/libosmvendor.so
     File listed twice: /usr/lib/libosmvendor_openib.so
     File listed twice: /usr/lib/librdmacm.so.1
     File listed twice: /usr/lib/librdmacm.so.1.0.0
     File listed twice: /usr/lib/librdmacm.so.1.0.1
     File listed twice: /usr/lib/librdmacm.so
     File not found: /var/tmp/OFED/etc/dat.conf
     File listed twice: /usr/lib/libdaplcma.a
     File listed twice: /usr/lib/libdat.a
ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix 
/usr' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-dapl --with-ipoibtools 
--with-libcxgb3 --with-libibcm --with-libibcommon --with-libibmad --with-libibumad --with-libibverbs 
--with-libipathverbs --with-libmthca --with-opensm --with-librdmacm --with-libsdp 
--with-openib-diags --with-qlvnictools --with-sdpnetstat --with-srptools --with-mstflint 
--with-perftest --with-tvflash --sysconfdir=/usr/etc --mandir=/usr/man' --define 
'configure_options32 --with-dapl --with-ipoibtools --with-libcxgb3 --with-libibcm --with-libibcommon 
--with-libibmad --with-libibumad --with-libibverbs --with-libipathverbs --with-libmthca 
--with-opensm --with-librdmacm --with-libsdp --with-openib-diags --with-qlvnictools 
--with-sdpnetstat --with-srptools --sysconfdir=/usr/etc --mandir=/usr/man' --define 'build_32bit 1' 
--define '_mandir /usr/man' /tmp/OFED-1.2-rc5/SRPMS/ofa_user-1.2-rc5.src.rpm"

-- 
Mahmoud Hanafi
Senior System Administrator
ASC/MSRC
www.asc.hpc.mil
2435 5th Street
WPAFB, OHIO 45433
(937) 255-1536


From jackm at dev.mellanox.co.il  Tue Jun 19 06:41:52 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Tue, 19 Jun 2007 16:41:52 +0300
Subject: [ofa-general] [PATCH 1 of 2] net-mlx4: Show board_id string in sysfs
	under the pci device
Message-ID: <200706191641.52831.jackm@dev.mellanox.co.il>

Show the board_id string in sysfs under the pci device (not under the infiniband
device, as with other HCAs). ConnectX will also have an enet device (which will
not be under the infiniband class) and users of this device must also have 
access to the board_id string.

This requires a small modification in the libibverbs example "ibv_devinfo"; the app
must also look under the pci device for the board_id if it does not find it 
directly under the infiniband device.

Signed-off-by: Jack Morgenstein <jackm at dev.mellanox.co.il>

Index: connectx_kernel/drivers/net/mlx4/main.c
===================================================================
--- connectx_kernel.orig/drivers/net/mlx4/main.c	2007-05-07 18:36:02.000000000 +0300
+++ connectx_kernel/drivers/net/mlx4/main.c	2007-05-08 12:52:49.000000000 +0300
@@ -711,6 +711,18 @@
 		priv->eq_table.eq[i].irq = dev->pdev->irq;
 }
 
+static ssize_t mlx4_show_board_id(struct device *dev,
+				  struct device_attribute *attr,
+				  char *buf)
+{
+	struct mlx4_dev  *mdev  = dev->driver_data;
+	struct mlx4_priv *priv = mlx4_priv(mdev);
+
+	return snprintf(buf, MLX4_BOARD_ID_LEN, "%s\n", (char *)priv->board_id);
+}
+
+static DEVICE_ATTR(board_id, S_IRUGO, mlx4_show_board_id, NULL);
+
 static int __devinit mlx4_init_one(struct pci_dev *pdev,
 				   const struct pci_device_id *id)
 {
@@ -827,6 +839,7 @@
 		goto err_cleanup;
 
 	pci_set_drvdata(pdev, dev);
+	device_create_file(&pdev->dev, &dev_attr_board_id);
 
 	return 0;
 
@@ -875,6 +888,7 @@
 	int p;
 
 	if (dev) {
+		device_remove_file(&pdev->dev, &dev_attr_board_id);
 		mlx4_unregister_device(dev);
 
 		for (p = 1; p <= dev->caps.num_ports; ++p)


From jackm at dev.mellanox.co.il  Tue Jun 19 06:44:27 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Tue, 19 Jun 2007 16:44:27 +0300
Subject: [ofa-general] [PATCH 2 of 2] libibverbs: modify ibv_devinfo to look
	under pci device as well if it does not find board_id under
	the ib device
Message-ID: <200706191644.27675.jackm@dev.mellanox.co.il>

devinfo needs to look under the pci device directory for board_id if
it does not find it under the infiniband device directory.

Signed-off-by: Jack Morgenstein <jackm at dev.mellanox.co.il>
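
[Editor's note] The fallback described above builds the path
"<ibdev_path>/device" and retries the board_id read there. The following is
a minimal sketch of building that path with an explicit bounds check,
using snprintf() instead of strcpy()/strcat(). The helper name
build_device_path() is hypothetical and is not part of libibverbs.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write "<ibdev_path>/device" into dst.
 * Returns 0 on success, -1 if the result would not fit in dst_len bytes
 * (in which case the caller should skip the fallback lookup).
 */
static int build_device_path(char *dst, size_t dst_len, const char *ibdev_path)
{
	int n = snprintf(dst, dst_len, "%s/device", ibdev_path);

	return (n < 0 || (size_t) n >= dst_len) ? -1 : 0;
}
```

A caller would pass the resulting path to ibv_read_sysfs_file() only when
the helper returns 0, so an unusually long sysfs path cannot overrun the
local buffer.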

--- a/src/userspace/libibverbs/examples/devinfo.c	2007-05-01 11:15:29.409126000 +0300
+++ b/src/userspace/libibverbs/examples/devinfo.c	2007-05-08 14:56:02.000000000 +0300
@@ -195,6 +195,14 @@ static int print_hca_cap(struct ibv_devi
 
 	if (ibv_read_sysfs_file(ib_dev->ibdev_path, "board_id", buf, sizeof buf) > 0)
 		printf("\tboard_id:\t\t\t%s\n", buf);
+	else {
+		char syspath[256];
+		strcpy((char *) syspath, ib_dev->ibdev_path);
+		strcat((char *) syspath, "/device");
+		if (ibv_read_sysfs_file((char *) syspath,
+		    "board_id", buf, sizeof buf) > 0)
+			printf("\tboard_id:\t\t\t%s\n", buf);
+	}
 
 	printf("\tphys_port_cnt:\t\t\t%d\n", device_attr.phys_port_cnt);
 

From erezz at voltaire.com  Tue Jun 19 06:45:03 2007
From: erezz at voltaire.com (Erez Zilber)
Date: Tue, 19 Jun 2007 16:45:03 +0300
Subject: [ofa-general] Build error 1.2rc5
In-Reply-To: <4677DAB1.2080002@asc.hpc.mil>
References: <4677DAB1.2080002@asc.hpc.mil>
Message-ID: <4677DDDF.9080109@voltaire.com>

MAHMOUD HANAFI wrote:

> Has anyone else seen these build errors with 1.2rc5?

Try sending this to ewg at lists.openfabrics.org.

Erez


From jackm at dev.mellanox.co.il  Tue Jun 19 06:47:41 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Tue, 19 Jun 2007 16:47:41 +0300
Subject: [ofa-general] [PATCH] libmlx4: fix adjustments for minimum qp
	capabilities in mlx4_create_qp
Message-ID: <200706191647.41336.jackm@dev.mellanox.co.il>

Need to adjust minimum qp capability values prior to size and max resource
calculations.

Correct the rq values afterwards (as before) if we have an srq.

Signed-off-by: Jack Morgenstein <jackm at dev.mellanox.co.il>

Index: new_connectx_user/src/userspace/libmlx4/src/verbs.c
===================================================================
--- new_connectx_user.orig/src/userspace/libmlx4/src/verbs.c	2007-06-18 09:33:04.000000000 +0300
+++ new_connectx_user/src/userspace/libmlx4/src/verbs.c	2007-06-19 09:47:10.000000000 +0300
@@ -355,6 +355,12 @@ struct ibv_qp *mlx4_create_qp(struct ibv
 	if (!qp)
 		return NULL;
 
+	/* adjust minimum cap values */
+	attr->cap.max_recv_wr = attr->cap.max_recv_wr ? attr->cap.max_recv_wr : 1;
+	attr->cap.max_recv_sge = attr->cap.max_recv_sge ? attr->cap.max_recv_sge : 1;
+	attr->cap.max_send_wr = attr->cap.max_send_wr ? attr->cap.max_send_wr : 1;
+	attr->cap.max_send_sge = attr->cap.max_send_sge ? attr->cap.max_send_sge : 1;
+
 	mlx4_calc_sq_wqe_size(&attr->cap, attr->qp_type, qp);
 
 	/*
@@ -366,9 +372,7 @@ struct ibv_qp *mlx4_create_qp(struct ibv
 	qp->rq.wqe_cnt = align_queue_size(attr->cap.max_recv_wr);
 
 	if (attr->srq)
-		attr->cap.max_recv_wr = qp->rq.wqe_cnt = 0;
-	else if (attr->cap.max_recv_sge < 1)
-		attr->cap.max_recv_sge = 1;
+		attr->cap.max_recv_wr = attr->cap.max_recv_sge = qp->rq.wqe_cnt = 0;
 
 	if (mlx4_alloc_qp_buf(pd, &attr->cap, attr->qp_type, qp))
 		goto err;
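
[Editor's note] The ordering this patch establishes is: first raise any
zero capability to 1, then run the size calculations, and only then zero
the receive-side values when the QP is attached to an SRQ. The following
is a minimal standalone sketch of that ordering. struct qp_cap_sketch,
at_least_one() and adjust_caps() are hypothetical names standing in for
the capability fields of the attr->cap structure used in the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the capability fields touched by the patch. */
struct qp_cap_sketch {
	uint32_t max_send_wr;
	uint32_t max_recv_wr;
	uint32_t max_send_sge;
	uint32_t max_recv_sge;
};

/* A requested value of 0 is raised to the minimum of 1. */
static uint32_t at_least_one(uint32_t v)
{
	return v ? v : 1;
}

static void adjust_caps(struct qp_cap_sketch *cap, int has_srq)
{
	/* Step 1: apply the minimums before any size calculation. */
	cap->max_recv_wr  = at_least_one(cap->max_recv_wr);
	cap->max_recv_sge = at_least_one(cap->max_recv_sge);
	cap->max_send_wr  = at_least_one(cap->max_send_wr);
	cap->max_send_sge = at_least_one(cap->max_send_sge);

	/* (WQE size and queue size calculations would run here.) */

	/* Step 2: with an SRQ the QP has no receive queue of its own. */
	if (has_srq)
		cap->max_recv_wr = cap->max_recv_sge = 0;
}
```

Applying the minimums first means the size calculations never see a zero
send or receive count, which is the problem the commit message describes.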


From Eric.Baur at gs.com  Tue Jun 19 07:25:52 2007
From: Eric.Baur at gs.com (Baur, Eric)
Date: Tue, 19 Jun 2007 10:25:52 -0400
Subject: [ofa-general] Build error 1.2rc5
In-Reply-To: <4677DAB1.2080002@asc.hpc.mil>
References: <4677DAB1.2080002@asc.hpc.mil>
Message-ID: <4DCBAA39733E8048992FB7737126041902829564@gsmbnbp23es.firmwide.corp.gs.com>

Yes. The issue seems to be caused by the fact that both 32-bit and
64-bit libs are written to /usr/lib rather than /usr/lib and
/usr/lib64/.

A quick workaround is to modify ofed.conf to only build 64 bit
(build_32bit=0).

-Eric



From tziporet at mellanox.co.il  Tue Jun 19 07:47:43 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Tue, 19 Jun 2007 17:47:43 +0300
Subject: [ofa-general] Anouncement: OFED 1.2 rc6 is avilable
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C90156362A@mtlexch01.mtl.com>
References: <43AA3CB3C1BF5A499F5AAD31CA5023AC06624A26@mtlexch01.mtl.com>
	<6C2C79E72C305246B504CBA17B5500C9015634B7@mtlexch01.mtl.com>
	<6C2C79E72C305246B504CBA17B5500C90156362A@mtlexch01.mtl.com>
Message-ID: <6C2C79E72C305246B504CBA17B5500C9015636A9@mtlexch01.mtl.com>

 
Hi, 

OFED 1.2-RC6 is available on
http://www.openfabrics.org/builds/ofed-1.2/ 
File: OFED-1.2-rc6.tgz 
To get BUILD_ID run ofed_info 

Please report any issues in bugzilla https://bugs.openfabrics.org/

The GA release is expected this Friday (June 22)

I have attached the OFED release notes (RN). Please review them and send
me your comments for the final release.

Thanks,
Tziporet

========================================================================

Release information: 

OS support: 
Novell: 
    - SLES 9.0 SP3 
    - SLES10 
    - SLES10 SP1 RC5
Redhat: 
    - Redhat EL4 up3, up4 and up5 
    - Redhat EL5 
kernel.org: 
    - 2.6.20 
    - 2.6.19 

Note: Kernel 2.6.21, Fedora C6 and SuSE Pro 10 are not part of the
official list. 
We keep the backport patches for these OSes and make sure OFED compiles
and loads properly, but we will not do a full QA cycle.

Systems: 
    * x86_64 
    * x86 
    * ia64 
    * ppc64 

Main changes from OFED-1.2-rc5: 
===============================
1. Fixed 6 bugs (see attached for fixed issues)

See bugzilla for all open issues. 

Tasks that should be completed for the GA release: 
1. Complete all documentation (release notes, README, etc.) 
2. Run all QA tests on all platforms
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rc6_fixed_bugs.csv
Type: application/octet-stream
Size: 636 bytes
Desc: rc6_fixed_bugs.csv
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070619/ad94d792/attachment.obj>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: OFED_release_notes.txt
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070619/ad94d792/attachment.txt>

From mhanafi at csc.com  Tue Jun 19 08:05:50 2007
From: mhanafi at csc.com (Mahmoud Hanafi)
Date: Tue, 19 Jun 2007 11:05:50 -0400
Subject: [ofa-general] Anouncement: OFED 1.2 rc6 is avilable
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9015636A9@mtlexch01.mtl.com>
Message-ID: <OFAEAAE34D.6E969F93-ON852572FF.00523E99-852572FF.0052EE81@csc.com>

After changing the default install prefix from /usr to /usr/local/ofed1.2,
these files are still copied to ../usr/etc. Even with the default install
location, dat.conf is still copied to ../usr/etc.

RPM build errors:
    user vlad does not exist - using root
    group vlad does not exist - using root
    user vlad does not exist - using root
    group vlad does not exist - using root
    File not found: 
/var/tmp/OFED/usr/local/ofed1.2/etc/libibverbs.d/mthca.driver
    File not found: 
/var/tmp/OFED/usr/local/ofed1.2/etc/libibverbs.d/cxgb3.driver
    File not found: 
/var/tmp/OFED/usr/local/ofed1.2/etc/libibverbs.d/ipath.driver
    File not found: /var/tmp/OFED/usr/local/ofed1.2/etc/libsdp.conf
    File not found: /var/tmp/OFED/etc/dat.conf
ERROR: Failed executing "rpmbuild --rebuild --define '_topdir 
/var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed1.2' --define 
'build_root /var/tmp/OFED' --define 'configure_options --with-dapl 
--with-ipoibtools --with-libcxgb3 --with-libibcm --with-libibcommon 
--with-libibmad --with-libibumad --with-libibverbs --with-libipathverbs 
--with-libmthca --with-opensm --with-librdmacm --with-libsdp 
--with-openib-diags --with-qlvnictools --with-sdpnetstat --with-srptools 
--with-mstflint --with-perftest --with-tvflash --sysconfdir=/usr/etc 
--mandir=/usr/man' --define 'configure_options32 %{nil} 
--sysconfdir=/usr/etc --mandir=/usr/man' --define 'build_32bit 0' --define 
'_mandir /usr/man' /tmp/OFED-1.2-rc5/SRPMS/ofa_user-1.2-rc5.src.rpm"



From mhanafi at csc.com  Tue Jun 19 08:15:02 2007
From: mhanafi at csc.com (Mahmoud Hanafi)
Date: Tue, 19 Jun 2007 11:15:02 -0400
Subject: [ofa-general] Anouncement: OFED 1.2 rc6 is avilable
In-Reply-To: <OFAEAAE34D.6E969F93-ON852572FF.00523E99-852572FF.0052EE81@csc.com>
Message-ID: <OF578F6ABB.5B852EF0-ON852572FF.00539ABF-852572FF.0053C62A@csc.com>

Sorry, I got rc5 and rc6 mixed up. 

Here is the rc6 issue. (looks like base.h is missing) 
 gcc 
-Wp,-MD,/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/scsi/.attribute_container.o.d 
-nostdinc -iwithprefix include -D__KERNEL__ 
-I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_addons/backport/2.6.9_U4/include/ 
 -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/include 
-I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/include 
-Iinclude  -Iinclude2 
-I/usr/src/linux-2.6.9-42.0.10.EL_lustre.1.4.10/include  -include 
include/linux/autoconf.h  -include 
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/include/linux/autoconf.h 
-I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/scsi -Wall 
-Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Os 
-fomit-frame-pointer -Wdeclaration-after-statement -mno-red-zone 
-mcmodel=kernel -pipe -fno-reorder-blocks -Wno-sign-compare 
-fno-asynchronous-unwind-tables -funit-at-a-time 
-I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/include 
-I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/include 
-I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/ipoib 
-I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/debug 
-I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/hw/cxgb3/core 
-I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3 
-I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/rds 
-I/usr/src/linux-2.6.9-42.0.10.EL_lustre.1.4.10/kernel_addons/backport/2.6.9_U4/include/src/ 
-DMODULE -DKBUILD_BASENAME=attribute_container 
-DKBUILD_MODNAME=scsi_transport_iscsi -c -o 
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/scsi/.tmp_attribute_container.o 
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/scsi/attribute_container.c
In file included from 
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/scsi/attribute_container.c:1:
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/include/../drivers/base/attribute_container.c:22:18: 
base.h: No such file or directory
make[5]: *** 
[/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/scsi/attribute_container.o] 
Error 1
make[4]: *** [/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/scsi] Error 2
make[3]: *** [_module_/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2] Error 2
make[2]: *** [modules] Error 2
make[1]: *** [modules] Error 2
make[1]: Leaving directory 
`/usr/src/linux-2.6.9-42.0.10.EL_lustre.1.4.10-obj/x86_64/smp'
make: *** [kernel] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.36297 (%install)


RPM build errors:
    user vlad does not exist - using root
    group vlad does not exist - using root
    user vlad does not exist - using root
    group vlad does not exist - using root
    Bad exit status from /var/tmp/rpm-tmp.36297 (%install)
ERROR: Failed executing "rpmbuild --rebuild --define '_topdir 
/var/tmp/OFEDRPM' --define '_prefix /usr' --define 'build_root 
/var/tmp/OFED' --define 'configure_options --with-cxgb3-mod 
--with-ipath_inf-mod --with-ipoib-mod --with-iser-mod --with-mthca-mod 
--with-sdp-mod --with-srp-mod --with-core-mod --with-user_mad-mod 
--with-user_access-mod --with-addr_trans-mod --with-rds-mod 
--with-vnic-mod ' --define 'KVERSION 2.6.9-42.0.10.EL_lustre.1.4.10smp' 
--define 'KSRC /lib/modules/2.6.9-42.0.10.EL_lustre.1.4.10smp/build' 
--define 'build_kernel_ib 1' --define 'build_kernel_ib_devel 1' --define 
'NETWORK_CONF_DIR /etc/sysconfig/network-scripts' --define 
'modprobe_update 1' --define 'include_ipoib_conf 1' 
/root/OFED-1.2-rc6/SRPMS/ofa_kernel-1.2-rc6.src.rpm"


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
This is a PRIVATE message. If you are not the intended recipient, please 
delete without copying and kindly advise us by e-mail of the mistake in 
delivery. NOTE: Regardless of content, this e-mail shall not operate to 
bind CSC to any order or other contract unless pursuant to explicit 
written agreement or government initiative expressly permitting the use of 
e-mail for such purpose.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


Mahmoud Hanafi/DEF/CSC at CSC 
Sent by: general-bounces at lists.openfabrics.org
06/19/2007 11:05 AM

To
"Tziporet Koren" <tziporet at mellanox.co.il>
cc
general-bounces at lists.openfabrics.org, ewg at lists.openfabrics.org, 
general at lists.openfabrics.org
Subject
Re: [ofa-general] Anouncement: OFED 1.2 rc6 is avilable


When I change the install prefix from the default /usr to 
/usr/local/ofed1.2, the files listed below are copied to ../usr/etc. With 
the default install location, dat.conf is still copied to ../usr/etc. 

RPM build errors:
   user vlad does not exist - using root
   group vlad does not exist - using root
   user vlad does not exist - using root
   group vlad does not exist - using root
   File not found: 
/var/tmp/OFED/usr/local/ofed1.2/etc/libibverbs.d/mthca.driver
   File not found: 
/var/tmp/OFED/usr/local/ofed1.2/etc/libibverbs.d/cxgb3.driver
   File not found: 
/var/tmp/OFED/usr/local/ofed1.2/etc/libibverbs.d/ipath.driver
   File not found: /var/tmp/OFED/usr/local/ofed1.2/etc/libsdp.conf
   File not found: /var/tmp/OFED/etc/dat.conf
ERROR: Failed executing "rpmbuild --rebuild --define '_topdir 
/var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed1.2' --define 
'build_root /var/tmp/OFED' --define 'configure_options --with-dapl 
--with-ipoibtools --with-libcxgb3 --with-libibcm --with-libibcommon 
--with-libibmad --with-libibumad --with-libibverbs --with-libipathverbs 
--with-libmthca --with-opensm --with-librdmacm --with-libsdp 
--with-openib-diags --with-qlvnictools --with-sdpnetstat --with-srptools 
--with-mstflint --with-perftest --with-tvflash --sysconfdir=/usr/etc 
--mandir=/usr/man' --define 'configure_options32 %{nil} 
--sysconfdir=/usr/etc --mandir=/usr/man' --define 'build_32bit 0' --define 
'_mandir /usr/man' /tmp/OFED-1.2-rc5/SRPMS/ofa_user-1.2-rc5.src.rpm"


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070619/0f99f0eb/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rc6_fixed_bugs.csv
Type: application/octet-stream
Size: 636 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070619/0f99f0eb/attachment.obj>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: OFED_release_notes.txt
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070619/0f99f0eb/attachment.txt>

From jackm at dev.mellanox.co.il  Tue Jun 19 08:20:46 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Tue, 19 Jun 2007 18:20:46 +0300
Subject: [ofa-general] [PATCH] IB-mlx4: query_device needs to return one less
	srq wqe for max_srq_wr
Message-ID: <200706191820.46443.jackm@dev.mellanox.co.il>

Need to have 1 spare wqe for srq (so that there is always a "next wqe" available
when posting).

Found by Mellanox QA
Signed-off-by: Jack Morgenstein <jackm at dev.mellanox.co.il>

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 402f3a2..6cb0ba1 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -120,7 +120,7 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 	props->max_qp_init_rd_atom = dev->dev->caps.max_qp_init_rdma;
 	props->max_res_rd_atom	   = props->max_qp_rd_atom * props->max_qp;
 	props->max_srq		   = dev->dev->caps.num_srqs - dev->dev->caps.reserved_srqs;
-	props->max_srq_wr	   = dev->dev->caps.max_srq_wqes;
+	props->max_srq_wr	   = dev->dev->caps.max_srq_wqes - 1;
 	props->max_srq_sge	   = dev->dev->caps.max_srq_sge;
 	props->local_ca_ack_delay  = dev->dev->caps.local_ca_ack_delay;
 	props->atomic_cap	   = dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_ATOMIC ?
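
A minimal sketch of why one WQE has to be held back, assuming a toy SRQ in
which every posted WQE links to a successor. All names below (toy_srq,
toy_srq_post, toy_srq_max_wr) are hypothetical and this is not the mlx4 data
layout; it only illustrates the "always a next wqe" constraint described in
the patch above.

```c
#include <assert.h>

#define TOY_SRQ_MAX_SLOTS 64

/* Toy model of an SRQ whose WQEs are chained through a "next" index.
 * Hypothetical names; not the real mlx4 structures. */
struct toy_srq {
	int next[TOY_SRQ_MAX_SLOTS]; /* next[i]: slot following i, -1 at the tail */
	int head;                    /* slot the next post will consume */
	int wqe_cnt;                 /* slots physically allocated */
};

void toy_srq_init(struct toy_srq *srq, int wqe_cnt)
{
	int i;

	assert(wqe_cnt >= 2 && wqe_cnt <= TOY_SRQ_MAX_SLOTS);
	for (i = 0; i < wqe_cnt; i++)
		srq->next[i] = (i + 1 < wqe_cnt) ? i + 1 : -1;
	srq->head = 0;
	srq->wqe_cnt = wqe_cnt;
}

/* Consume the head slot. The post is refused when the head has no
 * successor, because a posted WQE must always link to a "next wqe". */
int toy_srq_post(struct toy_srq *srq)
{
	int cur = srq->head;

	if (srq->next[cur] < 0)
		return -1;
	srq->head = srq->next[cur];
	return cur;
}

/* Limit to advertise to consumers: one slot is kept as the spare. */
int toy_srq_max_wr(const struct toy_srq *srq)
{
	return srq->wqe_cnt - 1;
}
```

With wqe_cnt = 8 in this toy model, seven posts succeed and the eighth is
refused, which is the same wqe_cnt - 1 limit that the patch reports through
max_srq_wr.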


From john.russo at qlogic.com  Tue Jun 19 08:19:46 2007
From: john.russo at qlogic.com (John Russo)
Date: Tue, 19 Jun 2007 10:19:46 -0500
Subject: [ofa-general] Supported list of Kernels
In-Reply-To: <20070619150629.E2CA7E60871@openfabrics.org>
References: <20070619150629.E2CA7E60871@openfabrics.org>
Message-ID: <99863D2ED484D449811D97A4C44C9CBD4239A1@EPEXCH2.qlogic.org>

The list below shows the same kernel for 3 versions of RedHat 
	- RedHat EL4 up4: 2.6.9-42.ELsmp
	- RedHat EL4 up5: 2.6.9-42.ELsmp
	- RedHat EL5: 2.6.9-42.ELsmp

The kernels that exist "out of the box" for each release are 

	- RedHat EL4 up4: 2.6.9-42.ELsmp  (no change)
	- RedHat EL4 up5: 2.6.9-55.ELsmp
	- RedHat EL5: 2.6.18-8.ELsmp

Is 2.6.9-42 really the only kernel supported/tested, or is this a
cut-and-paste mistake?


-----Original Message-----
From: general-bounces at lists.openfabrics.org
[mailto:general-bounces at lists.openfabrics.org] On Behalf Of
general-request at lists.openfabrics.org
Sent: Tuesday, June 19, 2007 11:06 AM
To: general at lists.openfabrics.org
Subject: general Digest, Vol 5, Issue 67

Send general mailing list submissions to
	general at lists.openfabrics.org

To subscribe or unsubscribe via the World Wide Web, visit
	http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
or, via email, send a message with subject or body 'help' to
	general-request at lists.openfabrics.org

You can reach the person managing the list at
	general-owner at lists.openfabrics.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of general digest..."


Today's Topics:

   1. Re:  Build error 1.2rc5 (Erez Zilber)
   2.  [PATCH] libmlx4: fix adjustments for minimum qp	capabilities
      in mlx4_create_qp (Jack Morgenstein)
   3. RE:  Build error 1.2rc5 (Baur, Eric)
   4.  Anouncement: OFED 1.2 rc6 is avilable (Tziporet Koren)
   5. Re:  Anouncement: OFED 1.2 rc6 is avilable (Mahmoud Hanafi)


----------------------------------------------------------------------

Message: 1
Date: Tue, 19 Jun 2007 16:45:03 +0300
From: Erez Zilber <erezz at voltaire.com>
Subject: Re: [ofa-general] Build error 1.2rc5
To: MAHMOUD HANAFI <hanafim.ctr at asc.hpc.mil>
Cc: general at lists.openfabrics.org
Message-ID: <4677DDDF.9080109 at voltaire.com>
Content-Type: text/plain; charset=ISO-8859-1

MAHMOUD HANAFI wrote:

> Any one else seen these build error with 1.2rc5?

Try to send this to ewg at lists.openfabrics.org.

Erez


------------------------------

Message: 2
Date: Tue, 19 Jun 2007 16:47:41 +0300
From: Jack Morgenstein <jackm at dev.mellanox.co.il>
Subject: [ofa-general] [PATCH] libmlx4: fix adjustments for minimum qp
	capabilities in mlx4_create_qp
To: Roland Dreier <rdreier at cisco.com>
Cc: general at lists.openfabrics.org
Message-ID: <200706191647.41336.jackm at dev.mellanox.co.il>
Content-Type: text/plain;  charset="us-ascii"

Need to adjust minimum qp capability values prior to size and max resource
calculations.

Correct the rq values afterwards (as before) if there is an srq.

Signed-off-by: Jack Morgenstein <jackm at dev.mellanox.co.il>

Index: new_connectx_user/src/userspace/libmlx4/src/verbs.c
===================================================================
--- new_connectx_user.orig/src/userspace/libmlx4/src/verbs.c	2007-06-18 09:33:04.000000000 +0300
+++ new_connectx_user/src/userspace/libmlx4/src/verbs.c	2007-06-19 09:47:10.000000000 +0300
@@ -355,6 +355,12 @@ struct ibv_qp *mlx4_create_qp(struct ibv
 	if (!qp)
 		return NULL;
 
+	/* adjust minimum cap values */
+	attr->cap.max_recv_wr = attr->cap.max_recv_wr ? attr->cap.max_recv_wr : 1;
+	attr->cap.max_recv_sge = attr->cap.max_recv_sge ? attr->cap.max_recv_sge : 1;
+	attr->cap.max_send_wr = attr->cap.max_send_wr ? attr->cap.max_send_wr : 1;
+	attr->cap.max_send_sge = attr->cap.max_send_sge ? attr->cap.max_send_sge : 1;
+
 	mlx4_calc_sq_wqe_size(&attr->cap, attr->qp_type, qp);
 
 	/*
@@ -366,9 +372,7 @@ struct ibv_qp *mlx4_create_qp(struct ibv
 	qp->rq.wqe_cnt = align_queue_size(attr->cap.max_recv_wr);
 
 	if (attr->srq)
-		attr->cap.max_recv_wr = qp->rq.wqe_cnt = 0;
-	else if (attr->cap.max_recv_sge < 1)
-		attr->cap.max_recv_sge = 1;
+		attr->cap.max_recv_wr = attr->cap.max_recv_sge = qp->rq.wqe_cnt = 0;
 
 	if (mlx4_alloc_qp_buf(pd, &attr->cap, attr->qp_type, qp))
 		goto err;
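
The ordering that the patch above establishes can be sketched in isolation:
clamp zero capabilities to 1 first, derive queue sizes from the clamped
values, and only then zero the receive side when an SRQ is attached. This is
a hedged illustration; toy_qp_cap, toy_prepare_caps and toy_round_up_pow2 are
hypothetical stand-ins, and the power-of-two rounding is an assumption about
what a size-alignment helper might do, not a copy of libmlx4 code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the capability fields touched by the patch. */
struct toy_qp_cap {
	uint32_t max_send_wr;
	uint32_t max_recv_wr;
	uint32_t max_send_sge;
	uint32_t max_recv_sge;
};

/* Assumed alignment rule: round up to a power of two (0 on overflow). */
uint32_t toy_round_up_pow2(uint32_t n)
{
	uint32_t p = 1;

	if (n > (UINT32_C(1) << 31))
		return 0;
	while (p < n)
		p <<= 1;
	return p;
}

/* Step 1: raise any zero capability to the minimum of 1. */
void toy_adjust_min_caps(struct toy_qp_cap *cap)
{
	cap->max_recv_wr  = cap->max_recv_wr  ? cap->max_recv_wr  : 1;
	cap->max_recv_sge = cap->max_recv_sge ? cap->max_recv_sge : 1;
	cap->max_send_wr  = cap->max_send_wr  ? cap->max_send_wr  : 1;
	cap->max_send_sge = cap->max_send_sge ? cap->max_send_sge : 1;
}

/* Steps 2 and 3: size the receive queue from the clamped values, then
 * zero the receive side when the QP uses an SRQ. Returns the RQ size. */
uint32_t toy_prepare_caps(struct toy_qp_cap *cap, int has_srq)
{
	uint32_t rq_wqe_cnt;

	toy_adjust_min_caps(cap);
	rq_wqe_cnt = toy_round_up_pow2(cap->max_recv_wr);
	if (has_srq) {
		cap->max_recv_wr = 0;
		cap->max_recv_sge = 0;
		rq_wqe_cnt = 0;
	}
	return rq_wqe_cnt;
}
```

Doing the clamp before any size calculation means the later steps never see
a zero count; the SRQ case is the only one that deliberately resets the
receive values afterwards.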


------------------------------

Message: 3
Date: Tue, 19 Jun 2007 10:25:52 -0400
From: "Baur, Eric" <Eric.Baur at gs.com>
Subject: RE: [ofa-general] Build error 1.2rc5
To: <general at lists.openfabrics.org>
Message-ID:
	<4DCBAA39733E8048992FB7737126041902829564 at gsmbnbp23es.firmwide.corp.gs.com>
Content-Type: text/plain;	charset="us-ascii"

Yes. The issue seems to be caused by both the 32-bit and 64-bit libs being
written to /usr/lib, rather than to /usr/lib and /usr/lib64 respectively.

A quick workaround is to modify ofed.conf to only build 64 bit
(build_32bit=0).

-Eric

-----Original Message-----
From: general-bounces at lists.openfabrics.org
[mailto:general-bounces at lists.openfabrics.org] On Behalf Of MAHMOUD
HANAFI
Sent: Tuesday, June 19, 2007 9:31 AM
To: general at lists.openfabrics.org
Subject: [ofa-general] Build error 1.2rc5

Any one else seen these build error with 1.2rc5?


RPM build errors:
     user vlad does not exist - using root
     group vlad does not exist - using root
     user vlad does not exist - using root
     group vlad does not exist - using root
     File listed twice: /usr/lib/libibverbs.so.1
     File listed twice: /usr/lib/libibverbs.so.1.0.0
     File listed twice: /usr/lib/libibverbs.a
     File listed twice: /usr/lib/libibverbs.so
     File listed twice: /usr/lib/libibcm.so.1
     File listed twice: /usr/lib/libibcm.so.1.0
     File listed twice: /usr/lib/libibcm.so.1.0.0
     File listed twice: /usr/lib/libibcm.so
     File listed twice: /usr/lib/libmthca-rdmav2.so
     File listed twice: /usr/lib/libmthca.so
     File listed twice: /usr/lib/libmthca.a
     File listed twice: /usr/lib/libcxgb3-rdmav2.so
     File listed twice: /usr/lib/libcxgb3.so
     File listed twice: /usr/lib/libcxgb3.a
     File listed twice: /usr/lib/libipathverbs-rdmav2.so
     File listed twice: /usr/lib/libipathverbs.so
     File listed twice: /usr/lib/libipathverbs.a
     File listed twice: /usr/lib/libsdp.so
     File listed twice: /usr/lib/libsdp.so.1
     File listed twice: /usr/lib/libsdp.so.1.0.0
     File listed twice: /usr/lib/libibcommon.so.1
     File listed twice: /usr/lib/libibcommon.so.1.0.0
     File listed twice: /usr/lib/libibcommon.a
     File listed twice: /usr/lib/libibcommon.so
     File listed twice: /usr/lib/libibmad.so.1
     File listed twice: /usr/lib/libibmad.so.1.2.0
     File listed twice: /usr/lib/libibmad.a
     File listed twice: /usr/lib/libibmad.so
     File listed twice: /usr/lib/libibumad.so.1
     File listed twice: /usr/lib/libibumad.so.1.0.0
     File listed twice: /usr/lib/libibumad.a
     File listed twice: /usr/lib/libibumad.so
     File listed twice: /usr/lib/libosmcomp.so.1
     File listed twice: /usr/lib/libosmcomp.so.1.0.1
     File listed twice: /usr/lib/libosmcomp-2.1.3.so
     File listed twice: /usr/lib/libosmcomp.a
     File listed twice: /usr/lib/libosmcomp.so
     File listed twice: /usr/lib/libopensm.so.1
     File listed twice: /usr/lib/libopensm.so.1.1.0
     File listed twice: /usr/lib/libopensm-2.1.4.so
     File listed twice: /usr/lib/libopensm.a
     File listed twice: /usr/lib/libopensm.so
     File listed twice: /usr/lib/libosmvendor.so.2
     File listed twice: /usr/lib/libosmvendor.so.2.0.0
     File listed twice: /usr/lib/libosmvendor-2.1.3.so
     File listed twice: /usr/lib/libosmvendor.a
     File listed twice: /usr/lib/libosmvendor.so
     File listed twice: /usr/lib/libosmvendor_openib.so
     File listed twice: /usr/lib/librdmacm.so.1
     File listed twice: /usr/lib/librdmacm.so.1.0.0
     File listed twice: /usr/lib/librdmacm.so.1.0.1
     File listed twice: /usr/lib/librdmacm.so
     File not found: /var/tmp/OFED/etc/dat.conf
     File listed twice: /usr/lib/libdaplcma.a
     File listed twice: /usr/lib/libdat.a
ERROR: Failed executing "rpmbuild --rebuild --define '_topdir
/var/tmp/OFEDRPM' --define '_prefix 
/usr' --define 'build_root /var/tmp/OFED' --define 'configure_options
--with-dapl --with-ipoibtools 
--with-libcxgb3 --with-libibcm --with-libibcommon --with-libibmad
--with-libibumad --with-libibverbs 
--with-libipathverbs --with-libmthca --with-opensm --with-librdmacm
--with-libsdp 
--with-openib-diags --with-qlvnictools --with-sdpnetstat --with-srptools
--with-mstflint 
--with-perftest --with-tvflash --sysconfdir=/usr/etc --mandir=/usr/man'
--define 
'configure_options32 --with-dapl --with-ipoibtools --with-libcxgb3
--with-libibcm --with-libibcommon 
--with-libibmad --with-libibumad --with-libibverbs --with-libipathverbs
--with-libmthca 
--with-opensm --with-librdmacm --with-libsdp --with-openib-diags
--with-qlvnictools 
--with-sdpnetstat --with-srptools --sysconfdir=/usr/etc
--mandir=/usr/man' --define 'build_32bit 1' 
--define '_mandir /usr/man'
/tmp/OFED-1.2-rc5/SRPMS/ofa_user-1.2-rc5.src.rpm"

-- 
Mahmoud Hanafi
Senior System Administrator
ASC/MSRC
www.asc.hpc.mil
2435 5th Street
WPAFB, OHIO 45433
(937) 255-1536
_______________________________________________
general mailing list
general at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general


------------------------------

Message: 4
Date: Tue, 19 Jun 2007 17:47:43 +0300
From: "Tziporet Koren" <tziporet at mellanox.co.il>
Subject: [ofa-general] Anouncement: OFED 1.2 rc6 is avilable
To: <ewg at lists.openfabrics.org>
Cc: general at lists.openfabrics.org
Message-ID:
	<6C2C79E72C305246B504CBA17B5500C9015636A9 at mtlexch01.mtl.com>
Content-Type: text/plain; charset="us-ascii"

 
Hi, 

OFED 1.2-RC6 is available on
http://www.openfabrics.org/builds/ofed-1.2/ 
File: OFED-1.2-rc6.tgz 
To get BUILD_ID run ofed_info 

Please report any issues in bugzilla https://bugs.openfabrics.org/

The GA release is expected this Friday (June 22)

I attach the OFED RN - please review and send me comments to the final
release

Thanks,
Tziporet

========================================================================

Release information: 

OS support: 
Novell: 
    - SLES 9.0 SP3 
    - SLES10 
    - SLES10 SP1 RC5
Redhat: 
    - Redhat EL4 up3, up4 and up5 
    - Redhat EL5 
kernel.org: 
    - 2.6.20 
    - 2.6.19 

Note: Kernel 2.6.21, Fedora C6 and SuSE Pro 10 are not part of the
official list. 
We keep the backport patches for these OSes and make sure OFED compiles
and loads properly, but we will not do a full QA cycle.

Systems: 
    * x86_64 
    * x86 
    * ia64 
    * ppc64 

Main changes from OFED-1.1-rc5: 
===============================
1. Fixed 6 bugs (see attached for fixed issues)

See bugzilla for all open issues. 

Tasks that should be completed for the GA release: 
1. Complete all documentation (release notes, README, etc.) 
2. Run all QA tests on all platforms
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rc6_fixed_bugs.csv
Type: application/octet-stream
Size: 636 bytes
Desc: rc6_fixed_bugs.csv
Url : http://lists.openfabrics.org/pipermail/general/attachments/20070619/ad94d792/rc6_fixed_bugs-0001.obj
-------------- next part --------------
	    Open Fabrics Enterprise Distribution (OFED)
			    Version 1.2
			   Release Notes

			   June 2007


===============================================================================
Table of Contents
===============================================================================
1. Overview, which includes:
	- OFED Distribution Rev 1.2 Contents
	- Supported Platforms and Operating Systems
	- Supported HCA and RNIC Adapter Cards and Firmware Versions
	- Tested Switch Platforms
	- Third party Test Packages
	- OFED sources
2. Main Changes from OFED 1.1
3. Fixed Bugs
4. Known Issues


===============================================================================
1. Overview
===============================================================================
These are the release notes of OpenFabrics Enterprise Distribution (OFED)
release 1.2. The OFED software package is composed of several software modules,
and is intended for use on a computer cluster constructed as an InfiniBand
subnet or iWARP network.

Note: If you plan to upgrade the OFED package on your cluster, please upgrade
all of its nodes to this new version.


1.1 OFED 1.2 Contents
---------------------
The OFED package contains the following components:
  o   OpenFabrics core and ULPs:
        - IB HCA drivers (mthca, ipath, ehca)
        - iWARP RNIC driver (cxgb3)
        - core
        - Upper Layer Protocols: IPoIB, SDP, SRP Initiator, iSER Host, RDS,
          uDAPL and VNIC.
  o   OpenFabrics utilities:
        - OpenSM (OSM): InfiniBand Subnet Manager
        - Diagnostic tools
        - Performance tests
  o   MPI:
        - OSU MPI stack supporting the InfiniBand and iWARP interface
        - Open MPI stack supporting the InfiniBand and iWARP interface
        - OSU MVAPICH2 stack supporting the InfiniBand and iWARP interface
        - MPI benchmark tests (OSU benchmarks, Intel MPI benchmarks, Presta)
  o   Extra packages:
        - open-iscsi: open-iscsi initiator with iSER support
        - ib-bonding: Bonding driver for IPoIB interface
  o   Sources of all software modules (under conditions mentioned in the
      modules' LICENSE files)
  o   Documentation

Notes:
1. The cxgb3 driver is in technology preview state.
2. The Virtual NIC (VNIC) driver is presented as a technology preview on
   OFED 1.2.
3. All other OFED components are of production quality.
4. See release notes for each package in the docs directory.
5. Any Topspin copyright belongs to Cisco Systems, Inc.


1.2 Supported Platforms and Operating Systems
---------------------------------------------
  o   CPU architectures:
	- x86_64
	- x86
	- ia64
	- ppc64
	
  o   Linux Operating Systems:
	- RedHat EL4 up3: 2.6.9-34.ELsmp
	- RedHat EL4 up4: 2.6.9-42.ELsmp
	- RedHat EL4 up5: 2.6.9-42.ELsmp
	- RedHat EL5: 2.6.9-42.ELsmp
	- SLES9 SP3: 2.6.5-7.244-smp
	- SLES10: 2.6.16.21-0.8-smp
	- kernel.org: 2.6.19.x and 2.6.20.x
	
1.3 HCAs and RNICs Supported
----------------------------
This release supports IB HCAs by Mellanox Technologies, Qlogic and IBM as
well as iWARP RNICs by Chelsio Communications.

  o   Mellanox Technologies HCAs:
	- InfiniHost (fw-23108 Rev 3.5.000)
	- InfiniHost III Ex (MemFree: fw-25218 Rev 5.2.000 
	                     with memory: fw-25208 Rev 4.8.200)
	- InfiniHost III Lx (fw-25204 Rev 1.2.000)
	The SDR and DDR modes of the InfiniHost III family are supported.

	For official firmware versions please see:
	http://www.mellanox.com/support/firmware_table.php

  o   Qlogic HCAs:
	- QHT6040 (PathScale InfiniPath HT-460)
	- QHT6140 (PathScale InfiniPath HT-465)
	- QLE6140 (PathScale InfiniPath PE-880)

  o   IBM HCAs:
	- GX Dual-port 4x IB HCA 
	- GX Dual-port 12x IB HCA 

  o   Chelsio RNICs:
	- S310/S320 10GbE Storage Accelerators
	- R310E 10GbE iWARP Adapters

1.4 Switches Supported
----------------------
This release was tested with switches and gateways provided by the following
companies:
	- Cisco
	- Voltaire
	- Qlogic
	- Flextronics

1.5 Third Party Packages
------------------------
The following third party packages have been tested with OFED 1.2:
1. Intel MPI, Version 3.0 - Package ID: l_mpi_p_3.0.043
2. HP MPI, Version 2.2.5

1.6 OFED Sources
----------------
Source repositories: 
http://www.openfabrics.org/git/
Kernel sources: ~vlad/ofed_1_2/.git
User level Sources are located in all git trees starting with: ofed_1_2/


The kernel sources are based on Linux 2.6.20 mainline kernel. Its patches
are included in the OFED sources directory.
For details see HOWTO.build_ofed.


===============================================================================
2. Main Changes from OFED 1.1
===============================================================================
Note: For details regarding the various changes, please see the release notes
for each package in the docs directory.

    2.1 General changes
        o Kernel code based on 2.6.20
        o New kernel modules: SA Cache, RDS, VNIC, bonding
        o High availability of SRP and IPoIB at GA level
        o Added iWARP support (with Chelsio driver)
        o MAN pages for libraries (libibverbs and librdmacm)

    2.2 IPoIB
        o IPoIB Connected Mode
        o High availability support using the bonding module.

    2.3 SDP
        o netstat is now available
        o Improved message BW
          - 10X for small messages
          - 5X for medium messages
        o Scalability
          - Added a memory consumption limit

    2.4 SRP
        o High availability is now supported for all systems.

    2.5 iSER
        o Testing more platforms (e.g., ppc64 and ia64)
        o Updated packages for ISCSI kernel & user components bundled with
          OFED.

    2.6 uDAPL
        o Scalability features needed for Intel MPI

    2.7 Libraries
        a. libibverbs 1.1
           o Fork support (requires apps change)
           o Better low-level driver handling, including multiple drivers
             linked in statically
           o Documentation: man pages
        b. librdmacm (uCMA) 1.0
           o Multicast joining from user space
           o UD support
           o Documentation: man pages

    2.8 OSM
        o Routing improvements
        o Performance improvement to min hop and up/down of over an order of
          magnitude
        o New fat-tree and LASH algorithms
        o SA optional record support "virtually" complete
        o IB router enablement
        o SA database dump/restore

    2.9 Management
        o Many diagnostic improvements since OFED 1.1 (see detailed RN)
        o ibdiagui: A GUI for ibdiagnet

    2.10 Install
        o Default prefix directory is now /usr

    2.11 MPI
        a. OSU MVAPICH
                o Version was updated to 0.9.9
        b. Open MPI
                o Version was updated to 1.2.1
                o See http://www.open-mpi.org/svn/new.php for details
        c. OSU MVAPICH2
                o MVAPICH2 version 0.98 was added to the OFED package.
        d. Common MPI setup sourcing
           Simple menu-driven interface to choose which MPI implementation to
           set as the default on a per-user and/or system-wide basis

    2.12 iWARP Support
        o Chelsio NIC supported
        o Verbs and CMA APIs are the same as InfiniBand
        o ULPs supported
          - MPI (mvapich2 tested)
          - uDAPL
        o Basic Testing
          - uDAPL
          - mvapich2
          - NFS-RDMA
        o Status: Beta


===============================================================================
3. Fixed Bugs
===============================================================================
1. OFED installation now supports installing lib32 on 64-bit systems.
2. Hotplug removal does not hang the system when the device is used by
   the uverbs interface.
3. MVAPICH now works on ppc64.
4. libibcm is now thread safe.

Bugs fixed in each package are reported in the package's release notes.


===============================================================================
4. Known Issues
===============================================================================
The following is a list of major limitations and known issues of the various
components of the OFED 1.2 release.

1. Memory registration by the user is limited according to the administrator
   setting. See "Pinning (Locking) User Memory Pages" in OFED_tips.txt for
   system configuration.
2. Fork support from kernel 2.6.12 and above is available provided
   that applications do not use threads. The fork() is supported as long
   as the parent process does not run before the child exits or calls exec().
   The former can be achieved by calling wait(childpid), and the latter can be
   achieved by application specific means.  The Posix system() call is
   supported.
3. The ipath driver is supported only on 64-bit platforms.
4. There are issues using Intel's MPI with the Qlogic card driver that cause
   failures.

Note: See the release notes of each component for additional issues.
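
The fork() constraint in known issue 2 can be illustrated with a small
sketch in which the parent blocks until the child has exited. This is only
the generic POSIX pattern, written under the assumption that waitpid() is an
acceptable way to realize the "wait(childpid)" mentioned in the notes; the
function name toy_fork_and_wait is hypothetical and no verbs resources are
involved.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately with child_exit_code, and keep the
 * parent blocked in waitpid() until that child is gone. Returns the
 * child's exit code, or -1 on failure. Hypothetical helper name. */
int toy_fork_and_wait(int child_exit_code)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: real code would do its work or call exec() here. */
		_exit(child_exit_code);
	}

	/* Parent: do nothing else until the child has exited. */
	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}

	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The point of the pattern is that the parent performs no other work between
fork() and the return from waitpid(), so it does not run before the child
exits.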

------------------------------

Message: 5
Date: Tue, 19 Jun 2007 11:05:50 -0400
From: Mahmoud Hanafi <mhanafi at csc.com>
Subject: Re: [ofa-general] Anouncement: OFED 1.2 rc6 is avilable
To: "Tziporet Koren" <tziporet at mellanox.co.il>
Cc: general-bounces at lists.openfabrics.org, ewg at lists.openfabrics.org,
	general at lists.openfabrics.org
Message-ID:
	
<OFAEAAE34D.6E969F93-ON852572FF.00523E99-852572FF.0052EE81 at csc.com>
Content-Type: text/plain; charset="us-ascii"

Skipped content of type multipart/alternative
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rc6_fixed_bugs.csv
Type: application/octet-stream
Size: 636 bytes
Desc: not available
Url : http://lists.openfabrics.org/pipermail/general/attachments/20070619/8bf857f3/rc6_fixed_bugs.obj
-------------- next part --------------
	    Open Fabrics Enterprise Distribution (OFED)
			    Version 1.2
			   Release Notes

			   June 2007


===============================================================================
Table of Contents
===============================================================================
1. Overview, which includes:
	- OFED Distribution Rev 1.2 Contents
	- Supported Platforms and Operating Systems
	- Supported HCA and RNIC Adapter Cards and Firmware Versions
	- Tested Switch Platforms
	- Third party Test Packages
	- OFED sources
2. Main Changes from OFED 1.1
3. Fixed Bugs
4. Known Issues


===============================================================================
1. Overview
===============================================================================
These are the release notes of OpenFabrics Enterprise Distribution (OFED)
release 1.2. The OFED software package is composed of several software modules,
and is intended for use on a computer cluster constructed as an InfiniBand
subnet or iWARP network.

Note: If you plan to upgrade the OFED package on your cluster, please upgrade
all of its nodes to this new version.


1.1 OFED 1.2 Contents
---------------------
The OFED package contains the following components:
  o   OpenFabrics core and ULPs:
        - IB HCA drivers (mthca, ipath, ehca)
        - iWARP RNIC driver (cxgb3)
        - core
        - Upper Layer Protocols: IPoIB, SDP, SRP Initiator, iSER Host, RDS,
          uDAPL and VNIC.
  o   OpenFabrics utilities:
        - OpenSM (OSM): InfiniBand Subnet Manager
        - Diagnostic tools
        - Performance tests
  o   MPI:
        - OSU MPI stack supporting the InfiniBand and iWARP interface
        - Open MPI stack supporting the InfiniBand and iWARP interface
        - OSU MVAPICH2 stack supporting the InfiniBand and iWARP interface
        - MPI benchmark tests (OSU benchmarks, Intel MPI benchmarks, Presta)
  o   Extra packages:
        - open-iscsi: open-iscsi initiator with iSER support
        - ib-bonding: Bonding driver for IPoIB interface
  o   Sources of all software modules (under conditions mentioned in the
      modules' LICENSE files)
  o   Documentation

Notes:
1. The cxgb3 driver is in technology preview state.
2. The Virtual NIC (VNIC) driver is presented as a technology preview on
   OFED 1.2.
3. All other OFED components are of production quality.
4. See release notes for each package in the docs directory.
5. Any Topspin copyright belongs to Cisco Systems, Inc.


1.2 Supported Platforms and Operating Systems
---------------------------------------------
  o   CPU architectures:
	- x86_64
	- x86
	- ia64
	- ppc64
	
  o   Linux Operating Systems:
	- RedHat EL4 up3: 2.6.9-34.ELsmp
	- RedHat EL4 up4: 2.6.9-42.ELsmp
	- RedHat EL4 up5: 2.6.9-42.ELsmp
	- RedHat EL5: 2.6.9-42.ELsmp
	- SLES9 SP3: 2.6.5-7.244-smp
	- SLES10: 2.6.16.21-0.8-smp
	- kernel.org: 2.6.19.x and 2.6.20.x
	
1.3 HCAs and RNICs Supported
----------------------------
This release supports IB HCAs by Mellanox Technologies, Qlogic and IBM as
well as iWARP RNICs by Chelsio Communications.

  o   Mellanox Technologies HCAs:
	- InfiniHost (fw-23108 Rev 3.5.000)
	- InfiniHost III Ex (MemFree: fw-25218 Rev 5.2.000 
	                     with memory: fw-25208 Rev 4.8.200)
	- InfiniHost III Lx (fw-25204 Rev 1.2.000)
	The SDR and DDR modes of the InfiniHost III family are supported.

	For official firmware versions please see:
	http://www.mellanox.com/support/firmware_table.php

  o   Qlogic HCAs:
	- QHT6040 (PathScale InfiniPath HT-460)
	- QHT6140 (PathScale InfiniPath HT-465)
	- QLE6140 (PathScale InfiniPath PE-880)

  o   IBM HCAs:
	- GX Dual-port 4x IB HCA 
	- GX Dual-port 12x IB HCA 

  o   Chelsio RNICs:
	- S310/S320 10GbE Storage Accelerators
	- R310E 10GbE iWARP Adapters

1.4 Switches Supported
----------------------
This release was tested with switches and gateways provided by the following
companies:
	- Cisco
	- Voltaire
	- Qlogic
	- Flextronics

1.5 Third Party Packages
------------------------
The following third party packages have been tested with OFED 1.2:
1. Intel MPI, Version 3.0 - Package ID: l_mpi_p_3.0.043
2. HP MPI, Version 2.2.5

1.6 OFED Sources
----------------
Source repositories: 
http://www.openfabrics.org/git/
Kernel sources: ~vlad/ofed_1_2/.git
User level Sources are located in all git trees starting with: ofed_1_2/


The kernel sources are based on Linux 2.6.20 mainline kernel. Its patches
are included in the OFED sources directory.
For details see HOWTO.build_ofed.


===============================================================================
2. Main Changes from OFED 1.1
===============================================================================
Note: For details regarding the various changes, please see the release notes
for each package in the docs directory.

    2.1 General changes
	o Kernel code based on 2.6.20
	o New kernel modules: SA Cache, RDS, VNIC, bonding
	o High availability of SRP and IPoIB in GA level
	o Added iWARP support (with Chelsio driver)
	o MAN pages for libraries (libibverbs and librdmacm)

    2.1 IPoIB
        o IPoIB Connected Mode
	o High availability support using the bonding module.

    2.2 SDP
	o netstat is now available
        o Improved message BW
          - 10X for small messages 
          - 5X for medium messages
        o Scalability
          - Added a memory consumption limit

    2.3 SRP
        o High availability is now supported for all systems.

    2.4 iSER
	o Testing more platforms (e.g., ppc64 and ia64)
	o Updated packages for iSCSI kernel & user components bundled with OFED.

    2.5 uDAPL
     	o Scalability features needed for Intel MPI 

    2.6 Libraries
        a. libibverbs 1.1
	   o Fork support (requires apps change) 
	   o Better low-level driver handling, including multiple drivers linked
	     in statically
	   o Documentation: man pages
        b. librdmacm (uCMA) 1.0
	   o Multicast joining from user space
	   o UD support
	   o Documentation: man pages

    2.7 OSM
        o Routing improvements
        o Performance improvement to min hop and up/down of over an order of
          magnitude
        o New fat-tree and LASH algorithms
        o SA optional record support "virtually" complete
        o IB router enablement
        o SA database dump/restore

    2.8 Management 
        o Many diagnostic improvements since OFED 1.1 (see detailed RN)
        o ibdiagui: A GUI for ibdiagnet

    2.9 Install
        o Default prefix directory is now /usr

    2.6 MPI: 
	a. OSU MVAPICH
      		o Version was updated to 0.9.9
	b. Open MPI
		o Version was updated to 1.2.1
		o See http://www.open-mpi.org/svn/new.php for details
	c. OSU MVAPICH2
		o MVAPICH2 version 0.98 was added to the OFED package.
	d. Common MPI setup sourcing
	   Simple menu-driven interface to choose which MPI implementation to
	   set as the default on a per-user and/or system-wide basis

    2.7 iWARP Support
        o Chelsio NIC supported
        o Verbs and CMA APIs are the same as InfiniBand
        o ULPs supported
          - MPI (mvapich2 tested)
          - uDAPL
        o Basic Testing
          - uDAPL
          - mvapich2
          - NFS-RDMA
        o Status: Beta


===============================================================================
3. Fixed Bugs
===============================================================================
1. OFED installation now supports installing lib32 on 64-bit systems.
2. Hotplug removal does not hang the system when the device is used by
   the uverbs interface.
3. MVAPICH now works on ppc64.
4. libibcm is now thread safe.

Bugs fixed in each package are reported in the package's release notes.


===============================================================================
4. Known Issues
===============================================================================
The following is a list of major limitations and known issues of the various
components of the OFED 1.2 release.

1. Memory registration by the user is limited according to the administrator
   setting. See "Pinning (Locking) User Memory Pages" in OFED_tips.txt for
   system configuration.
2. Fork support from kernel 2.6.12 and above is available provided
   that applications do not use threads. fork() is supported as long
   as the parent process does not run before the child exits or calls exec().
   The former can be achieved by calling wait(childpid), and the latter can be
   achieved by application-specific means. The POSIX system() call is
   supported.
3. The ipath driver is supported only on 64-bit platforms.
4. There are issues using Intel's MPI with the Qlogic card driver that cause
   failures.

Note: See the release notes of each component for additional issues.
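
Illustration for known issue 2 (an editorial sketch, not taken from the OFED
sources): one way to keep the parent from running before the child exits is to
block in the standard waitpid() call right after fork(). The function name
fork_and_wait is hypothetical, and the sketch makes no verbs calls; it only
shows the process ordering that the issue describes.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits immediately with child_status, and block the
 * parent in waitpid() until the child is gone. Returns the child's exit
 * status, or -1 on error.
 */
int fork_and_wait(int child_status)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(child_status); /* child: a real program might exec() here */

	/* parent: do nothing else until the child has exited */
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent touches nothing between fork() and waitpid(), which is the "former"
case in the text. The exec() case depends on the application and is not shown.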

------------------------------

_______________________________________________
general mailing list
general at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

End of general Digest, Vol 5, Issue 67
**************************************


From mst at dev.mellanox.co.il  Tue Jun 19 09:56:58 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 19 Jun 2007 19:56:58 +0300
Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git
In-Reply-To: <adafy4odznl.fsf@cisco.com>
References: <adad4ztfekf.fsf@cisco.com> <20070619055841.GC7069@mellanox.co.il>
	<adafy4odznl.fsf@cisco.com>
Message-ID: <20070619165658.GA31019@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [GIT PULL] please pull infiniband.git
> 
>  > BTW, have you seen the patch for ipoib cm crasher race?
>  > I think we need it in 2.6.22 too.
> 
> I saw the discussion and it does seem like something to fix for
> 2.6.22.  But I didn't see a final conclusion on which patch to use,
> and I don't think I ever saw a patch with a good changelog and
> signed-off-by line either... please resend if I missed it.

Did you get the one I reposted? Is the log OK?

-- 
MST


From sean.hefty at intel.com  Tue Jun 19 10:51:05 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 19 Jun 2007 10:51:05 -0700
Subject: [ofa-general] hang at module removal with local sa patches applied
In-Reply-To: <467717F3.9020806@ichips.intel.com>
Message-ID: <001001c7b29a$68c0ec10$9c98070a@amr.corp.intel.com>

>* It's possible for flush_workqueue to be called from the workqueue thread.
>
>* We hold a mutex when calling flush_workqueue, and a queued work item
>will try to acquire that same mutex.

There's no need to call flush_workqueue unless we're destroying the
port as a result of removing the device.  Can you see if the following patch
fixes your unload issue?  (I wasn't able to reproduce the original problem.)

Signed-off-by: Sean Hefty <sean.hefty at intel.com>
---
Btw, I will have the cache disabled by default when I request the pull for
2.6.23.


diff --git a/drivers/infiniband/core/local_sa.c b/drivers/infiniband/core/local_sa.c
index aac3f2d..7c9a922 100644
--- a/drivers/infiniband/core/local_sa.c
+++ b/drivers/infiniband/core/local_sa.c
@@ -633,7 +633,6 @@ static void unsubscribe_port(struct sa_db_port *port)
 static void cleanup_port(struct sa_db_port *port)
 {
 	unsubscribe_port(port);
-	flush_workqueue(sa_wq);
 
 	clean_update_list(port);
 	remove_all_attrs(&port->paths);
@@ -1173,6 +1172,7 @@ static void destroy_port(struct sa_db_port *port)
 
 	ib_unregister_mad_agent(port->agent);
 	cleanup_port(port);
+	flush_workqueue(sa_wq);
 }
 
 static void sa_db_add_dev(struct ib_device *device)


From arthur.jones at qlogic.com  Tue Jun 19 16:40:30 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:40:30 -0700
Subject: [ofa-general] [PATCH] IB/ipath -- changes in for-roland for 2.6.23
Message-ID: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>

hi roland,  sorry for the first false alarm!  i
had the wrong CC.  here, again, is our current backlog
of patches that we'd like to go upstream into 2.6.23.
these changes are avail via git-pull from:

git://git.qlogic.com/ipath-linux-2.6 for-roland

which is based on the kernel.org linux-2.6 tree.
i wasn't sure if i should spam the list with all
the patches, as they are avail via the git server
above.  how would you like that done in the future?

thanks...

arthur


From arthur.jones at qlogic.com  Tue Jun 19 16:40:35 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:40:35 -0700
Subject: [ofa-general] [PATCH 01/28] IB/ipath: include <linux/vmalloc.h> to
	fix ppc64 build
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234035.3794.7544.stgit@bauxite.internal.keyresearch.com>

From: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_iba6110.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_iba6110.c b/drivers/infiniband/hw/ipath/ipath_iba6110.c
index 4171198..ba73dd0 100644
--- a/drivers/infiniband/hw/ipath/ipath_iba6110.c
+++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c
@@ -36,6 +36,7 @@
  * HT chip.
  */
 
+#include <linux/vmalloc.h>
 #include <linux/pci.h>
 #include <linux/delay.h>
 #include <linux/htirq.h>


From arthur.jones at qlogic.com  Tue Jun 19 16:40:40 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:40:40 -0700
Subject: [ofa-general] [PATCH 02/28] IB/ipath -- support blinking LEDs with
	an led_override file
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234040.3794.82782.stgit@bauxite.internal.keyresearch.com>

From: Michael Albaugh <michael.albaugh at qlogic.com>

When we want to find an InfiniPath HCA in a rack of
nodes, it is often expeditious to blink the status
LEDs via a userspace /sys file.

A write-only led_override "file" is published per device. Writes to this file
are interpreted as (string form) numbers, and the resulting value sent to
ipath_set_led_override(). The upper eight bits are interpreted as a 4.4
fixed-point "frequency in Hertz", and the bottom two 4-bit values are
alternately (D0..3, then D4..7) used by the board-specific LED-setting
function to override the normal state.
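
For illustration only (an editorial sketch, not part of the patch), the
encoding described above can be decoded in plain userspace C as follows. The
arithmetic mirrors ipath_set_led_override() in the diff below; SKETCH_HZ, the
struct, and the function name are hypothetical stand-ins, with SKETCH_HZ
playing the role of the kernel's HZ.

```c
#include <stdint.h>

/* Hypothetical tick rate; in the kernel, HZ is a build-time constant. */
#define SKETCH_HZ 1000

#define LED_OVER_FREQ_SHIFT 8
#define LED_OVER_FREQ_MASK (0xFF << LED_OVER_FREQ_SHIFT)

/* Decoded form of a value written to led_override (illustrative names). */
struct led_override_req {
	unsigned int freq;    /* 4.4 fixed-point field; 0 means no blinking */
	uint8_t vals[2];      /* LED bits used in phase 0 and phase 1 */
	unsigned int timeoff; /* ticks between phase changes */
};

struct led_override_req led_override_decode(unsigned int val)
{
	struct led_override_req r;

	r.freq = (val & LED_OVER_FREQ_MASK) >> LED_OVER_FREQ_SHIFT;
	r.vals[0] = val & 0xF;
	if (r.freq) {
		/* blinking: the second nybble supplies the other phase */
		r.vals[1] = (val >> 4) & 0xF;
		r.timeoff = (SKETCH_HZ << 4) / r.freq;
	} else {
		/* steady: both phases identical, poll once per second */
		r.vals[1] = r.vals[0];
		r.timeoff = SKETCH_HZ;
	}
	return r;
}
```

Under these assumptions, 0x1021 carries a frequency field of 0x10 (1.0 in 4.4
fixed point), so the phases alternate between LED bits 0x1 and 0x2 every
SKETCH_HZ ticks.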

Signed-off-by: Michael Albaugh <michael.albaugh at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_driver.c  |   92 +++++++++++++++++++++++++++
 drivers/infiniband/hw/ipath/ipath_iba6110.c |   10 +++
 drivers/infiniband/hw/ipath/ipath_iba6120.c |   10 +++
 drivers/infiniband/hw/ipath/ipath_kernel.h  |   19 ++++++
 drivers/infiniband/hw/ipath/ipath_sysfs.c   |   19 ++++++
 5 files changed, 149 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
index e3a2232..0975932 100644
--- a/drivers/infiniband/hw/ipath/ipath_driver.c
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -1846,6 +1846,87 @@ void ipath_write_kreg_port(const struct ipath_devdata *dd, ipath_kreg regno,
 	ipath_write_kreg(dd, where, value);
 }
 
+/*
+ * Following deal with the "obviously simple" task of overriding the state
+ * of the LEDS, which normally indicate link physical and logical status.
+ * The complications arise in dealing with different hardware mappings
+ * and the board-dependent routine being called from interrupts.
+ * and then there's the requirement to _flash_ them.
+ */
+#define LED_OVER_FREQ_SHIFT 8
+#define LED_OVER_FREQ_MASK (0xFF<<LED_OVER_FREQ_SHIFT)
+/* Below is "non-zero" to force override, but both actual LEDs are off */
+#define LED_OVER_BOTH_OFF (8)
+
+void ipath_run_led_override(unsigned long opaque)
+{
+	struct ipath_devdata *dd = (struct ipath_devdata *)opaque;
+	int timeoff;
+	int pidx;
+	u64 lstate, ltstate, val;
+
+	if (!(dd->ipath_flags & IPATH_INITTED))
+		return;
+
+	pidx = dd->ipath_led_override_phase++ & 1;
+	dd->ipath_led_override = dd->ipath_led_override_vals[pidx];
+	timeoff = dd->ipath_led_override_timeoff;
+
+	/*
+	 * below potentially restores the LED values per current status,
+	 * should also possibly setup the traffic-blink register,
+	 * but leave that to per-chip functions.
+	 */
+	val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_ibcstatus);
+	ltstate = (val >> INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT) &
+		  INFINIPATH_IBCS_LINKTRAININGSTATE_MASK;
+	lstate = (val >> INFINIPATH_IBCS_LINKSTATE_SHIFT) &
+		 INFINIPATH_IBCS_LINKSTATE_MASK;
+
+	dd->ipath_f_setextled(dd, lstate, ltstate);
+	mod_timer(&dd->ipath_led_override_timer, jiffies + timeoff);
+}
+
+void ipath_set_led_override(struct ipath_devdata *dd, unsigned int val)
+{
+	int timeoff, freq;
+
+	if (!(dd->ipath_flags & IPATH_INITTED))
+		return;
+
+	/* First check if we are blinking. If not, use 1HZ polling */
+	timeoff = HZ;
+	freq = (val & LED_OVER_FREQ_MASK) >> LED_OVER_FREQ_SHIFT;
+
+	if (freq) {
+		/* For blink, set each phase from one nybble of val */
+		dd->ipath_led_override_vals[0] = val & 0xF;
+		dd->ipath_led_override_vals[1] = (val >> 4) & 0xF;
+		timeoff = (HZ << 4)/freq;
+	} else {
+		/* Non-blink set both phases the same. */
+		dd->ipath_led_override_vals[0] = val & 0xF;
+		dd->ipath_led_override_vals[1] = val & 0xF;
+	}
+	dd->ipath_led_override_timeoff = timeoff;
+
+	/*
+	 * If the timer has not already been started, do so. Use a "quick"
+	 * timeout so the function will be called soon, to look at our request.
+	 */
+	if (atomic_inc_return(&dd->ipath_led_override_timer_active) == 1) {
+		/* Need to start timer */
+		init_timer(&dd->ipath_led_override_timer);
+		dd->ipath_led_override_timer.function =
+						 ipath_run_led_override;
+		dd->ipath_led_override_timer.data = (unsigned long) dd;
+		dd->ipath_led_override_timer.expires = jiffies + 1;
+		add_timer(&dd->ipath_led_override_timer);
+	} else {
+		atomic_dec(&dd->ipath_led_override_timer_active);
+	}
+}
+
 /**
  * ipath_shutdown_device - shut down a device
  * @dd: the infinipath device
@@ -1909,7 +1990,6 @@ void ipath_shutdown_device(struct ipath_devdata *dd)
 	 * Turn the LEDs off explictly for the same reason.
 	 */
 	dd->ipath_f_quiet_serdes(dd);
-	dd->ipath_f_setextled(dd, 0, 0);
 
 	if (dd->ipath_stats_timer_active) {
 		del_timer_sync(&dd->ipath_stats_timer);
@@ -2085,6 +2165,16 @@ int ipath_reset_device(int unit)
 		goto bail;
 	}
 
+	if (atomic_read(&dd->ipath_led_override_timer_active)) {
+		/* Need to stop LED timer, _then_ shut off LEDs */
+		del_timer_sync(&dd->ipath_led_override_timer);
+		atomic_set(&dd->ipath_led_override_timer_active, 0);
+	}
+
+	/* Shut off LEDs after we are sure timer is not running */
+	dd->ipath_led_override = LED_OVER_BOTH_OFF;
+	dd->ipath_f_setextled(dd, 0, 0);
+
 	dev_info(&dd->pcidev->dev, "Reset on unit %u requested\n", unit);
 
 	if (!dd->ipath_kregbase || !(dd->ipath_flags & IPATH_PRESENT)) {
diff --git a/drivers/infiniband/hw/ipath/ipath_iba6110.c b/drivers/infiniband/hw/ipath/ipath_iba6110.c
index ba73dd0..4372c6c 100644
--- a/drivers/infiniband/hw/ipath/ipath_iba6110.c
+++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c
@@ -1065,6 +1065,16 @@ static void ipath_setup_ht_setextled(struct ipath_devdata *dd,
 	if (ipath_diag_inuse)
 		return;
 
+	/* Allow override of LED display for, e.g. Locating system in rack */
+	if (dd->ipath_led_override) {
+		ltst = (dd->ipath_led_override & IPATH_LED_PHYS)
+			? INFINIPATH_IBCS_LT_STATE_LINKUP
+			: INFINIPATH_IBCS_LT_STATE_DISABLED;
+		lst = (dd->ipath_led_override & IPATH_LED_LOG)
+			? INFINIPATH_IBCS_L_STATE_ACTIVE
+			: INFINIPATH_IBCS_L_STATE_DOWN;
+	}
+
 	/*
 	 * start by setting both LED control bits to off, then turn
 	 * on the appropriate bit(s).
diff --git a/drivers/infiniband/hw/ipath/ipath_iba6120.c b/drivers/infiniband/hw/ipath/ipath_iba6120.c
index 4e2e3df..bcb70d6 100644
--- a/drivers/infiniband/hw/ipath/ipath_iba6120.c
+++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c
@@ -797,6 +797,16 @@ static void ipath_setup_pe_setextled(struct ipath_devdata *dd, u64 lst,
 	if (ipath_diag_inuse)
 		return;
 
+	/* Allow override of LED display for, e.g. Locating system in rack */
+	if (dd->ipath_led_override) {
+		ltst = (dd->ipath_led_override & IPATH_LED_PHYS)
+			? INFINIPATH_IBCS_LT_STATE_LINKUP
+			: INFINIPATH_IBCS_LT_STATE_DISABLED;
+		lst = (dd->ipath_led_override & IPATH_LED_LOG)
+			? INFINIPATH_IBCS_L_STATE_ACTIVE
+			: INFINIPATH_IBCS_L_STATE_DOWN;
+	}
+
 	extctl = dd->ipath_extctrl & ~(INFINIPATH_EXTC_LED1PRIPORT_ON |
 				       INFINIPATH_EXTC_LED2PRIPORT_ON);
 
diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
index 12194f3..2f39db7 100644
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h
@@ -575,6 +575,16 @@ struct ipath_devdata {
 	u16 ipath_gpio_scl_num;
 	u64 ipath_gpio_sda;
 	u64 ipath_gpio_scl;
+
+	/* used to override LED behavior */
+	u8 ipath_led_override;  /* Substituted for normal value, if non-zero */
+	u16 ipath_led_override_timeoff; /* delta to next timer event */
+	u8 ipath_led_override_vals[2]; /* Alternates per blink-frame */
+	u8 ipath_led_override_phase; /* Just counts, LSB picks from vals[] */
+	atomic_t ipath_led_override_timer_active;
+	/* Used to flash LEDs in override mode */
+	struct timer_list ipath_led_override_timer;
+
 };
 
 /* Private data for file operations */
@@ -717,6 +727,15 @@ u64 ipath_snap_cntr(struct ipath_devdata *, ipath_creg);
 void ipath_disarm_senderrbufs(struct ipath_devdata *, int);
 
 /*
+ * Set LED override, only the two LSBs have "public" meaning, but
+ * any non-zero value substitutes them for the Link and LinkTrain
+ * LED states.
+ */
+#define IPATH_LED_PHYS 1 /* Physical (linktraining) GREEN LED */
+#define IPATH_LED_LOG 2  /* Logical (link) YELLOW LED */
+void ipath_set_led_override(struct ipath_devdata *dd, unsigned int val);
+
+/*
  * number of words used for protocol header if not set by ipath_userinit();
  */
 #define IPATH_DFLT_RCVHDRSIZE 9
diff --git a/drivers/infiniband/hw/ipath/ipath_sysfs.c b/drivers/infiniband/hw/ipath/ipath_sysfs.c
index 4dc398d..17ec145 100644
--- a/drivers/infiniband/hw/ipath/ipath_sysfs.c
+++ b/drivers/infiniband/hw/ipath/ipath_sysfs.c
@@ -596,6 +596,23 @@ bail:
 	return ret;
 }
 
+static ssize_t store_led_override(struct device *dev,
+			  struct device_attribute *attr,
+			  const char *buf,
+			  size_t count)
+{
+	struct ipath_devdata *dd = dev_get_drvdata(dev);
+	int ret;
+	u16 val;
+
+	ret = ipath_parse_ushort(buf, &val);
+	if (ret > 0)
+		ipath_set_led_override(dd, val);
+	else
+		ipath_dev_err(dd, "attempt to set invalid LED override\n");
+	return ret;
+}
+
 
 static DRIVER_ATTR(num_units, S_IRUGO, show_num_units, NULL);
 static DRIVER_ATTR(version, S_IRUGO, show_version, NULL);
@@ -625,6 +642,7 @@ static DEVICE_ATTR(status_str, S_IRUGO, show_status_str, NULL);
 static DEVICE_ATTR(boardversion, S_IRUGO, show_boardversion, NULL);
 static DEVICE_ATTR(unit, S_IRUGO, show_unit, NULL);
 static DEVICE_ATTR(rx_pol_inv, S_IWUSR, NULL, store_rx_pol_inv);
+static DEVICE_ATTR(led_override, S_IWUSR, NULL, store_led_override);
 
 static struct attribute *dev_attributes[] = {
 	&dev_attr_guid.attr,
@@ -641,6 +659,7 @@ static struct attribute *dev_attributes[] = {
 	&dev_attr_unit.attr,
 	&dev_attr_enabled.attr,
 	&dev_attr_rx_pol_inv.attr,
+	&dev_attr_led_override.attr,
 	NULL
 };
 

From arthur.jones at qlogic.com  Tue Jun 19 16:40:45 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:40:45 -0700
Subject: [ofa-general] [PATCH 03/28] IB/ipath -- lock and always use shadow
	copies of GPIO register
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234045.3794.92822.stgit@bauxite.internal.keyresearch.com>

From: Michael Albaugh <michael.albaugh at qlogic.com>

The new LED blinking interface adds more contention
for the unprotected GPIO pins that were already shared,
though not commonly at the same time.  We add locks to
the accesses to these pins so that Read-Modify-Write
is now safe.  Some of these locks are added at
interrupt context, so we shadow the registers
which drive and inspect these pins to avoid the
mmio read/writes.  This mitigates the effects of
the locks and hastens us through the interrupt.

Add locking and always use shadows, for registers controlling GPIO pins
(That would be ExtCtrl and GPIOout). The use of shadows implies doing less
I/O, which can make I2C operation too fast on some platforms. An explicit
udelay(1) in SCL manipulation fixes that.
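
As a rough userspace illustration of the shadow-plus-lock idea (an editorial
sketch, not code from this patch): the update below modifies a software shadow
under a lock and writes the result out, never reading the "register" back. All
names are hypothetical, and a C11 atomic_flag stands in for the kernel
spinlock; the driver itself uses spin_lock_irqsave() on ipath_gpio_lock.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for a device: one "hardware" register plus its software shadow. */
struct sketch_dev {
	volatile uint64_t hw_extctrl; /* pretend MMIO register */
	uint64_t shadow_extctrl;      /* last value written to hw_extctrl */
	atomic_flag lock;             /* minimal spinlock guarding both */
};

void sketch_dev_init(struct sketch_dev *d, uint64_t initial)
{
	atomic_flag_clear(&d->lock);
	d->hw_extctrl = initial;
	d->shadow_extctrl = initial;
}

static void sketch_lock(struct sketch_dev *d)
{
	while (atomic_flag_test_and_set_explicit(&d->lock,
						 memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void sketch_unlock(struct sketch_dev *d)
{
	atomic_flag_clear_explicit(&d->lock, memory_order_release);
}

/* Clear, then set, bits using only the shadow; the register is write-only. */
void sketch_extctrl_update(struct sketch_dev *d, uint64_t clear, uint64_t set)
{
	sketch_lock(d);
	d->shadow_extctrl = (d->shadow_extctrl & ~clear) | set;
	d->hw_extctrl = d->shadow_extctrl;
	sketch_unlock(d);
}
```

Because the read half of the read-modify-write comes from memory rather than
from the device, the critical section stays short, which is the effect the
description above is after.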

Signed-off-by: Michael Albaugh <michael.albaugh at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_eeprom.c    |   68 +++++++++++++++----------
 drivers/infiniband/hw/ipath/ipath_iba6110.c   |    3 +
 drivers/infiniband/hw/ipath/ipath_iba6120.c   |    3 +
 drivers/infiniband/hw/ipath/ipath_init_chip.c |    2 +
 drivers/infiniband/hw/ipath/ipath_kernel.h    |    7 ++-
 5 files changed, 53 insertions(+), 30 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_eeprom.c b/drivers/infiniband/hw/ipath/ipath_eeprom.c
index 030185f..26daac9 100644
--- a/drivers/infiniband/hw/ipath/ipath_eeprom.c
+++ b/drivers/infiniband/hw/ipath/ipath_eeprom.c
@@ -95,39 +95,37 @@ static int i2c_gpio_set(struct ipath_devdata *dd,
 			enum i2c_type line,
 			enum i2c_state new_line_state)
 {
-	u64 read_val, write_val, mask, *gpioval;
+	u64 out_mask, dir_mask, *gpioval;
+	unsigned long flags = 0;
 
 	gpioval = &dd->ipath_gpio_out;
-	read_val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_extctrl);
-	if (line == i2c_line_scl)
-		mask = dd->ipath_gpio_scl;
-	else
-		mask = dd->ipath_gpio_sda;
 
-	if (new_line_state == i2c_line_high)
+	if (line == i2c_line_scl) {
+		dir_mask = dd->ipath_gpio_scl;
+		out_mask = (1UL << dd->ipath_gpio_scl_num);
+	} else {
+		dir_mask = dd->ipath_gpio_sda;
+		out_mask = (1UL << dd->ipath_gpio_sda_num);
+	}
+
+	spin_lock_irqsave(&dd->ipath_gpio_lock, flags);
+	if (new_line_state == i2c_line_high) {
 		/* tri-state the output rather than force high */
-		write_val = read_val & ~mask;
-	else
+		dd->ipath_extctrl &= ~dir_mask;
+	} else {
 		/* config line to be an output */
-		write_val = read_val | mask;
-	ipath_write_kreg(dd, dd->ipath_kregs->kr_extctrl, write_val);
+		dd->ipath_extctrl |= dir_mask;
+	}
+	ipath_write_kreg(dd, dd->ipath_kregs->kr_extctrl, dd->ipath_extctrl);
 
-	/* set high and verify */
+	/* set output as well (no real verify) */
 	if (new_line_state == i2c_line_high)
-		write_val = 0x1UL;
+		*gpioval |= out_mask;
 	else
-		write_val = 0x0UL;
+		*gpioval &= ~out_mask;
 
-	if (line == i2c_line_scl) {
-		write_val <<= dd->ipath_gpio_scl_num;
-		*gpioval = *gpioval & ~(1UL << dd->ipath_gpio_scl_num);
-		*gpioval |= write_val;
-	} else {
-		write_val <<= dd->ipath_gpio_sda_num;
-		*gpioval = *gpioval & ~(1UL << dd->ipath_gpio_sda_num);
-		*gpioval |= write_val;
-	}
 	ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_out, *gpioval);
+	spin_unlock_irqrestore(&dd->ipath_gpio_lock, flags);
 
 	return 0;
 }
@@ -145,8 +143,9 @@ static int i2c_gpio_get(struct ipath_devdata *dd,
 			enum i2c_type line,
 			enum i2c_state *curr_statep)
 {
-	u64 read_val, write_val, mask;
+	u64 read_val, mask;
 	int ret;
+	unsigned long flags = 0;
 
 	/* check args */
 	if (curr_statep == NULL) {
@@ -154,15 +153,21 @@ static int i2c_gpio_get(struct ipath_devdata *dd,
 		goto bail;
 	}
 
-	read_val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_extctrl);
 	/* config line to be an input */
 	if (line == i2c_line_scl)
 		mask = dd->ipath_gpio_scl;
 	else
 		mask = dd->ipath_gpio_sda;
-	write_val = read_val & ~mask;
-	ipath_write_kreg(dd, dd->ipath_kregs->kr_extctrl, write_val);
+
+	spin_lock_irqsave(&dd->ipath_gpio_lock, flags);
+	dd->ipath_extctrl &= ~mask;
+	ipath_write_kreg(dd, dd->ipath_kregs->kr_extctrl, dd->ipath_extctrl);
+	/*
+	 * Below is very unlikely to reflect true input state if Output
+	 * Enable actually changed.
+	 */
 	read_val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_extstatus);
+	spin_unlock_irqrestore(&dd->ipath_gpio_lock, flags);
 
 	if (read_val & mask)
 		*curr_statep = i2c_line_high;
@@ -192,6 +197,7 @@ static void i2c_wait_for_writes(struct ipath_devdata *dd)
 
 static void scl_out(struct ipath_devdata *dd, u8 bit)
 {
+	udelay(1);
 	i2c_gpio_set(dd, i2c_line_scl, bit ? i2c_line_high : i2c_line_low);
 
 	i2c_wait_for_writes(dd);
@@ -314,12 +320,18 @@ static int eeprom_reset(struct ipath_devdata *dd)
 	int clock_cycles_left = 9;
 	u64 *gpioval = &dd->ipath_gpio_out;
 	int ret;
+	unsigned long flags;
 
-	eeprom_init = 1;
+	spin_lock_irqsave(&dd->ipath_gpio_lock, flags);
+	/* Make sure shadows are consistent */
+	dd->ipath_extctrl = ipath_read_kreg64(dd, dd->ipath_kregs->kr_extctrl);
 	*gpioval = ipath_read_kreg64(dd, dd->ipath_kregs->kr_gpio_out);
+	spin_unlock_irqrestore(&dd->ipath_gpio_lock, flags);
+
 	ipath_cdbg(VERBOSE, "Resetting i2c eeprom; initial gpioout reg "
 		   "is %llx\n", (unsigned long long) *gpioval);
 
+	eeprom_init = 1;
 	/*
 	 * This is to get the i2c into a known state, by first going low,
 	 * then tristate sda (and then tristate scl as first thing
diff --git a/drivers/infiniband/hw/ipath/ipath_iba6110.c b/drivers/infiniband/hw/ipath/ipath_iba6110.c
index 4372c6c..8482ea3 100644
--- a/drivers/infiniband/hw/ipath/ipath_iba6110.c
+++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c
@@ -1059,6 +1059,7 @@ static void ipath_setup_ht_setextled(struct ipath_devdata *dd,
 				     u64 lst, u64 ltst)
 {
 	u64 extctl;
+	unsigned long flags = 0;
 
 	/* the diags use the LED to indicate diag info, so we leave
 	 * the external LED alone when the diags are running */
@@ -1075,6 +1076,7 @@ static void ipath_setup_ht_setextled(struct ipath_devdata *dd,
 			: INFINIPATH_IBCS_L_STATE_DOWN;
 	}
 
+	spin_lock_irqsave(&dd->ipath_gpio_lock, flags);
 	/*
 	 * start by setting both LED control bits to off, then turn
 	 * on the appropriate bit(s).
@@ -1103,6 +1105,7 @@ static void ipath_setup_ht_setextled(struct ipath_devdata *dd,
 	}
 	dd->ipath_extctrl = extctl;
 	ipath_write_kreg(dd, dd->ipath_kregs->kr_extctrl, extctl);
+	spin_unlock_irqrestore(&dd->ipath_gpio_lock, flags);
 }
 
 static void ipath_init_ht_variables(struct ipath_devdata *dd)
diff --git a/drivers/infiniband/hw/ipath/ipath_iba6120.c b/drivers/infiniband/hw/ipath/ipath_iba6120.c
index bcb70d6..2345bb0 100644
--- a/drivers/infiniband/hw/ipath/ipath_iba6120.c
+++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c
@@ -791,6 +791,7 @@ static void ipath_setup_pe_setextled(struct ipath_devdata *dd, u64 lst,
 				     u64 ltst)
 {
 	u64 extctl;
+	unsigned long flags = 0;
 
 	/* the diags use the LED to indicate diag info, so we leave
 	 * the external LED alone when the diags are running */
@@ -807,6 +808,7 @@ static void ipath_setup_pe_setextled(struct ipath_devdata *dd, u64 lst,
 			: INFINIPATH_IBCS_L_STATE_DOWN;
 	}
 
+	spin_lock_irqsave(&dd->ipath_gpio_lock, flags);
 	extctl = dd->ipath_extctrl & ~(INFINIPATH_EXTC_LED1PRIPORT_ON |
 				       INFINIPATH_EXTC_LED2PRIPORT_ON);
 
@@ -816,6 +818,7 @@ static void ipath_setup_pe_setextled(struct ipath_devdata *dd, u64 lst,
 		extctl |= INFINIPATH_EXTC_LED1PRIPORT_ON;
 	dd->ipath_extctrl = extctl;
 	ipath_write_kreg(dd, dd->ipath_kregs->kr_extctrl, extctl);
+	spin_unlock_irqrestore(&dd->ipath_gpio_lock, flags);
 }
 
 /**
diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c
index 7045ba6..f6ee7a8 100644
--- a/drivers/infiniband/hw/ipath/ipath_init_chip.c
+++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c
@@ -340,6 +340,8 @@ static int init_chip_first(struct ipath_devdata *dd,
 
 	spin_lock_init(&dd->ipath_tid_lock);
 
+	spin_lock_init(&dd->ipath_gpio_lock);
+
 done:
 	*pdp = pd;
 	return ret;
diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
index 2f39db7..bd1088a 100644
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h
@@ -399,6 +399,8 @@ struct ipath_devdata {
 	u64 ipath_gpio_out;
 	/* shadow the gpio mask register */
 	u64 ipath_gpio_mask;
+	/* shadow the gpio output enable, etc... */
+	u64 ipath_extctrl;
 	/* kr_revision shadow */
 	u64 ipath_revision;
 	/*
@@ -473,8 +475,6 @@ struct ipath_devdata {
 	u32 ipath_cregbase;
 	/* shadow the control register contents */
 	u32 ipath_control;
-	/* shadow the gpio output contents */
-	u32 ipath_extctrl;
 	/* PCI revision register (HTC rev on FPGA) */
 	u32 ipath_pcirev;
 
@@ -576,6 +576,9 @@ struct ipath_devdata {
 	u64 ipath_gpio_sda;
 	u64 ipath_gpio_scl;
 
+	/* lock for doing RMW of shadows/regs for ExtCtrl and GPIO */
+	spinlock_t ipath_gpio_lock;
+
 	/* used to override LED behavior */
 	u8 ipath_led_override;  /* Substituted for normal value, if non-zero */
 	u16 ipath_led_override_timeoff; /* delta to next timer event */


From arthur.jones at qlogic.com  Tue Jun 19 16:40:51 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:40:51 -0700
Subject: [ofa-general] [PATCH 04/28] IB/ipath - remove incompletely
	implemented ipath_runtime flags and code
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234050.3794.74578.stgit@bauxite.internal.keyresearch.com>

From: John Gregor <john.gregor at qlogic.com>

The IPATH_RUNTIME_PBC_REWRITE and the IPATH_RUNTIME_LOOSE_DMA_ALIGN flags
were never implemented correctly and turned out to be unnecessary.
Remove the last vestiges of these flags but mark the spot with a comment
to remind us to not reuse these flags in the interest of binary
compatibility.  The INFINIPATH_XGXS_SUPPRESS_ARMLAUNCH_ERR bit was also
not found to be useful, so it was dropped in the cleanup as well.

Signed-off-by: John Gregor <john.gregor at qlogic.com>
Signed-off-by: Arthur Jones <arthur.jones at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_common.h  |    3 +--
 drivers/infiniband/hw/ipath/ipath_iba6120.c |   25 -------------------------
 2 files changed, 1 insertions(+), 27 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_common.h b/drivers/infiniband/hw/ipath/ipath_common.h
index 10c008f..12e1349 100644
--- a/drivers/infiniband/hw/ipath/ipath_common.h
+++ b/drivers/infiniband/hw/ipath/ipath_common.h
@@ -189,8 +189,7 @@ typedef enum _ipath_ureg {
 #define IPATH_RUNTIME_FORCE_WC_ORDER	0x4
 #define IPATH_RUNTIME_RCVHDR_COPY	0x8
 #define IPATH_RUNTIME_MASTER	0x10
-#define IPATH_RUNTIME_PBC_REWRITE 0x20
-#define IPATH_RUNTIME_LOOSE_DMA_ALIGN 0x40
+/* 0x20 and 0x40 are no longer used, but are reserved for ABI compatibility */
 
 /*
  * This structure is returned by ipath_userinit() immediately after
diff --git a/drivers/infiniband/hw/ipath/ipath_iba6120.c b/drivers/infiniband/hw/ipath/ipath_iba6120.c
index 2345bb0..7115907 100644
--- a/drivers/infiniband/hw/ipath/ipath_iba6120.c
+++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c
@@ -296,13 +296,6 @@ static const struct ipath_cregs ipath_pe_cregs = {
 #define IPATH_GPIO_SCL (1ULL << \
 	(_IPATH_GPIO_SCL_NUM+INFINIPATH_EXTC_GPIOOE_SHIFT))
 
-/*
- * Rev2 silicon allows suppressing check for ArmLaunch errors.
- * this can speed up short packet sends on systems that do
- * not guaranteee write-order.
- */
-#define INFINIPATH_XGXS_SUPPRESS_ARMLAUNCH_ERR (1ULL<<63)
-
 /* 6120 specific hardware errors... */
 static const struct ipath_hwerror_msgs ipath_6120_hwerror_msgs[] = {
 	INFINIPATH_HWE_MSG(PCIEPOISONEDTLP, "PCIe Poisoned TLP"),
@@ -680,17 +673,6 @@ static int ipath_pe_bringup_serdes(struct ipath_devdata *dd)
 		val |= dd->ipath_rx_pol_inv <<
 			INFINIPATH_XGXS_RX_POL_SHIFT;
 	}
-	if (dd->ipath_minrev >= 2) {
-		/* Rev 2. can tolerate multiple writes to PBC, and
-		 * allowing them can provide lower latency on some
-		 * CPUs, but this feature is off by default, only
-		 * turned on by setting D63 of XGXSconfig reg.
-		 * May want to make this conditional more
-		 * fine-grained in future. This is not exactly
-		 * related to XGXS, but where the bit ended up.
-		 */
-		val |= INFINIPATH_XGXS_SUPPRESS_ARMLAUNCH_ERR;
-	}
 	if (val != prev_val)
 		ipath_write_kreg(dd, dd->ipath_kregs->kr_xgxsconfig, val);
 
@@ -1324,13 +1306,6 @@ static int ipath_pe_get_base_info(struct ipath_portdata *pd, void *kbase)
 
 	dd = pd->port_dd;
 
-	if (dd != NULL && dd->ipath_minrev >= 2) {
-		ipath_cdbg(PROC, "IBA6120 Rev2, allow multiple PBC write\n");
-		kinfo->spi_runtime_flags |= IPATH_RUNTIME_PBC_REWRITE;
-		ipath_cdbg(PROC, "IBA6120 Rev2, allow loose DMA alignment\n");
-		kinfo->spi_runtime_flags |= IPATH_RUNTIME_LOOSE_DMA_ALIGN;
-	}
-
 done:
 	kinfo->spi_runtime_flags |= IPATH_RUNTIME_PCIE;
 	return 0;


From arthur.jones at qlogic.com  Tue Jun 19 16:40:57 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:40:57 -0700
Subject: [ofa-general] [PATCH 05/28] IB/ipath -- Log "active" time and some
	errors to EEPROM
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234056.3794.46973.stgit@bauxite.internal.keyresearch.com>

From: Michael Albaugh <michael.albaugh at qlogic.com>

We currently track various errors; now we enhance that
capability by logging some of them to EEPROM.  We also
now log a cumulative "active" time, defined by traffic
through the InfiniPath HCA beyond the normal SM traffic.
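
As a rough illustration of the bookkeeping described above, the sketch
below shows a saturating 8-bit error counter and the transfer of whole
hours from a seconds counter into a 16-bit hour total.  The names
sat_add_u8 and carry_hours are hypothetical and do not appear in the
driver; locking and the EEPROM write itself are left out.

```c
#include <stdint.h>

#define SECS_PER_HOUR 3600u

/* Add incr to an 8-bit counter, clamping at 0xFF instead of wrapping. */
static uint8_t sat_add_u8(uint8_t cur, uint32_t incr)
{
	uint64_t sum = (uint64_t)cur + incr;

	return sum > 0xFF ? 0xFF : (uint8_t)sum;
}

/*
 * Move whole hours out of *secs into a 16-bit hour total (clamped at
 * 0xFFFF), leaving the sub-hour remainder in *secs so that it still
 * counts toward the next hour.
 */
static uint16_t carry_hours(uint32_t *secs, uint16_t hours)
{
	uint32_t whole = *secs / SECS_PER_HOUR;
	uint32_t total = (uint32_t)hours + whole;

	*secs -= whole * SECS_PER_HOUR;
	return total > 0xFFFF ? 0xFFFF : (uint16_t)total;
}
```

For example, 7500 accumulated seconds added to a stored total of 10
hours gives 12 hours, with 300 seconds left over.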

Signed-off-by: Michael Albaugh <michael.albaugh at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_driver.c    |    3 
 drivers/infiniband/hw/ipath/ipath_eeprom.c    |  233 ++++++++++++++++++++++++-
 drivers/infiniband/hw/ipath/ipath_iba6110.c   |   22 ++
 drivers/infiniband/hw/ipath/ipath_iba6120.c   |   27 +++
 drivers/infiniband/hw/ipath/ipath_init_chip.c |    2 
 drivers/infiniband/hw/ipath/ipath_intr.c      |    8 +
 drivers/infiniband/hw/ipath/ipath_kernel.h    |   38 ++++
 drivers/infiniband/hw/ipath/ipath_stats.c     |   23 ++
 drivers/infiniband/hw/ipath/ipath_sysfs.c     |   22 ++
 9 files changed, 370 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
index 0975932..e963986 100644
--- a/drivers/infiniband/hw/ipath/ipath_driver.c
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -2005,6 +2005,9 @@ void ipath_shutdown_device(struct ipath_devdata *dd)
 			 ~0ULL & ~INFINIPATH_HWE_MEMBISTFAILED);
 	ipath_write_kreg(dd, dd->ipath_kregs->kr_errorclear, -1LL);
 	ipath_write_kreg(dd, dd->ipath_kregs->kr_intclear, -1LL);
+
+	ipath_cdbg(VERBOSE, "Flush time and errors to EEPROM\n");
+	ipath_update_eeprom_log(dd);
 }
 
 /**
diff --git a/drivers/infiniband/hw/ipath/ipath_eeprom.c b/drivers/infiniband/hw/ipath/ipath_eeprom.c
index 26daac9..9be1b9a 100644
--- a/drivers/infiniband/hw/ipath/ipath_eeprom.c
+++ b/drivers/infiniband/hw/ipath/ipath_eeprom.c
@@ -367,8 +367,8 @@ bail:
  * @len: number of bytes to receive
  */
 
-int ipath_eeprom_read(struct ipath_devdata *dd, u8 eeprom_offset,
-		      void *buffer, int len)
+static int ipath_eeprom_internal_read(struct ipath_devdata *dd,
+					u8 eeprom_offset, void *buffer, int len)
 {
 	/* compiler complains unless initialized */
 	u8 single_byte = 0;
@@ -418,6 +418,7 @@ bail:
 	return ret;
 }
 
+
 /**
  * ipath_eeprom_write - writes data to the eeprom via I2C
  * @dd: the infinipath device
@@ -425,8 +426,8 @@ bail:
  * @buffer: data to write
  * @len: number of bytes to write
  */
-int ipath_eeprom_write(struct ipath_devdata *dd, u8 eeprom_offset,
-		       const void *buffer, int len)
+int ipath_eeprom_internal_write(struct ipath_devdata *dd, u8 eeprom_offset,
+				const void *buffer, int len)
 {
 	u8 single_byte;
 	int sub_len;
@@ -500,6 +501,38 @@ bail:
 	return ret;
 }
 
+/*
+ * The public entry-points ipath_eeprom_read() and ipath_eeprom_write()
+ * are now just wrappers around the internal functions.
+ */
+int ipath_eeprom_read(struct ipath_devdata *dd, u8 eeprom_offset,
+			void *buff, int len)
+{
+	int ret;
+
+	ret = down_interruptible(&dd->ipath_eep_sem);
+	if (!ret) {
+		ret = ipath_eeprom_internal_read(dd, eeprom_offset, buff, len);
+		up(&dd->ipath_eep_sem);
+	}
+
+	return ret;
+}
+
+int ipath_eeprom_write(struct ipath_devdata *dd, u8 eeprom_offset,
+			const void *buff, int len)
+{
+	int ret;
+
+	ret = down_interruptible(&dd->ipath_eep_sem);
+	if (!ret) {
+		ret = ipath_eeprom_internal_write(dd, eeprom_offset, buff, len);
+		up(&dd->ipath_eep_sem);
+	}
+
+	return ret;
+}
+
 static u8 flash_csum(struct ipath_flash *ifp, int adjust)
 {
 	u8 *ip = (u8 *) ifp;
@@ -527,7 +560,7 @@ void ipath_get_eeprom_info(struct ipath_devdata *dd)
 	void *buf;
 	struct ipath_flash *ifp;
 	__be64 guid;
-	int len;
+	int len, eep_stat;
 	u8 csum, *bguid;
 	int t = dd->ipath_unit;
 	struct ipath_devdata *dd0 = ipath_lookup(0);
@@ -571,7 +604,11 @@ void ipath_get_eeprom_info(struct ipath_devdata *dd)
 		goto bail;
 	}
 
-	if (ipath_eeprom_read(dd, 0, buf, len)) {
+	down(&dd->ipath_eep_sem);
+	eep_stat = ipath_eeprom_internal_read(dd, 0, buf, len);
+	up(&dd->ipath_eep_sem);
+
+	if (eep_stat) {
 		ipath_dev_err(dd, "Failed reading GUID from eeprom\n");
 		goto done;
 	}
@@ -646,8 +683,192 @@ void ipath_get_eeprom_info(struct ipath_devdata *dd)
 	ipath_cdbg(VERBOSE, "Initted GUID to %llx from eeprom\n",
 		   (unsigned long long) be64_to_cpu(dd->ipath_guid));
 
+	memcpy(&dd->ipath_eep_st_errs, &ifp->if_errcntp, IPATH_EEP_LOG_CNT);
+	/*
+	 * Power-on (actually "active") hours are kept as little-endian value
+	 * in EEPROM, but as seconds in a (possibly as small as 24-bit)
+	 * atomic_t while running.
+	 */
+	atomic_set(&dd->ipath_active_time, 0);
+	dd->ipath_eep_hrs = ifp->if_powerhour[0] | (ifp->if_powerhour[1] << 8);
+
 done:
 	vfree(buf);
 
 bail:;
 }
+
+/**
+ * ipath_update_eeprom_log - copy active-time and error counters to eeprom
+ * @dd: the infinipath device
+ *
+ * Although the time is kept as seconds in the ipath_devdata struct, it is
+ * rounded to hours for re-write, as we have only 16 bits in EEPROM.
+ * First-cut code reads whole (expected) struct ipath_flash, modifies,
+ * re-writes. Future direction: read/write only what we need, assuming
+ * that the EEPROM had to have been "good enough" for driver init, and
+ * if not, we aren't making it worse.
+ *
+ */
+
+int ipath_update_eeprom_log(struct ipath_devdata *dd)
+{
+	void *buf;
+	struct ipath_flash *ifp;
+	int len, hi_water;
+	uint32_t new_time, new_hrs;
+	u8 csum;
+	int ret, idx;
+	unsigned long flags;
+
+	/* first, check if we actually need to do anything. */
+	ret = 0;
+	for (idx = 0; idx < IPATH_EEP_LOG_CNT; ++idx) {
+		if (dd->ipath_eep_st_new_errs[idx]) {
+			ret = 1;
+			break;
+		}
+	}
+	new_time = atomic_read(&dd->ipath_active_time);
+
+	if (ret == 0 && new_time < 3600)
+		return 0;
+
+	/*
+	 * The quick-check above determined that there is something worthy
+	 * of logging, so get current contents and do a more detailed check.
+	 */
+	len = offsetof(struct ipath_flash, if_future);
+	buf = vmalloc(len);
+	ret = 1;
+	if (!buf) {
+		ipath_dev_err(dd, "Couldn't allocate memory to read %u "
+				"bytes from eeprom for logging\n", len);
+		goto bail;
+	}
+
+	/* Grab semaphore and read current EEPROM. If we get an
+	 * error, let go, but if not, keep it until we finish write.
+	 */
+	ret = down_interruptible(&dd->ipath_eep_sem);
+	if (ret) {
+		ipath_dev_err(dd, "Unable to acquire EEPROM for logging\n");
+		goto free_bail;
+	}
+	ret = ipath_eeprom_internal_read(dd, 0, buf, len);
+	if (ret) {
+		up(&dd->ipath_eep_sem);
+		ipath_dev_err(dd, "Unable to read EEPROM for logging\n");
+		goto free_bail;
+	}
+	ifp = (struct ipath_flash *)buf;
+
+	csum = flash_csum(ifp, 0);
+	if (csum != ifp->if_csum) {
+		up(&dd->ipath_eep_sem);
+		ipath_dev_err(dd, "EEPROM cks err (0x%02X, S/B 0x%02X)\n",
+				csum, ifp->if_csum);
+		ret = 1;
+		goto free_bail;
+	}
+	hi_water = 0;
+	spin_lock_irqsave(&dd->ipath_eep_st_lock, flags);
+	for (idx = 0; idx < IPATH_EEP_LOG_CNT; ++idx) {
+		int new_val = dd->ipath_eep_st_new_errs[idx];
+		if (new_val) {
+			/*
+			 * If we have seen any errors, add to EEPROM values
+			 * We need to saturate at 0xFF (255) and we also
+			 * would need to adjust the checksum if we were
+			 * trying to minimize EEPROM traffic
+			 * Note that we add to actual current count in EEPROM,
+			 * in case it was altered while we were running.
+			 */
+			new_val += ifp->if_errcntp[idx];
+			if (new_val > 0xFF)
+				new_val = 0xFF;
+			if (ifp->if_errcntp[idx] != new_val) {
+				ifp->if_errcntp[idx] = new_val;
+				hi_water = offsetof(struct ipath_flash,
+						if_errcntp) + idx;
+			}
+			/*
+			 * update our shadow (used to minimize EEPROM
+			 * traffic), to match what we are about to write.
+			 */
+			dd->ipath_eep_st_errs[idx] = new_val;
+			dd->ipath_eep_st_new_errs[idx] = 0;
+		}
+	}
+	/*
+	 * now update active-time. We would like to round to the nearest hour
+	 * but unless atomic_t are sure to be proper signed ints we cannot,
+	 * because we need to account for what we "transfer" to EEPROM and
+	 * if we log an hour at 31 minutes, then we would need to set
+	 * active_time to -29 to accurately count the _next_ hour.
+	 */
+	if (new_time > 3600) {
+		new_hrs = new_time / 3600;
+		atomic_sub((new_hrs * 3600), &dd->ipath_active_time);
+		new_hrs += dd->ipath_eep_hrs;
+		if (new_hrs > 0xFFFF)
+			new_hrs = 0xFFFF;
+		dd->ipath_eep_hrs = new_hrs;
+		if ((new_hrs & 0xFF) != ifp->if_powerhour[0]) {
+			ifp->if_powerhour[0] = new_hrs & 0xFF;
+			hi_water = offsetof(struct ipath_flash, if_powerhour);
+		}
+		if ((new_hrs >> 8) != ifp->if_powerhour[1]) {
+			ifp->if_powerhour[1] = new_hrs >> 8;
+			hi_water = offsetof(struct ipath_flash, if_powerhour)
+					+ 1;
+		}
+	}
+	/*
+	 * There is a tiny possibility that we could somehow fail to write
+	 * the EEPROM after updating our shadows, but problems from holding
+	 * the spinlock too long are a much bigger issue.
+	 */
+	spin_unlock_irqrestore(&dd->ipath_eep_st_lock, flags);
+	if (hi_water) {
+		/* we made some change to the data, update cksum and write */
+		csum = flash_csum(ifp, 1);
+		ret = ipath_eeprom_internal_write(dd, 0, buf, hi_water + 1);
+	}
+	up(&dd->ipath_eep_sem);
+	if (ret)
+		ipath_dev_err(dd, "Failed updating EEPROM\n");
+
+free_bail:
+	vfree(buf);
+bail:
+	return ret;
+
+}
+
+/**
+ * ipath_inc_eeprom_err - increment one of the four error counters
+ * that are logged to EEPROM.
+ * @dd: the infinipath device
+ * @eidx: 0..3, the counter to increment
+ * @incr: how much to add
+ *
+ * Each counter is 8-bits, and saturates at 255 (0xFF). They
+ * are copied to the EEPROM (aka flash) whenever ipath_update_eeprom_log()
+ * is called, but it can only be called in a context that allows sleep.
+ * This function can be called even at interrupt level.
+ */
+
+void ipath_inc_eeprom_err(struct ipath_devdata *dd, u32 eidx, u32 incr)
+{
+	uint new_val;
+	unsigned long flags;
+
+	spin_lock_irqsave(&dd->ipath_eep_st_lock, flags);
+	new_val = dd->ipath_eep_st_new_errs[eidx] + incr;
+	if (new_val > 255)
+		new_val = 255;
+	dd->ipath_eep_st_new_errs[eidx] = new_val;
+	spin_unlock_irqrestore(&dd->ipath_eep_st_lock, flags);
+	return;
+}
diff --git a/drivers/infiniband/hw/ipath/ipath_iba6110.c b/drivers/infiniband/hw/ipath/ipath_iba6110.c
index 8482ea3..85f408d 100644
--- a/drivers/infiniband/hw/ipath/ipath_iba6110.c
+++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c
@@ -440,6 +440,7 @@ static void ipath_ht_handle_hwerrors(struct ipath_devdata *dd, char *msg,
 	u32 bits, ctrl;
 	int isfatal = 0;
 	char bitsmsg[64];
+	int log_idx;
 
 	hwerrs = ipath_read_kreg64(dd, dd->ipath_kregs->kr_hwerrstatus);
 
@@ -468,6 +469,11 @@ static void ipath_ht_handle_hwerrors(struct ipath_devdata *dd, char *msg,
 
 	hwerrs &= dd->ipath_hwerrmask;
 
+	/* We log some errors to EEPROM, check if we have any of those. */
+	for (log_idx = 0; log_idx < IPATH_EEP_LOG_CNT; ++log_idx)
+		if (hwerrs & dd->ipath_eep_st_masks[log_idx].hwerrs_to_log)
+			ipath_inc_eeprom_err(dd, log_idx, 1);
+
 	/*
 	 * make sure we get this much out, unless told to be quiet,
 	 * it's a parity error we may recover from,
@@ -1171,6 +1177,22 @@ static void ipath_init_ht_variables(struct ipath_devdata *dd)
 
 	dd->ipath_i_rcvavail_mask = INFINIPATH_I_RCVAVAIL_MASK;
 	dd->ipath_i_rcvurg_mask = INFINIPATH_I_RCVURG_MASK;
+
+	/*
+	 * EEPROM error log 0 is TXE Parity errors. 1 is RXE Parity.
+	 * 2 is Some Misc, 3 is reserved for future.
+	 */
+	dd->ipath_eep_st_masks[0].hwerrs_to_log =
+		INFINIPATH_HWE_TXEMEMPARITYERR_MASK <<
+		INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT;
+
+	dd->ipath_eep_st_masks[1].hwerrs_to_log =
+		INFINIPATH_HWE_RXEMEMPARITYERR_MASK <<
+		INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT;
+
+	dd->ipath_eep_st_masks[2].errs_to_log =
+		INFINIPATH_E_INVALIDADDR | INFINIPATH_E_RESET;
+
 }
 
 /**
diff --git a/drivers/infiniband/hw/ipath/ipath_iba6120.c b/drivers/infiniband/hw/ipath/ipath_iba6120.c
index 7115907..207323a 100644
--- a/drivers/infiniband/hw/ipath/ipath_iba6120.c
+++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c
@@ -340,6 +340,7 @@ static void ipath_pe_handle_hwerrors(struct ipath_devdata *dd, char *msg,
 	u32 bits, ctrl;
 	int isfatal = 0;
 	char bitsmsg[64];
+	int log_idx;
 
 	hwerrs = ipath_read_kreg64(dd, dd->ipath_kregs->kr_hwerrstatus);
 	if (!hwerrs) {
@@ -367,6 +368,11 @@ static void ipath_pe_handle_hwerrors(struct ipath_devdata *dd, char *msg,
 
 	hwerrs &= dd->ipath_hwerrmask;
 
+	/* We log some errors to EEPROM, check if we have any of those. */
+	for (log_idx = 0; log_idx < IPATH_EEP_LOG_CNT; ++log_idx)
+		if (hwerrs & dd->ipath_eep_st_masks[log_idx].hwerrs_to_log)
+			ipath_inc_eeprom_err(dd, log_idx, 1);
+
 	/*
 	 * make sure we get this much out, unless told to be quiet,
 	 * or it's occurred within the last 5 seconds
@@ -950,6 +956,27 @@ static void ipath_init_pe_variables(struct ipath_devdata *dd)
 
 	dd->ipath_i_rcvavail_mask = INFINIPATH_I_RCVAVAIL_MASK;
 	dd->ipath_i_rcvurg_mask = INFINIPATH_I_RCVURG_MASK;
+
+	/*
+	 * EEPROM error log 0 is TXE Parity errors. 1 is RXE Parity.
+	 * 2 is Some Misc, 3 is reserved for future.
+	 */
+	dd->ipath_eep_st_masks[0].hwerrs_to_log =
+		INFINIPATH_HWE_TXEMEMPARITYERR_MASK <<
+		INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT;
+
+	/* Ignore errors in PIO/PBC on systems with unordered write-combining */
+	if (ipath_unordered_wc())
+		dd->ipath_eep_st_masks[0].hwerrs_to_log &= ~TXE_PIO_PARITY;
+
+	dd->ipath_eep_st_masks[1].hwerrs_to_log =
+		INFINIPATH_HWE_RXEMEMPARITYERR_MASK <<
+		INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT;
+
+	dd->ipath_eep_st_masks[2].errs_to_log =
+		INFINIPATH_E_INVALIDADDR | INFINIPATH_E_RESET;
+
+
 }
 
 /* setup the MSI stuff again after a reset.  I'd like to just call
diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c
index f6ee7a8..ee83934 100644
--- a/drivers/infiniband/hw/ipath/ipath_init_chip.c
+++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c
@@ -341,6 +341,8 @@ static int init_chip_first(struct ipath_devdata *dd,
 	spin_lock_init(&dd->ipath_tid_lock);
 
 	spin_lock_init(&dd->ipath_gpio_lock);
+	spin_lock_init(&dd->ipath_eep_st_lock);
+	sema_init(&dd->ipath_eep_sem, 1);
 
 done:
 	*pdp = pd;
diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c
index a90d3b5..d9cdd00 100644
--- a/drivers/infiniband/hw/ipath/ipath_intr.c
+++ b/drivers/infiniband/hw/ipath/ipath_intr.c
@@ -505,6 +505,7 @@ static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs)
 	int i, iserr = 0;
 	int chkerrpkts = 0, noprint = 0;
 	unsigned supp_msgs;
+	int log_idx;
 
 	supp_msgs = handle_frequent_errors(dd, errs, msg, &noprint);
 
@@ -518,6 +519,13 @@ static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs)
 	if (errs & INFINIPATH_E_HARDWARE) {
 		/* reuse same msg buf */
 		dd->ipath_f_handle_hwerrors(dd, msg, sizeof msg);
+	} else {
+		u64 mask;
+		for (log_idx = 0; log_idx < IPATH_EEP_LOG_CNT; ++log_idx) {
+			mask = dd->ipath_eep_st_masks[log_idx].errs_to_log;
+			if (errs & mask)
+				ipath_inc_eeprom_err(dd, log_idx, 1);
+		}
 	}
 
 	if (!noprint && (errs & ~dd->ipath_e_bitsextant))
diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
index bd1088a..2a4414b 100644
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h
@@ -57,6 +57,24 @@
 extern struct infinipath_stats ipath_stats;
 
 #define IPATH_CHIP_SWVERSION IPATH_CHIP_VERS_MAJ
+/*
+ * First-cut criterion for "device is active" is
+ * two thousand dwords combined Tx, Rx traffic per
+ * 5-second interval. SMA packets are 64 dwords,
+ * and occur "a few per second", presumably each way.
+ */
+#define IPATH_TRAFFIC_ACTIVE_THRESHOLD (2000)
+/*
+ * Struct used to indicate which errors are logged in each of the
+ * error-counters that are logged to EEPROM. A counter is incremented
+ * _once_ (saturating at 255) for each event with any bits set in
+ * the error or hwerror register masks below.
+ */
+#define IPATH_EEP_LOG_CNT (4)
+struct ipath_eep_log_mask {
+	u64 errs_to_log;
+	u64 hwerrs_to_log;
+};
 
 struct ipath_portdata {
 	void **port_rcvegrbuf;
@@ -588,6 +606,24 @@ struct ipath_devdata {
 	/* Used to flash LEDs in override mode */
 	struct timer_list ipath_led_override_timer;
 
+	/* Support (including locks) for EEPROM logging of errors and time */
+	/* control access to actual counters, timer */
+	spinlock_t ipath_eep_st_lock;
+	/* control high-level access to EEPROM */
+	struct semaphore ipath_eep_sem;
+	/* Below inc'd by ipath_snap_cntrs(), locked by ipath_eep_st_lock */
+	uint64_t ipath_traffic_wds;
+	/* active time is kept in seconds, but logged in hours */
+	atomic_t ipath_active_time;
+	/* Below are nominal shadow of EEPROM, new since last EEPROM update */
+	uint8_t ipath_eep_st_errs[IPATH_EEP_LOG_CNT];
+	uint8_t ipath_eep_st_new_errs[IPATH_EEP_LOG_CNT];
+	uint16_t ipath_eep_hrs;
+	/*
+	 * masks for which bits of errs, hwerrs that cause
+	 * each of the counters to increment.
+	 */
+	struct ipath_eep_log_mask ipath_eep_st_masks[IPATH_EEP_LOG_CNT];
 };
 
 /* Private data for file operations */
@@ -726,6 +762,8 @@ u32 __iomem *ipath_getpiobuf(struct ipath_devdata *, u32 *);
 void ipath_init_iba6120_funcs(struct ipath_devdata *);
 void ipath_init_iba6110_funcs(struct ipath_devdata *);
 void ipath_get_eeprom_info(struct ipath_devdata *);
+int ipath_update_eeprom_log(struct ipath_devdata *dd);
+void ipath_inc_eeprom_err(struct ipath_devdata *dd, u32 eidx, u32 incr);
 u64 ipath_snap_cntr(struct ipath_devdata *, ipath_creg);
 void ipath_disarm_senderrbufs(struct ipath_devdata *, int);
 
diff --git a/drivers/infiniband/hw/ipath/ipath_stats.c b/drivers/infiniband/hw/ipath/ipath_stats.c
index d8b5e4c..2955f36 100644
--- a/drivers/infiniband/hw/ipath/ipath_stats.c
+++ b/drivers/infiniband/hw/ipath/ipath_stats.c
@@ -55,6 +55,7 @@ u64 ipath_snap_cntr(struct ipath_devdata *dd, ipath_creg creg)
 	u64 val64;
 	unsigned long t0, t1;
 	u64 ret;
+	unsigned long flags;
 
 	t0 = jiffies;
 	/* If fast increment counters are only 32 bits, snapshot them,
@@ -91,12 +92,18 @@ u64 ipath_snap_cntr(struct ipath_devdata *dd, ipath_creg creg)
 	if (creg == dd->ipath_cregs->cr_wordsendcnt) {
 		if (val != dd->ipath_lastsword) {
 			dd->ipath_sword += val - dd->ipath_lastsword;
+			spin_lock_irqsave(&dd->ipath_eep_st_lock, flags);
+			dd->ipath_traffic_wds += val - dd->ipath_lastsword;
+			spin_unlock_irqrestore(&dd->ipath_eep_st_lock, flags);
 			dd->ipath_lastsword = val;
 		}
 		val64 = dd->ipath_sword;
 	} else if (creg == dd->ipath_cregs->cr_wordrcvcnt) {
 		if (val != dd->ipath_lastrword) {
 			dd->ipath_rword += val - dd->ipath_lastrword;
+			spin_lock_irqsave(&dd->ipath_eep_st_lock, flags);
+			dd->ipath_traffic_wds += val - dd->ipath_lastrword;
+			spin_unlock_irqrestore(&dd->ipath_eep_st_lock, flags);
 			dd->ipath_lastrword = val;
 		}
 		val64 = dd->ipath_rword;
@@ -200,6 +207,7 @@ void ipath_get_faststats(unsigned long opaque)
 	struct ipath_devdata *dd = (struct ipath_devdata *) opaque;
 	u32 val;
 	static unsigned cnt;
+	unsigned long flags;
 
 	/*
 	 * don't access the chip while running diags, or memory diags can
@@ -210,9 +218,20 @@ void ipath_get_faststats(unsigned long opaque)
 		/* but re-arm the timer, for diags case; won't hurt other */
 		goto done;
 
+	/*
+	 * We now try to maintain an "active timer", based on traffic
+	 * exceeding a threshold, so we need to check the word-counts
+	 * even if they are 64-bit.
+	 */
+	ipath_snap_cntr(dd, dd->ipath_cregs->cr_wordsendcnt);
+	ipath_snap_cntr(dd, dd->ipath_cregs->cr_wordrcvcnt);
+	spin_lock_irqsave(&dd->ipath_eep_st_lock, flags);
+	if (dd->ipath_traffic_wds  >= IPATH_TRAFFIC_ACTIVE_THRESHOLD)
+		atomic_add(5, &dd->ipath_active_time); /* S/B #define */
+	dd->ipath_traffic_wds = 0;
+	spin_unlock_irqrestore(&dd->ipath_eep_st_lock, flags);
+
 	if (dd->ipath_flags & IPATH_32BITCOUNTERS) {
-		ipath_snap_cntr(dd, dd->ipath_cregs->cr_wordsendcnt);
-		ipath_snap_cntr(dd, dd->ipath_cregs->cr_wordrcvcnt);
 		ipath_snap_cntr(dd, dd->ipath_cregs->cr_pktsendcnt);
 		ipath_snap_cntr(dd, dd->ipath_cregs->cr_pktrcvcnt);
 	}
diff --git a/drivers/infiniband/hw/ipath/ipath_sysfs.c b/drivers/infiniband/hw/ipath/ipath_sysfs.c
index 17ec145..ab34d3e 100644
--- a/drivers/infiniband/hw/ipath/ipath_sysfs.c
+++ b/drivers/infiniband/hw/ipath/ipath_sysfs.c
@@ -613,6 +613,26 @@ static ssize_t store_led_override(struct device *dev,
 	return ret;
 }
 
+static ssize_t show_logged_errs(struct device *dev,
+				struct device_attribute *attr,
+				char *buf)
+{
+	struct ipath_devdata *dd = dev_get_drvdata(dev);
+	int idx, count;
+
+	/* force consistency with actual EEPROM */
+	if (ipath_update_eeprom_log(dd) != 0)
+		return -ENXIO;
+
+	count = 0;
+	for (idx = 0; idx < IPATH_EEP_LOG_CNT; ++idx) {
+		count += scnprintf(buf + count, PAGE_SIZE - count, "%d%c",
+			dd->ipath_eep_st_errs[idx],
+			idx == (IPATH_EEP_LOG_CNT - 1) ? '\n' : ' ');
+	}
+
+	return count;
+}
 
 static DRIVER_ATTR(num_units, S_IRUGO, show_num_units, NULL);
 static DRIVER_ATTR(version, S_IRUGO, show_version, NULL);
@@ -643,6 +663,7 @@ static DEVICE_ATTR(boardversion, S_IRUGO, show_boardversion, NULL);
 static DEVICE_ATTR(unit, S_IRUGO, show_unit, NULL);
 static DEVICE_ATTR(rx_pol_inv, S_IWUSR, NULL, store_rx_pol_inv);
 static DEVICE_ATTR(led_override, S_IWUSR, NULL, store_led_override);
+static DEVICE_ATTR(logged_errors, S_IRUGO, show_logged_errs, NULL);
 
 static struct attribute *dev_attributes[] = {
 	&dev_attr_guid.attr,
@@ -660,6 +681,7 @@ static struct attribute *dev_attributes[] = {
 	&dev_attr_enabled.attr,
 	&dev_attr_rx_pol_inv.attr,
 	&dev_attr_led_override.attr,
+	&dev_attr_logged_errors.attr,
 	NULL
 };
 

From arthur.jones at qlogic.com  Tue Jun 19 16:41:03 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:41:03 -0700
Subject: [ofa-general] [PATCH 06/28] IB/ipath - Support the IBA6110 revision
	4
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234102.3794.86911.stgit@bauxite.internal.keyresearch.com>

From: Dave Olson <dave.olson at qlogic.com>

Recognize IBA 6110 Revision 4, same feature set, etc. as earlier revisions.

Signed-off-by: Dave Olson <dave.olson at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_iba6110.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_iba6110.c b/drivers/infiniband/hw/ipath/ipath_iba6110.c
index 85f408d..0479985 100644
--- a/drivers/infiniband/hw/ipath/ipath_iba6110.c
+++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c
@@ -680,9 +680,9 @@ static int ipath_ht_boardname(struct ipath_devdata *dd, char *name,
 		snprintf(name, namelen, "%s", n);
 
 	if (dd->ipath_majrev != 3 || (dd->ipath_minrev < 2 ||
-		dd->ipath_minrev > 3)) {
+		dd->ipath_minrev > 4)) {
 		/*
-		 * This version of the driver only supports Rev 3.2 and 3.3
+		 * This version of the driver only supports Rev 3.2 - 3.4
 		 */
 		ipath_dev_err(dd,
 			      "Unsupported InfiniPath hardware revision %u.%u!\n",


From arthur.jones at qlogic.com  Tue Jun 19 16:41:09 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:41:09 -0700
Subject: [ofa-general] [PATCH 07/28] IB/ipath - fix maximum MTU reporting
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234108.3794.33107.stgit@bauxite.internal.keyresearch.com>

From: Robert Walsh <robert.walsh at qlogic.com>

Although our chip supports 4K MTUs, our driver doesn't yet support this
feature, so limit the maximum MTU to 2K until we get support for 4K
MTUs implemented.
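
For readers unfamiliar with the encoding, here is a minimal sketch of
how the InfiniBand MTU codes relate to byte sizes and how a 2K cap can
be applied.  The enum and helper names are local to this sketch (the
kernel uses IB_MTU_2048 and friends), and the lower-bound check is an
addition of the sketch; the patch itself only tightens the upper bound.

```c
#include <stdint.h>

/* MTU encodings as used in PortInfo; names are local to this sketch. */
enum sketch_ib_mtu {
	MTU_256  = 1,
	MTU_512  = 2,
	MTU_1024 = 3,
	MTU_2048 = 4,
	MTU_4096 = 5
};

/* Each step of the encoding doubles the size: code 1 is 256 bytes. */
static unsigned mtu_enum_to_bytes(enum sketch_ib_mtu m)
{
	return 128u << m;
}

/* Accept a path MTU only if it is a valid code no larger than 2048. */
static int path_mtu_allowed(int path_mtu)
{
	return path_mtu >= MTU_256 && path_mtu <= MTU_2048;
}

/* Place the MTUCap code at bit 16, as in the ipath_fs.c portinfo word. */
static uint32_t mtucap_field(enum sketch_ib_mtu cap)
{
	return (uint32_t)cap << 16;
}
```

With the cap at 2048, a request for the 4096 code is rejected and the
MTUCap field carries the value 4 rather than 5.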

Signed-off-by: Robert Walsh <robert.walsh at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_fs.c        |    7 ++++++-
 drivers/infiniband/hw/ipath/ipath_init_chip.c |    7 ++++++-
 drivers/infiniband/hw/ipath/ipath_mad.c       |    7 ++++++-
 drivers/infiniband/hw/ipath/ipath_qp.c        |    7 ++++++-
 drivers/infiniband/hw/ipath/ipath_verbs.c     |    7 ++++++-
 5 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_fs.c b/drivers/infiniband/hw/ipath/ipath_fs.c
index ebd5c7b..40cf1bc 100644
--- a/drivers/infiniband/hw/ipath/ipath_fs.c
+++ b/drivers/infiniband/hw/ipath/ipath_fs.c
@@ -257,9 +257,14 @@ static ssize_t atomic_port_info_read(struct file *file, char __user *buf,
 		/* Notimpl InitType (actually, an SMA decision) */
 		/* VLHighLimit is 0 (only one VL) */
 		; /* VLArbitrationHighCap is 0 (only one VL) */
+	/*
+	 * Note: the chips support a maximum MTU of 4096, but the driver
+	 * hasn't implemented this feature yet, so set the maximum
+	 * to 2048.
+	 */
 	portinfo[10] = 	/* VLArbitrationLowCap is 0 (only one VL) */
 		/* InitTypeReply is SMA decision */
-		(5 << 16)	/* MTUCap 4096 */
+		(4 << 16)	/* MTUCap 2048 */
 		| (7 << 13)	/* VLStallCount */
 		| (0x1f << 8)	/* HOQLife */
 		| (1 << 4)
diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c
index ee83934..bdfda62 100644
--- a/drivers/infiniband/hw/ipath/ipath_init_chip.c
+++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c
@@ -310,7 +310,12 @@ static int init_chip_first(struct ipath_devdata *dd,
 	val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_sendpiosize);
 	dd->ipath_piosize2k = val & ~0U;
 	dd->ipath_piosize4k = val >> 32;
-	dd->ipath_ibmtu = 4096;	/* default to largest legal MTU */
+	/*
+	 * Note: the chips support a maximum MTU of 4096, but the driver
+	 * hasn't implemented this feature yet, so set the initial value
+	 * to 2048.
+	 */
+	dd->ipath_ibmtu = 2048;
 	val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_sendpiobufcnt);
 	dd->ipath_piobcnt2k = val & ~0U;
 	dd->ipath_piobcnt4k = val >> 32;
diff --git a/drivers/infiniband/hw/ipath/ipath_mad.c b/drivers/infiniband/hw/ipath/ipath_mad.c
index 25908b0..2e9e161 100644
--- a/drivers/infiniband/hw/ipath/ipath_mad.c
+++ b/drivers/infiniband/hw/ipath/ipath_mad.c
@@ -292,7 +292,12 @@ static int recv_subn_get_portinfo(struct ib_smp *smp,
 	/* pip->vl_arb_high_cap; // only one VL */
 	/* pip->vl_arb_low_cap; // only one VL */
 	/* InitTypeReply = 0 */
-	pip->inittypereply_mtucap = IB_MTU_4096;
+	/*
+	 * Note: the chips support a maximum MTU of 4096, but the driver
+	 * hasn't implemented this feature yet, so set the maximum value
+	 * to 2048.
+	 */
+	pip->inittypereply_mtucap = IB_MTU_2048;
 	// HCAs ignore VLStallCount and HOQLife
 	/* pip->vlstallcnt_hoqlife; */
 	pip->operationalvl_pei_peo_fpi_fpo = 0x10;	/* OVLs = 1 */
diff --git a/drivers/infiniband/hw/ipath/ipath_qp.c b/drivers/infiniband/hw/ipath/ipath_qp.c
index bfef08e..9e07abb 100644
--- a/drivers/infiniband/hw/ipath/ipath_qp.c
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c
@@ -507,8 +507,13 @@ int ipath_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 		    attr->port_num > ibqp->device->phys_port_cnt)
 			goto inval;
 
+	/*
+	 * Note: the chips support a maximum MTU of 4096, but the driver
+	 * hasn't implemented this feature yet, so don't allow Path MTU
+	 * values greater than 2048.
+	 */
 	if (attr_mask & IB_QP_PATH_MTU)
-		if (attr->path_mtu > IB_MTU_4096)
+		if (attr->path_mtu > IB_MTU_2048)
 			goto inval;
 
 	if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC)
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
index bb70845..980b64a 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
@@ -1051,7 +1051,12 @@ static int ipath_query_port(struct ib_device *ibdev,
 	props->max_vl_num = 1;		/* VLCap = VL0 */
 	props->init_type_reply = 0;
 
-	props->max_mtu = IB_MTU_4096;
+	/*
+	 * Note: the chips support a maximum MTU of 4096, but the driver
+	 * hasn't implemented this feature yet, so set the maximum value
+	 * to 2048.
+	 */
+	props->max_mtu = IB_MTU_2048;
 	switch (dev->dd->ipath_ibmtu) {
 	case 4096:
 		mtu = IB_MTU_4096;
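
The MTUCap values in this patch (4 for 2048, 5 for 4096) follow the
InfiniBand MTU encoding, where the encoded value n stands for 128 << n
bytes. A minimal sketch of that mapping follows; the helper names
mtu_code_to_bytes() and mtucap_field() are hypothetical and are not
part of the driver.

```c
#include <assert.h>

/* Same numbering as the kernel's enum ib_mtu (IB_MTU_256 = 1, ...). */
enum mtu_code {
	MTU_256 = 1, MTU_512 = 2, MTU_1024 = 3, MTU_2048 = 4, MTU_4096 = 5
};

/* Compile-time check of the encoding rule: bytes = 128 << code. */
static_assert((128 << MTU_2048) == 2048, "MTU code 4 must mean 2048 bytes");

/* Hypothetical helper: encoded MTU -> bytes, or -1 if out of range. */
static int mtu_code_to_bytes(int code)
{
	if (code < MTU_256 || code > MTU_4096)
		return -1;
	return 128 << code;
}

/* Hypothetical helper: put the MTUCap code at bit 16, as the
 * portinfo[10] expression in the ipath_fs.c hunk does. */
static unsigned int mtucap_field(int code)
{
	return (unsigned int)code << 16;
}
```

With this encoding, advertising 2048 instead of 4096 is just a change
from code 5 to code 4 in each place the capability is reported.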


From arthur.jones at qlogic.com  Tue Jun 19 16:41:14 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:41:14 -0700
Subject: [ofa-general] [PATCH 08/28] IB/ipath -- fill in some missing
	FMR-related fields
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234114.3794.6698.stgit@bauxite.internal.keyresearch.com>

From: Robert Walsh <robert.walsh at qlogic.com>

In ipath_query_device(), some of the struct ib_device_attr
fields were not being initialized.

Signed-off-by: Robert Walsh <robert.walsh at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_verbs.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
index 980b64a..04294ca 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
@@ -981,6 +981,8 @@ static int ipath_query_device(struct ib_device *ibdev,
 	props->max_ah = ib_ipath_max_ahs;
 	props->max_cqe = ib_ipath_max_cqes;
 	props->max_mr = dev->lk_table.max;
+	props->max_fmr = dev->lk_table.max;
+	props->max_map_per_fmr = 32767;
 	props->max_pd = ib_ipath_max_pds;
 	props->max_qp_rd_atom = IPATH_MAX_RDMA_ATOMIC;
 	props->max_qp_init_rd_atom = 255;


From arthur.jones at qlogic.com  Tue Jun 19 16:41:20 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:41:20 -0700
Subject: [ofa-general] [PATCH 09/28] IB/ipath - fix problem with next WQE
	after a UC completion
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234119.3794.43684.stgit@bauxite.internal.keyresearch.com>

From: Ralph Campbell <ralph.campbell at qlogic.com>

This patch fixes a bug introduced when moving some code around
for readability.

Setting the wqe pointer at the end of complete_last_send() is a
no-op: wqe is passed by value and is not used after the assignment.
Move the assignment back to the caller, where it is used.
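
The reason the assignment had no effect can be shown outside the
driver. This is a minimal sketch, assuming a stand-in struct swqe and
hypothetical function names; only the by-value parameter behaviour is
taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

struct swqe { int id; };	/* hypothetical stand-in for struct ipath_swqe */

/* Reassigning a pointer parameter changes only the callee's copy. */
static void complete_then_reassign(struct swqe *wqe, struct swqe *next)
{
	(void)wqe;	/* ... completion work would use wqe here ... */
	wqe = next;	/* no effect on the caller's pointer */
	(void)wqe;
}

/* Caller-side pattern from the fix: refresh the pointer where it is used. */
static struct swqe *refresh_in_caller(struct swqe *queue, size_t last)
{
	return &queue[last];
}

static int demo_by_value(void)
{
	struct swqe queue[2] = { { 10 }, { 20 } };
	struct swqe *wqe = &queue[0];

	complete_then_reassign(wqe, &queue[1]);
	assert(wqe == &queue[0]);	/* caller still sees the old entry */

	wqe = refresh_in_caller(queue, 1);
	return wqe->id;
}
```

Passing a pointer to the pointer would also work, but assigning in the
caller keeps complete_last_send() free of side effects on its
arguments.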

Signed-off-by: Ralph Campbell <ralph.campbell at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_uc.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_uc.c b/drivers/infiniband/hw/ipath/ipath_uc.c
index 1c2b03c..49d650c 100644
--- a/drivers/infiniband/hw/ipath/ipath_uc.c
+++ b/drivers/infiniband/hw/ipath/ipath_uc.c
@@ -58,7 +58,6 @@ static void complete_last_send(struct ipath_qp *qp, struct ipath_swqe *wqe,
 		wc->port_num = 0;
 		ipath_cq_enter(to_icq(qp->ibqp.send_cq), wc, 0);
 	}
-	wqe = get_swqe_ptr(qp, qp->s_last);
 }
 
 /**
@@ -97,8 +96,10 @@ int ipath_make_uc_req(struct ipath_qp *qp,
 		 * Signal the completion of the last send
 		 * (if there is one).
 		 */
-		if (qp->s_last != qp->s_tail)
+		if (qp->s_last != qp->s_tail) {
 			complete_last_send(qp, wqe, &wc);
+			wqe = get_swqe_ptr(qp, qp->s_last);
+		}
 
 		/* Check if send work queue is empty. */
 		if (qp->s_tail == qp->s_head)


From arthur.jones at qlogic.com  Tue Jun 19 16:41:26 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:41:26 -0700
Subject: [ofa-general] [PATCH 10/28] IB/ipath - fix local loopback bug when
	waiting for resources
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234125.3794.25265.stgit@bauxite.internal.keyresearch.com>

From: Ralph Campbell <ralph.campbell at qlogic.com>

This patch fixes a minor bug where the wrong QP was checked for
a send work request which should wait for an RNR timeout.

Signed-off-by: Ralph Campbell <ralph.campbell at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_ruc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_ruc.c b/drivers/infiniband/hw/ipath/ipath_ruc.c
index d9c2a9b..8c5d20a 100644
--- a/drivers/infiniband/hw/ipath/ipath_ruc.c
+++ b/drivers/infiniband/hw/ipath/ipath_ruc.c
@@ -267,7 +267,7 @@ again:
 	spin_lock_irqsave(&sqp->s_lock, flags);
 
 	if (!(ib_ipath_state_ops[sqp->state] & IPATH_PROCESS_SEND_OK) ||
-	    qp->s_rnr_timeout) {
+	    sqp->s_rnr_timeout) {
 		spin_unlock_irqrestore(&sqp->s_lock, flags);
 		goto done;
 	}


From arthur.jones at qlogic.com  Tue Jun 19 16:41:32 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:41:32 -0700
Subject: [ofa-general] [PATCH 11/28] IB/ipath - set M bit in BTH according to
	IB spec.
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234131.3794.4718.stgit@bauxite.internal.keyresearch.com>

From: Ralph Campbell <ralph.campbell at qlogic.com>

According to ch. 17.2.8.1.1, QPs start in the migrated state and
should send packets with the M bit set in the BTH.
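
For orientation, the first 32-bit word of the BTH holds the opcode in
bits 31-24, the SE bit in bit 23, the M (MigReq) bit in bit 22, the
pad count and transport version below that, and the P_Key in the low
16 bits. The sketch below builds that word in host byte order; the
macro and function names are hypothetical, and 0x11 is assumed to be
the RC ACKNOWLEDGE opcode.

```c
#include <assert.h>
#include <stdint.h>

#define BTH_OPCODE_SHIFT	24
#define BTH_MIG_BIT		(1u << 22)	/* same bit the patch sets */
#define OPCODE_RC_ACKNOWLEDGE	0x11u		/* assumed RC ACK opcode */

/* Hypothetical helper: build BTH word 0 with the M bit always set. */
static uint32_t make_bth0(uint32_t opcode, uint16_t pkey)
{
	assert(opcode <= 0xffu);
	return (opcode << BTH_OPCODE_SHIFT) | BTH_MIG_BIT | pkey;
}

/* Hypothetical helper: test whether the M bit is set in a BTH word 0. */
static int bth0_mig_set(uint32_t bth0)
{
	return (bth0 & BTH_MIG_BIT) != 0;
}
```

The driver converts the word to big-endian before it goes on the wire;
that step is left out here.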

Signed-off-by: Ralph Campbell <ralph.campbell at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_rc.c |    6 +++---
 drivers/infiniband/hw/ipath/ipath_uc.c |    2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_rc.c b/drivers/infiniband/hw/ipath/ipath_rc.c
index 1915771..9ba80d1 100644
--- a/drivers/infiniband/hw/ipath/ipath_rc.c
+++ b/drivers/infiniband/hw/ipath/ipath_rc.c
@@ -188,7 +188,7 @@ static int ipath_make_rc_ack(struct ipath_qp *qp,
 	}
 	qp->s_hdrwords = hwords;
 	qp->s_cur_size = len;
-	*bth0p = bth0;
+	*bth0p = bth0 | (1 << 22); /* Set M bit */
 	*bth2p = bth2;
 	return 1;
 
@@ -240,7 +240,7 @@ int ipath_make_rc_req(struct ipath_qp *qp,
 
 	/* header size in 32-bit words LRH+BTH = (8+12)/4. */
 	hwords = 5;
-	bth0 = 0;
+	bth0 = 1 << 22; /* Set M bit */
 
 	/* Send a request. */
 	wqe = get_swqe_ptr(qp, qp->s_cur);
@@ -604,7 +604,7 @@ static void send_rc_ack(struct ipath_qp *qp)
 	}
 	/* read pkey_index w/o lock (its atomic) */
 	bth0 = ipath_get_pkey(dev->dd, qp->s_pkey_index) |
-		OP(ACKNOWLEDGE) << 24;
+		(OP(ACKNOWLEDGE) << 24) | (1 << 22);
 	if (qp->r_nak_state)
 		ohdr->u.aeth = cpu_to_be32((qp->r_msn & IPATH_MSN_MASK) |
 					    (qp->r_nak_state <<
diff --git a/drivers/infiniband/hw/ipath/ipath_uc.c b/drivers/infiniband/hw/ipath/ipath_uc.c
index 49d650c..243d7c6 100644
--- a/drivers/infiniband/hw/ipath/ipath_uc.c
+++ b/drivers/infiniband/hw/ipath/ipath_uc.c
@@ -86,7 +86,7 @@ int ipath_make_uc_req(struct ipath_qp *qp,
 
 	/* header size in 32-bit words LRH+BTH = (8+12)/4. */
 	hwords = 5;
-	bth0 = 0;
+	bth0 = 1 << 22; /* Set M bit */
 
 	/* Get the next send request. */
 	wqe = get_swqe_ptr(qp, qp->s_last);


From arthur.jones at qlogic.com  Tue Jun 19 16:41:38 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:41:38 -0700
Subject: [ofa-general] [PATCH 12/28] IB/ipath - Change use of constants for
	TID type to defined values
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234137.3794.42065.stgit@bauxite.internal.keyresearch.com>

From: Joan Eslinger <joan.eslinger at qlogic.com>

Define the received-packet 'type' in a way consistent with the
hardware spec and chips.

The hardware considers received packets of type 0 to be expected,
and type 1 to be eager. The driver was calling the ipath_f_put_tid
functions using a variable called 'type' set to 0 for eager and
to 1 for expected packets. Worse, the iba6110 and iba6120 drivers
used those values inconsistently. This was quite confusing.
Now everything is consistent with the hardware.
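
A minimal sketch of the convention after this change, using the values
given in the updated kernel-doc comments (expected = 0, eager = 1).
The function tid_entry() and the template value used with it are
hypothetical; the branch mirrors the one in ipath_pe_put_tid().

```c
#include <assert.h>
#include <stdint.h>

#define RCVHQ_RCV_TYPE_EXPECTED	0
#define RCVHQ_RCV_TYPE_EAGER	1

/*
 * Hypothetical helper: compute the value written for a TID entry.
 * Eager entries OR in a per-device template; expected entries are
 * marked as a full 4KB page (2 << 29), as in ipath_pe_put_tid().
 */
static uint64_t tid_entry(uint64_t pa, unsigned int type,
			  uint64_t tidtemplate)
{
	assert(type == RCVHQ_RCV_TYPE_EXPECTED ||
	       type == RCVHQ_RCV_TYPE_EAGER);

	if (type == RCVHQ_RCV_TYPE_EAGER)
		pa |= tidtemplate;
	else
		pa |= (uint64_t)2 << 29;
	return pa;
}
```

With named constants, a call site such as
tid_entry(pa, RCVHQ_RCV_TYPE_EAGER, tmpl) states its intent, which a
bare 0 or 1 did not.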

Signed-off-by: Dave Olson <dave.olson at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_file_ops.c  |   12 ++++++++----
 drivers/infiniband/hw/ipath/ipath_iba6110.c   |   10 ++++++----
 drivers/infiniband/hw/ipath/ipath_iba6120.c   |   14 ++++++++------
 drivers/infiniband/hw/ipath/ipath_init_chip.c |    3 ++-
 4 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c
index 1272aaf..931802b 100644
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c
@@ -396,7 +396,8 @@ static int ipath_tid_update(struct ipath_portdata *pd, struct file *fp,
 			   "TID %u, vaddr %lx, physaddr %llx pgp %p\n",
 			   tid, vaddr, (unsigned long long) physaddr,
 			   pagep[i]);
-		dd->ipath_f_put_tid(dd, &tidbase[tid], 1, physaddr);
+		dd->ipath_f_put_tid(dd, &tidbase[tid], RCVHQ_RCV_TYPE_EXPECTED,
+				    physaddr);
 		/*
 		 * don't check this tid in ipath_portshadow, since we
 		 * just filled it in; start with the next one.
@@ -422,7 +423,8 @@ static int ipath_tid_update(struct ipath_portdata *pd, struct file *fp,
 			if (dd->ipath_pageshadow[porttid + tid]) {
 				ipath_cdbg(VERBOSE, "Freeing TID %u\n",
 					   tid);
-				dd->ipath_f_put_tid(dd, &tidbase[tid], 1,
+				dd->ipath_f_put_tid(dd, &tidbase[tid],
+						    RCVHQ_RCV_TYPE_EXPECTED,
 						    dd->ipath_tidinvalid);
 				pci_unmap_page(dd->pcidev,
 					dd->ipath_physshadow[porttid + tid],
@@ -538,7 +540,8 @@ static int ipath_tid_free(struct ipath_portdata *pd, unsigned subport,
 		if (dd->ipath_pageshadow[porttid + tid]) {
 			ipath_cdbg(VERBOSE, "PID %u freeing TID %u\n",
 				   pd->port_pid, tid);
-			dd->ipath_f_put_tid(dd, &tidbase[tid], 1,
+			dd->ipath_f_put_tid(dd, &tidbase[tid],
+					    RCVHQ_RCV_TYPE_EXPECTED,
 					    dd->ipath_tidinvalid);
 			pci_unmap_page(dd->pcidev,
 				dd->ipath_physshadow[porttid + tid],
@@ -921,7 +924,8 @@ static int ipath_create_user_egr(struct ipath_portdata *pd)
 					    (u64 __iomem *)
 					    ((char __iomem *)
 					     dd->ipath_kregbase +
-					     dd->ipath_rcvegrbase), 0, pa);
+					     dd->ipath_rcvegrbase),
+					    RCVHQ_RCV_TYPE_EAGER, pa);
 			pa += egrsize;
 		}
 		cond_resched();	/* don't hog the cpu */
diff --git a/drivers/infiniband/hw/ipath/ipath_iba6110.c b/drivers/infiniband/hw/ipath/ipath_iba6110.c
index 0479985..d8ac9f1 100644
--- a/drivers/infiniband/hw/ipath/ipath_iba6110.c
+++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c
@@ -1408,7 +1408,7 @@ static void ipath_ht_quiet_serdes(struct ipath_devdata *dd)
  * ipath_pe_put_tid - write a TID in chip
  * @dd: the infinipath device
  * @tidptr: pointer to the expected TID (in chip) to udpate
- * @tidtype: 0 for eager, 1 for expected
+ * @tidtype: RCVHQ_RCV_TYPE_EAGER (1) for eager, RCVHQ_RCV_TYPE_EXPECTED (0) for expected
  * @pa: physical address of in memory buffer; ipath_tidinvalid if freeing
  *
  * This exists as a separate routine to allow for special locking etc.
@@ -1429,7 +1429,7 @@ static void ipath_ht_put_tid(struct ipath_devdata *dd,
 				 "40 bits, using only 40!!!\n", pa);
 			pa &= INFINIPATH_RT_ADDR_MASK;
 		}
-		if (type == 0)
+		if (type == RCVHQ_RCV_TYPE_EAGER)
 			pa |= dd->ipath_tidtemplate;
 		else {
 			/* in words (fixed, full page).  */
@@ -1469,7 +1469,8 @@ static void ipath_ht_clear_tids(struct ipath_devdata *dd, unsigned port)
 				   port * dd->ipath_rcvtidcnt *
 				   sizeof(*tidbase));
 	for (i = 0; i < dd->ipath_rcvtidcnt; i++)
-		ipath_ht_put_tid(dd, &tidbase[i], 1, dd->ipath_tidinvalid);
+		ipath_ht_put_tid(dd, &tidbase[i], RCVHQ_RCV_TYPE_EXPECTED,
+				 dd->ipath_tidinvalid);
 
 	tidbase = (u64 __iomem *) ((char __iomem *)(dd->ipath_kregbase) +
 				   dd->ipath_rcvegrbase +
@@ -1477,7 +1478,8 @@ static void ipath_ht_clear_tids(struct ipath_devdata *dd, unsigned port)
 				   sizeof(*tidbase));
 
 	for (i = 0; i < dd->ipath_rcvegrcnt; i++)
-		ipath_ht_put_tid(dd, &tidbase[i], 0, dd->ipath_tidinvalid);
+		ipath_ht_put_tid(dd, &tidbase[i], RCVHQ_RCV_TYPE_EAGER,
+				 dd->ipath_tidinvalid);
 }
 
 /**
diff --git a/drivers/infiniband/hw/ipath/ipath_iba6120.c b/drivers/infiniband/hw/ipath/ipath_iba6120.c
index 207323a..b931057 100644
--- a/drivers/infiniband/hw/ipath/ipath_iba6120.c
+++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c
@@ -1104,7 +1104,7 @@ bail:
  * ipath_pe_put_tid - write a TID in chip
  * @dd: the infinipath device
  * @tidptr: pointer to the expected TID (in chip) to udpate
- * @tidtype: 0 for eager, 1 for expected
+ * @tidtype: RCVHQ_RCV_TYPE_EAGER (1) for eager, RCVHQ_RCV_TYPE_EXPECTED (0) for expected
  * @pa: physical address of in memory buffer; ipath_tidinvalid if freeing
  *
  * This exists as a separate routine to allow for special locking etc.
@@ -1130,7 +1130,7 @@ static void ipath_pe_put_tid(struct ipath_devdata *dd, u64 __iomem *tidptr,
 				      "BUG: Physical page address 0x%lx "
 				      "has bits set in 31-29\n", pa);
 
-		if (type == 0)
+		if (type == RCVHQ_RCV_TYPE_EAGER)
 			pa |= dd->ipath_tidtemplate;
 		else /* for now, always full 4KB page */
 			pa |= 2 << 29;
@@ -1154,7 +1154,7 @@ static void ipath_pe_put_tid(struct ipath_devdata *dd, u64 __iomem *tidptr,
  * ipath_pe_put_tid_2 - write a TID in chip, Revision 2 or higher
  * @dd: the infinipath device
  * @tidptr: pointer to the expected TID (in chip) to udpate
- * @tidtype: 0 for eager, 1 for expected
+ * @tidtype: RCVHQ_RCV_TYPE_EAGER (1) for eager, RCVHQ_RCV_TYPE_EXPECTED (0) for expected
  * @pa: physical address of in memory buffer; ipath_tidinvalid if freeing
  *
  * This exists as a separate routine to allow for selection of the
@@ -1179,7 +1179,7 @@ static void ipath_pe_put_tid_2(struct ipath_devdata *dd, u64 __iomem *tidptr,
 				      "BUG: Physical page address 0x%lx "
 				      "has bits set in 31-29\n", pa);
 
-		if (type == 0)
+		if (type == RCVHQ_RCV_TYPE_EAGER)
 			pa |= dd->ipath_tidtemplate;
 		else /* for now, always full 4KB page */
 			pa |= 2 << 29;
@@ -1218,7 +1218,8 @@ static void ipath_pe_clear_tids(struct ipath_devdata *dd, unsigned port)
 		 port * dd->ipath_rcvtidcnt * sizeof(*tidbase));
 
 	for (i = 0; i < dd->ipath_rcvtidcnt; i++)
-		ipath_pe_put_tid(dd, &tidbase[i], 0, tidinv);
+		ipath_pe_put_tid(dd, &tidbase[i], RCVHQ_RCV_TYPE_EXPECTED,
+				 tidinv);
 
 	tidbase = (u64 __iomem *)
 		((char __iomem *)(dd->ipath_kregbase) +
@@ -1226,7 +1227,8 @@ static void ipath_pe_clear_tids(struct ipath_devdata *dd, unsigned port)
 		 port * dd->ipath_rcvegrcnt * sizeof(*tidbase));
 
 	for (i = 0; i < dd->ipath_rcvegrcnt; i++)
-		ipath_pe_put_tid(dd, &tidbase[i], 1, tidinv);
+		ipath_pe_put_tid(dd, &tidbase[i], RCVHQ_RCV_TYPE_EAGER,
+				 tidinv);
 }
 
 /**
diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c
index bdfda62..9f61155 100644
--- a/drivers/infiniband/hw/ipath/ipath_init_chip.c
+++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c
@@ -133,7 +133,8 @@ static int create_port0_egr(struct ipath_devdata *dd)
 				   dd->ipath_ibmaxlen, PCI_DMA_FROMDEVICE);
 		dd->ipath_f_put_tid(dd, e + (u64 __iomem *)
 				    ((char __iomem *) dd->ipath_kregbase +
-				     dd->ipath_rcvegrbase), 0,
+				     dd->ipath_rcvegrbase),
+				    RCVHQ_RCV_TYPE_EAGER,
 				    dd->ipath_port0_skbinfo[e].phys);
 	}
 

From arthur.jones at qlogic.com  Tue Jun 19 16:41:45 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:41:45 -0700
Subject: [ofa-general] [PATCH 13/28] IB/ipath - Fix the mtrr_add args for
	chips with 2 buffer sizes
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234144.3794.6370.stgit@bauxite.internal.keyresearch.com>

From: Dave Olson <dave.olson at qlogic.com>

The values passed have never been right for iba6120 chips, but
just happened to work.  We need to select the right buffer
offset in the chip (both offsets are in the same register).  The
total length was also wrong, but the error was hidden by the
rounding up.
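
The range calculation can be sketched in isolation as below. The
struct and function names are hypothetical, and the layout values used
with them are made up for illustration; only the arithmetic follows
the patch (2K offset in the low 32 bits of piobufbase, 4K offset in
the high 32 bits, one range covering both packed buffer sets).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pio_layout {
	uint64_t piobufbase;	/* 2K offset low 32 bits, 4K offset high 32 */
	uint32_t piobcnt2k;	/* number of 2K buffers */
	uint32_t piobcnt4k;	/* number of 4K buffers */
	uint32_t palign;	/* stride of a 2K buffer */
	uint32_t align4k;	/* stride of a 4K buffer */
};

struct pio_range {
	uint64_t offset;	/* offset of the range from the chip base */
	uint64_t len;		/* length before rounding to a power of 2 */
};

/* Hypothetical helper: one range covering every PIO buffer. */
static struct pio_range pio_wc_range(const struct pio_layout *l)
{
	struct pio_range r;

	assert(l != NULL);
	if (l->piobcnt2k && l->piobcnt4k) {	/* two buffer sizes */
		uint64_t b2k = l->piobufbase & 0xffffffffULL;
		uint64_t b4k = l->piobufbase >> 32;

		if (b2k < b4k) {
			r.offset = b2k;
			r.len = b4k - b2k +
				(uint64_t)l->piobcnt4k * l->align4k;
		} else {
			r.offset = b4k;
			r.len = b2k - b4k +
				(uint64_t)l->piobcnt2k * l->palign;
		}
	} else {				/* single buffer size */
		r.offset = l->piobufbase;
		r.len = (uint64_t)l->piobcnt2k * l->palign +
			(uint64_t)l->piobcnt4k * l->align4k;
	}
	return r;
}
```

For example, 32 buffers of 2K at offset 0x10000 followed by 16 buffers
of 4K at offset 0x20000 give a range starting at 0x10000 with length
0x20000.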

Signed-off-by: Dave Olson <dave.olson at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_wc_x86_64.c |   27 ++++++++++++++++++++-----
 1 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
index 04696e6..9f409fd 100644
--- a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
+++ b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
@@ -63,12 +63,29 @@ int ipath_enable_wc(struct ipath_devdata *dd)
 	 * of 2 address matching the length (which has to be a power of 2).
 	 * For rev1, that means the base address, for rev2, it will be just
 	 * the PIO buffers themselves.
+	 * For chips with two sets of buffers, the calculations are
+	 * somewhat more complicated; we need to sum, and the piobufbase
+	 * register has both offsets, 2K in low 32 bits, 4K in high 32 bits.
+	 * The buffers are still packed, so a single range covers both.
 	 */
-	pioaddr = addr + dd->ipath_piobufbase;
-	piolen = (dd->ipath_piobcnt2k +
-		  dd->ipath_piobcnt4k) *
-		ALIGN(dd->ipath_piobcnt2k +
-		      dd->ipath_piobcnt4k, dd->ipath_palign);
+	if (dd->ipath_piobcnt2k && dd->ipath_piobcnt4k) { /* 2 sizes */
+		unsigned long pio2kbase, pio4kbase;
+		pio2kbase = dd->ipath_piobufbase & 0xffffffffUL;
+		pio4kbase = (dd->ipath_piobufbase >> 32) & 0xffffffffUL;
+		if (pio2kbase < pio4kbase) { /* all, for now */
+			pioaddr = addr + pio2kbase;
+			piolen = pio4kbase - pio2kbase +
+				dd->ipath_piobcnt4k * dd->ipath_4kalign;
+		} else {
+			pioaddr = addr + pio4kbase;
+			piolen = pio2kbase - pio4kbase +
+				dd->ipath_piobcnt2k * dd->ipath_palign;
+		}
+	} else {  /* single buffer size (2K, currently) */
+		pioaddr = addr + dd->ipath_piobufbase;
+		piolen = dd->ipath_piobcnt2k * dd->ipath_palign +
+			dd->ipath_piobcnt4k * dd->ipath_4kalign;
+	}
 
 	for (bits = 0; !(piolen & (1ULL << bits)); bits++)
 		/* do nothing */ ;


From arthur.jones at qlogic.com  Tue Jun 19 16:41:51 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:41:51 -0700
Subject: [ofa-general] [PATCH 14/28] IB/ipath - Use S_ABORT not cancel and
	abort on exit freeze mode after recovery
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234150.3794.43464.stgit@bauxite.internal.keyresearch.com>

From: Dave Olson <dave.olson at qlogic.com>

This centralizes the use of the abort functionality, removes the
unneeded buffer cancel (abort does the same thing), and sets up to
ignore launch errors after an abort, the same as after a cancel.  We
need the abort on exit from freeze mode to avoid leaving buffers stuck
in the busy state if a user process happened to complete a send while
we were in freeze mode doing the recovery.
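
The "ignore launch errors for a while after an abort" part is a simple
deadline check. A minimal sketch, with time passed in explicitly
instead of read from jiffies: struct dev_state, the function names and
the HZ value are all hypothetical, and the comparison here is
wrap-safe, which is an assumption that differs from the plain
comparison in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define HZ 250	/* assumed tick rate, for illustration only */

struct dev_state {
	unsigned long lastcancel;	/* tick until which errors are ignored */
};

/* Called when sends are aborted: open a half-second ignore window. */
static void note_cancel(struct dev_state *dd, unsigned long now)
{
	assert(dd != NULL);
	dd->lastcancel = now + HZ / 2;
}

/* Nonzero while 'now' is still before the deadline (wrap-safe). */
static int ignore_armlaunch(const struct dev_state *dd, unsigned long now)
{
	assert(dd != NULL);
	return (long)(dd->lastcancel - now) > 0;
}
```

The signed difference keeps the check correct when the tick counter
wraps, the same idea as the kernel's time_after() helpers.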

Signed-off-by: Dave Olson <dave.olson at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_driver.c    |   57 ++++++++++++++++---------
 drivers/infiniband/hw/ipath/ipath_iba6110.c   |   13 +++---
 drivers/infiniband/hw/ipath/ipath_iba6120.c   |   16 ++++++-
 drivers/infiniband/hw/ipath/ipath_init_chip.c |    6 +++
 drivers/infiniband/hw/ipath/ipath_intr.c      |   13 ++----
 drivers/infiniband/hw/ipath/ipath_kernel.h    |    1 
 6 files changed, 68 insertions(+), 38 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
index e963986..8b61179 100644
--- a/drivers/infiniband/hw/ipath/ipath_driver.c
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -706,9 +706,9 @@ void ipath_disarm_piobufs(struct ipath_devdata *dd, unsigned first,
 	u64 sendctrl, sendorig;
 
 	ipath_cdbg(PKT, "disarm %u PIObufs first=%u\n", cnt, first);
-	sendorig = dd->ipath_sendctrl | INFINIPATH_S_DISARM;
+	sendorig = dd->ipath_sendctrl;
 	for (i = first; i < last; i++) {
-		sendctrl = sendorig |
+		sendctrl = sendorig  | INFINIPATH_S_DISARM |
 			(i << INFINIPATH_S_DISARMPIOBUF_SHIFT);
 		ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl,
 				 sendctrl);
@@ -719,12 +719,12 @@ void ipath_disarm_piobufs(struct ipath_devdata *dd, unsigned first,
 	 * while we were looping; no critical bits that would require
 	 * locking.
 	 *
-	 * Write a 0, and then the original value, reading scratch in
+	 * disable PIOAVAILUPD, then re-enable, reading scratch in
 	 * between.  This seems to avoid a chip timing race that causes
 	 * pioavail updates to memory to stop.
 	 */
 	ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl,
-			 0);
+			 sendorig & ~IPATH_S_PIOBUFAVAILUPD);
 	sendorig = ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch);
 	ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl,
 			 dd->ipath_sendctrl);
@@ -1596,6 +1596,35 @@ int ipath_waitfor_mdio_cmdready(struct ipath_devdata *dd)
 	return ret;
 }
 
+
+/*
+ * Flush all sends that might be in the ready to send state, as well as any
+ * that are in the process of being sent.   Used whenever we need to be
+ * sure the send side is idle.  Cleans up all buffer state by canceling
+ * all pio buffers, and issuing an abort, which cleans up anything in the
+ * launch fifo.  The cancel is superfluous on some chip versions, but
+ * it's safer to always do it.
+ * PIOAvail bits are updated by the chip as if normal send had happened.
+ */
+void ipath_cancel_sends(struct ipath_devdata *dd)
+{
+	ipath_dbg("Cancelling all in-progress send buffers\n");
+	dd->ipath_lastcancel = jiffies+HZ/2; /* skip armlaunch errs a bit */
+	/*
+	 * the abort bit is auto-clearing.  We read scratch to be sure
+	 * that cancels and the abort have taken effect in the chip.
+	 */
+	ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl,
+		INFINIPATH_S_ABORT);
+	ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch);
+	ipath_disarm_piobufs(dd, 0,
+		(unsigned)(dd->ipath_piobcnt2k + dd->ipath_piobcnt4k));
+
+	/* and again, be sure all have hit the chip */
+	ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch);
+}
+
+
 static void ipath_set_ib_lstate(struct ipath_devdata *dd, int which)
 {
 	static const char *what[4] = {
@@ -1617,14 +1646,8 @@ static void ipath_set_ib_lstate(struct ipath_devdata *dd, int which)
 			   INFINIPATH_IBCS_LINKTRAININGSTATE_MASK]);
 	/* flush all queued sends when going to DOWN or INIT, to be sure that
 	 * they don't block MAD packets */
-	if (!linkcmd || linkcmd == INFINIPATH_IBCC_LINKCMD_INIT) {
-		ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl,
-				 INFINIPATH_S_ABORT);
-		ipath_disarm_piobufs(dd, dd->ipath_lastport_piobuf,
-		                    (unsigned)(dd->ipath_piobcnt2k +
-				    dd->ipath_piobcnt4k) -
-				    dd->ipath_lastport_piobuf);
-	}
+	if (!linkcmd || linkcmd == INFINIPATH_IBCC_LINKCMD_INIT)
+		ipath_cancel_sends(dd);
 
 	ipath_write_kreg(dd, dd->ipath_kregs->kr_ibcctrl,
 			 dd->ipath_ibcctrl | which);
@@ -1967,17 +1990,9 @@ void ipath_shutdown_device(struct ipath_devdata *dd)
 	 */
 	udelay(5);
 
-	/*
-	 * abort any armed or launched PIO buffers that didn't go. (self
-	 * clearing).  Will cause any packet currently being transmitted to
-	 * go out with an EBP, and may also cause a short packet error on
-	 * the receiver.
-	 */
-	ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl,
-			 INFINIPATH_S_ABORT);
-
 	ipath_set_ib_lstate(dd, INFINIPATH_IBCC_LINKINITCMD_DISABLE <<
 			    INFINIPATH_IBCC_LINKINITCMD_SHIFT);
+	ipath_cancel_sends(dd);
 
 	/* disable IBC */
 	dd->ipath_control &= ~INFINIPATH_C_LINKENABLE;
diff --git a/drivers/infiniband/hw/ipath/ipath_iba6110.c b/drivers/infiniband/hw/ipath/ipath_iba6110.c
index d8ac9f1..34d159a 100644
--- a/drivers/infiniband/hw/ipath/ipath_iba6110.c
+++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c
@@ -509,6 +509,13 @@ static void ipath_ht_handle_hwerrors(struct ipath_devdata *dd, char *msg,
 		if (!hwerrs) {
 			ipath_dbg("Clearing freezemode on ignored or "
 				  "recovered hardware error\n");
+			/*
+			 * clear all sends, because they may have been
+			 * completed by usercode while in freeze mode, and
+			 * therefore would not be sent, and eventually
+			 * might cause the process to run out of bufs
+			 */
+			ipath_cancel_sends(dd);
 			ctrl &= ~INFINIPATH_C_FREEZEMODE;
 			ipath_write_kreg(dd, dd->ipath_kregs->kr_control,
 					 ctrl);
@@ -1566,11 +1573,6 @@ static int ipath_ht_early_init(struct ipath_devdata *dd)
 		writel(16, piobuf);
 		piobuf += pioincr;
 	}
-	/*
-	 * self-clearing
-	 */
-	ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl,
-			 INFINIPATH_S_ABORT);
 
 	ipath_get_eeprom_info(dd);
 	if (dd->ipath_boardrev == 5 && dd->ipath_serial[0] == '1' &&
@@ -1599,7 +1601,6 @@ static int ipath_ht_txe_recover(struct ipath_devdata *dd)
 	}
 	dev_info(&dd->pcidev->dev,
 		"Recovering from TXE PIO parity error\n");
-	ipath_disarm_senderrbufs(dd, 1);
 	return 1;
 }
 
diff --git a/drivers/infiniband/hw/ipath/ipath_iba6120.c b/drivers/infiniband/hw/ipath/ipath_iba6120.c
index b931057..0c34555 100644
--- a/drivers/infiniband/hw/ipath/ipath_iba6120.c
+++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c
@@ -430,8 +430,19 @@ static void ipath_pe_handle_hwerrors(struct ipath_devdata *dd, char *msg,
 			*dd->ipath_statusp |= IPATH_STATUS_HWERROR;
 			dd->ipath_flags &= ~IPATH_INITTED;
 		} else {
-			ipath_dbg("Clearing freezemode on ignored hardware "
-				  "error\n");
+			static u32 freeze_cnt;
+
+			freeze_cnt++;
+			ipath_dbg("Clearing freezemode on ignored or recovered "
+				  "hardware error (%u)\n", freeze_cnt);
+			/*
+			 * clear all sends, because they may have been
+			 * completed by usercode while in freeze mode, and
+			 * therefore would not be sent, and eventually
+			 * might cause the process to run out of bufs
+			 */
+			ipath_cancel_sends(dd);
+			ctrl &= ~INFINIPATH_C_FREEZEMODE;
 			ipath_write_kreg(dd, dd->ipath_kregs->kr_control,
 			   		 dd->ipath_control);
 		}
@@ -1371,7 +1382,6 @@ static int ipath_pe_txe_recover(struct ipath_devdata *dd)
 		dev_info(&dd->pcidev->dev,
 			"Recovering from TXE PIO parity error\n");
 	}
-	ipath_disarm_senderrbufs(dd, 1);
 	return 1;
 }
 
diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c
index 9f61155..5193d69 100644
--- a/drivers/infiniband/hw/ipath/ipath_init_chip.c
+++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c
@@ -777,6 +777,12 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit)
 		   piobufs, dd->ipath_pbufsport, uports);
 
 	dd->ipath_f_early_init(dd);
+	/*
+	 * cancel any possible active sends from early driver load.
+	 * Follows early_init because some chips have to initialize
+	 * PIO buffers in early_init to avoid false parity errors.
+	 */
+	ipath_cancel_sends(dd);
 
 	/* early_init sets rcvhdrentsize and rcvhdrsize, so this must be
 	 * done after early_init */
diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c
index d9cdd00..948091f 100644
--- a/drivers/infiniband/hw/ipath/ipath_intr.c
+++ b/drivers/infiniband/hw/ipath/ipath_intr.c
@@ -93,7 +93,8 @@ void ipath_disarm_senderrbufs(struct ipath_devdata *dd, int rewrite)
 
 	if (sbuf[0] || sbuf[1] || (piobcnt > 128 && (sbuf[2] || sbuf[3]))) {
 		int i;
-		if (ipath_debug & (__IPATH_PKTDBG|__IPATH_DBG)) {
+		if (ipath_debug & (__IPATH_PKTDBG|__IPATH_DBG) &&
+			dd->ipath_lastcancel > jiffies) {
 			__IPATH_DBG_WHICH(__IPATH_PKTDBG|__IPATH_DBG,
 					  "SendbufErrs %lx %lx", sbuf[0],
 					  sbuf[1]);
@@ -108,7 +109,8 @@ void ipath_disarm_senderrbufs(struct ipath_devdata *dd, int rewrite)
 					ipath_clrpiobuf(dd, i);
 				ipath_disarm_piobufs(dd, i, 1);
 			}
-		dd->ipath_lastcancel = jiffies+3; /* no armlaunch for a bit */
+		/* ignore armlaunch errs for a bit */
+		dd->ipath_lastcancel = jiffies+3;
 	}
 }
 
@@ -290,12 +292,7 @@ static void handle_e_ibstatuschanged(struct ipath_devdata *dd,
 		 * Flush all queued sends when link went to DOWN or INIT,
 		 * to be sure that they don't block SMA and other MAD packets
 		 */
-		ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl,
-				 INFINIPATH_S_ABORT);
-		ipath_disarm_piobufs(dd, dd->ipath_lastport_piobuf,
-							(unsigned)(dd->ipath_piobcnt2k +
-					dd->ipath_piobcnt4k) -
-					dd->ipath_lastport_piobuf);
+		ipath_cancel_sends(dd);
 	}
 	else if (lstate == IPATH_IBSTATE_INIT || lstate == IPATH_IBSTATE_ARM ||
 	    lstate == IPATH_IBSTATE_ACTIVE) {
diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
index 2a4414b..2e85aec 100644
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h
@@ -676,6 +676,7 @@ int ipath_unordered_wc(void);
 
 void ipath_disarm_piobufs(struct ipath_devdata *, unsigned first,
 			  unsigned cnt);
+void ipath_cancel_sends(struct ipath_devdata *);
 
 int ipath_create_rcvhdrq(struct ipath_devdata *, struct ipath_portdata *);
 void ipath_free_pddata(struct ipath_devdata *, struct ipath_portdata *);


From arthur.jones at qlogic.com  Tue Jun 19 16:41:57 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:41:57 -0700
Subject: [ofa-general] [PATCH 15/28] IB/ipath - add barrier before updating
	WC head in shared memory
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234156.3794.26440.stgit@bauxite.internal.keyresearch.com>

From: Ralph Campbell <ralph.campbell at qlogic.com>

Add a barrier to make sure the CPU doesn't reorder writes to
memory: user programs can be polling on the head index, so the
entry must be written before the head is updated.
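
The ordering requirement can be illustrated in portable user-space C.
This sketch swaps the kernel's wmb()/rmb() for C11 release/acquire
atomics, which is a different mechanism with the same intent; the ring
layout and the names ring_post() and ring_poll() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 4u

struct ring {
	uint32_t entries[RING_SIZE];
	_Atomic uint32_t head;	/* advanced by the producer */
	_Atomic uint32_t tail;	/* advanced by the consumer */
};

/* Producer: write the entry first, then publish the new head. */
static int ring_post(struct ring *r, uint32_t value)
{
	uint32_t head, next;

	assert(r != NULL);
	head = atomic_load_explicit(&r->head, memory_order_relaxed);
	next = (head + 1) % RING_SIZE;
	if (next == atomic_load_explicit(&r->tail, memory_order_acquire))
		return -1;	/* full */
	r->entries[head] = value;
	/* Release plays the role of wmb() before "head = next". */
	atomic_store_explicit(&r->head, next, memory_order_release);
	return 0;
}

/* Consumer: read head first, then the entry it guards. */
static int ring_poll(struct ring *r, uint32_t *out)
{
	uint32_t tail, head;

	assert(r != NULL && out != NULL);
	tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	/* Acquire plays the role of rmb() after the head comparison. */
	head = atomic_load_explicit(&r->head, memory_order_acquire);
	if (tail == head)
		return 0;	/* empty */
	*out = r->entries[tail];
	atomic_store_explicit(&r->tail, (tail + 1) % RING_SIZE,
			      memory_order_release);
	return 1;
}
```

Without the release/acquire pair, a consumer could observe the new
head and still read a stale entry, which is the failure the barriers
in this patch guard against.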

Signed-off-by: Ralph Campbell <ralph.campbell at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_cq.c    |    3 ++-
 drivers/infiniband/hw/ipath/ipath_ruc.c   |    1 +
 drivers/infiniband/hw/ipath/ipath_srq.c   |    1 +
 drivers/infiniband/hw/ipath/ipath_ud.c    |    1 +
 drivers/infiniband/hw/ipath/ipath_verbs.c |    1 +
 5 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_cq.c b/drivers/infiniband/hw/ipath/ipath_cq.c
index 3e9241b..8a2a774 100644
--- a/drivers/infiniband/hw/ipath/ipath_cq.c
+++ b/drivers/infiniband/hw/ipath/ipath_cq.c
@@ -90,6 +90,7 @@ void ipath_cq_enter(struct ipath_cq *cq, struct ib_wc *entry, int solicited)
 	wc->queue[head].sl = entry->sl;
 	wc->queue[head].dlid_path_bits = entry->dlid_path_bits;
 	wc->queue[head].port_num = entry->port_num;
+	wmb();
 	wc->head = next;
 
 	if (cq->notify == IB_CQ_NEXT_COMP ||
@@ -139,7 +140,7 @@ int ipath_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry)
 
 		if (tail == wc->head)
 			break;
-
+		rmb();
 		qp = ipath_lookup_qpn(&to_idev(cq->ibcq.device)->qp_table,
 				      wc->queue[tail].qp_num);
 		entry->qp = &qp->ibqp;
diff --git a/drivers/infiniband/hw/ipath/ipath_ruc.c b/drivers/infiniband/hw/ipath/ipath_ruc.c
index 8c5d20a..103dea0 100644
--- a/drivers/infiniband/hw/ipath/ipath_ruc.c
+++ b/drivers/infiniband/hw/ipath/ipath_ruc.c
@@ -194,6 +194,7 @@ int ipath_get_rwqe(struct ipath_qp *qp, int wr_id_only)
 			ret = 0;
 			goto bail;
 		}
+		rmb();
 		wqe = get_rwqe_ptr(rq, tail);
 		if (++tail >= rq->size)
 			tail = 0;
diff --git a/drivers/infiniband/hw/ipath/ipath_srq.c b/drivers/infiniband/hw/ipath/ipath_srq.c
index 03acae6..4b4214e 100644
--- a/drivers/infiniband/hw/ipath/ipath_srq.c
+++ b/drivers/infiniband/hw/ipath/ipath_srq.c
@@ -80,6 +80,7 @@ int ipath_post_srq_receive(struct ib_srq *ibsrq, struct ib_recv_wr *wr,
 		wqe->num_sge = wr->num_sge;
 		for (i = 0; i < wr->num_sge; i++)
 			wqe->sg_list[i] = wr->sg_list[i];
+		wmb();
 		wq->head = next;
 		spin_unlock_irqrestore(&srq->rq.lock, flags);
 	}
diff --git a/drivers/infiniband/hw/ipath/ipath_ud.c b/drivers/infiniband/hw/ipath/ipath_ud.c
index a518f7c..eee54c7 100644
--- a/drivers/infiniband/hw/ipath/ipath_ud.c
+++ b/drivers/infiniband/hw/ipath/ipath_ud.c
@@ -176,6 +176,7 @@ static void ipath_ud_loopback(struct ipath_qp *sqp,
 			dev->n_pkt_drops++;
 			goto bail_sge;
 		}
+		rmb();
 		wqe = get_rwqe_ptr(rq, tail);
 		if (++tail >= rq->size)
 			tail = 0;
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
index 04294ca..b92006a 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
@@ -323,6 +323,7 @@ static int ipath_post_receive(struct ib_qp *ibqp, struct ib_recv_wr *wr,
 		wqe->num_sge = wr->num_sge;
 		for (i = 0; i < wr->num_sge; i++)
 			wqe->sg_list[i] = wr->sg_list[i];
+		wmb();
 		wq->head = next;
 		spin_unlock_irqrestore(&qp->r_rq.lock, flags);
 	}


From arthur.jones at qlogic.com  Tue Jun 19 16:42:03 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:42:03 -0700
Subject: [ofa-general] [PATCH 16/28] IB/ipath - Fix RDMA read retry code
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234202.3794.36576.stgit@bauxite.internal.keyresearch.com>

From: Ralph Campbell <ralph.campbell at qlogic.com>

An RDMA read response or atomic response can ACK earlier sends
and RDMA writes.  In this case, the wrong work request pointer
was being used to store the read first response or atomic result.
Also, if an RDMA read request is retried, the code to compute
which request to resend was incorrect.

Signed-off-by: Ralph Campbell <ralph.campbell at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_rc.c |   57 +++++++++++++++++++++-----------
 1 files changed, 38 insertions(+), 19 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_rc.c b/drivers/infiniband/hw/ipath/ipath_rc.c
index 9ba80d1..014d811 100644
--- a/drivers/infiniband/hw/ipath/ipath_rc.c
+++ b/drivers/infiniband/hw/ipath/ipath_rc.c
@@ -806,13 +806,15 @@ static inline void update_last_psn(struct ipath_qp *qp, u32 psn)
  * Called at interrupt level with the QP s_lock held and interrupts disabled.
  * Returns 1 if OK, 0 if current operation should be aborted (NAK).
  */
-static int do_rc_ack(struct ipath_qp *qp, u32 aeth, u32 psn, int opcode)
+static int do_rc_ack(struct ipath_qp *qp, u32 aeth, u32 psn, int opcode,
+		     u64 val)
 {
 	struct ipath_ibdev *dev = to_idev(qp->ibqp.device);
 	struct ib_wc wc;
 	struct ipath_swqe *wqe;
 	int ret = 0;
 	u32 ack_psn;
+	int diff;
 
 	/*
 	 * Remove the QP from the timeout queue (or RNR timeout queue).
@@ -840,7 +842,19 @@ static int do_rc_ack(struct ipath_qp *qp, u32 aeth, u32 psn, int opcode)
 	 * The MSN might be for a later WQE than the PSN indicates so
 	 * only complete WQEs that the PSN finishes.
 	 */
-	while (ipath_cmp24(ack_psn, wqe->lpsn) >= 0) {
+	while ((diff = ipath_cmp24(ack_psn, wqe->lpsn)) >= 0) {
+		/*
+		 * RDMA_READ_RESPONSE_ONLY is a special case since
+		 * we want to generate completion events for everything
+		 * before the RDMA read, copy the data, then generate
+		 * the completion for the read.
+		 */
+		if (wqe->wr.opcode == IB_WR_RDMA_READ &&
+		    opcode == OP(RDMA_READ_RESPONSE_ONLY) &&
+		    diff == 0) {
+			ret = 1;
+			goto bail;
+		}
 		/*
 		 * If this request is a RDMA read or atomic, and the ACK is
 		 * for a later operation, this ACK NAKs the RDMA read or
@@ -851,12 +865,10 @@ static int do_rc_ack(struct ipath_qp *qp, u32 aeth, u32 psn, int opcode)
 		 * is sent but before the response is received.
 		 */
 		if ((wqe->wr.opcode == IB_WR_RDMA_READ &&
-		     (opcode != OP(RDMA_READ_RESPONSE_LAST) ||
-		      ipath_cmp24(ack_psn, wqe->lpsn) != 0)) ||
+		     (opcode != OP(RDMA_READ_RESPONSE_LAST) || diff != 0)) ||
 		    ((wqe->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP ||
 		      wqe->wr.opcode == IB_WR_ATOMIC_FETCH_AND_ADD) &&
-		     (opcode != OP(ATOMIC_ACKNOWLEDGE) ||
-		      ipath_cmp24(wqe->psn, psn) != 0))) {
+		     (opcode != OP(ATOMIC_ACKNOWLEDGE) || diff != 0))) {
 			/*
 			 * The last valid PSN seen is the previous
 			 * request's.
@@ -870,6 +882,9 @@ static int do_rc_ack(struct ipath_qp *qp, u32 aeth, u32 psn, int opcode)
 			 */
 			goto bail;
 		}
+		if (wqe->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP ||
+		    wqe->wr.opcode == IB_WR_ATOMIC_FETCH_AND_ADD)
+			*(u64 *) wqe->sg_list[0].vaddr = val;
 		if (qp->s_num_rd_atomic &&
 		    (wqe->wr.opcode == IB_WR_RDMA_READ ||
 		     wqe->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP ||
@@ -1079,6 +1094,7 @@ static inline void ipath_rc_rcv_resp(struct ipath_ibdev *dev,
 	int diff;
 	u32 pad;
 	u32 aeth;
+	u64 val;
 
 	spin_lock_irqsave(&qp->s_lock, flags);
 
@@ -1118,8 +1134,6 @@ static inline void ipath_rc_rcv_resp(struct ipath_ibdev *dev,
 			data += sizeof(__be32);
 		}
 		if (opcode == OP(ATOMIC_ACKNOWLEDGE)) {
-			u64 val;
-
 			if (!header_in_data) {
 				__be32 *p = ohdr->u.at.atomic_ack_eth;
 
@@ -1127,12 +1141,13 @@ static inline void ipath_rc_rcv_resp(struct ipath_ibdev *dev,
 					be32_to_cpu(p[1]);
 			} else
 				val = be64_to_cpu(((__be64 *) data)[0]);
-			*(u64 *) wqe->sg_list[0].vaddr = val;
-		}
-		if (!do_rc_ack(qp, aeth, psn, opcode) ||
+		} else
+			val = 0;
+		if (!do_rc_ack(qp, aeth, psn, opcode, val) ||
 		    opcode != OP(RDMA_READ_RESPONSE_FIRST))
 			goto ack_done;
 		hdrsize += 4;
+		wqe = get_swqe_ptr(qp, qp->s_last);
 		if (unlikely(wqe->wr.opcode != IB_WR_RDMA_READ))
 			goto ack_op_err;
 		/*
@@ -1176,13 +1191,12 @@ static inline void ipath_rc_rcv_resp(struct ipath_ibdev *dev,
 		goto bail;
 
 	case OP(RDMA_READ_RESPONSE_ONLY):
-		if (unlikely(ipath_cmp24(psn, qp->s_last_psn + 1))) {
-			dev->n_rdma_seq++;
-			ipath_restart_rc(qp, qp->s_last_psn + 1, &wc);
+		if (!header_in_data)
+			aeth = be32_to_cpu(ohdr->u.aeth);
+		else
+			aeth = be32_to_cpu(((__be32 *) data)[0]);
+		if (!do_rc_ack(qp, aeth, psn, opcode, 0))
 			goto ack_done;
-		}
-		if (unlikely(wqe->wr.opcode != IB_WR_RDMA_READ))
-			goto ack_op_err;
 		/* Get the number of bytes the message was padded by. */
 		pad = (be32_to_cpu(ohdr->bth[0]) >> 20) & 3;
 		/*
@@ -1197,6 +1211,7 @@ static inline void ipath_rc_rcv_resp(struct ipath_ibdev *dev,
 		 * have to be careful to copy the data to the right
 		 * location.
 		 */
+		wqe = get_swqe_ptr(qp, qp->s_last);
 		qp->s_rdma_read_len = restart_sge(&qp->s_rdma_read_sge,
 						  wqe, psn, pmtu);
 		goto read_last;
@@ -1230,7 +1245,8 @@ static inline void ipath_rc_rcv_resp(struct ipath_ibdev *dev,
 			data += sizeof(__be32);
 		}
 		ipath_copy_sge(&qp->s_rdma_read_sge, data, tlen);
-		(void) do_rc_ack(qp, aeth, psn, OP(RDMA_READ_RESPONSE_LAST));
+		(void) do_rc_ack(qp, aeth, psn,
+				 OP(RDMA_READ_RESPONSE_LAST), 0);
 		goto ack_done;
 	}
 
@@ -1344,8 +1360,11 @@ static inline int ipath_rc_rcv_error(struct ipath_ibdev *dev,
 			e = NULL;
 			break;
 		}
-		if (ipath_cmp24(psn, e->psn) >= 0)
+		if (ipath_cmp24(psn, e->psn) >= 0) {
+			if (prev == qp->s_tail_ack_queue)
+				old_req = 0;
 			break;
+		}
 	}
 	switch (opcode) {
 	case OP(RDMA_READ_REQUEST): {


From arthur.jones at qlogic.com  Tue Jun 19 16:42:09 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:42:09 -0700
Subject: [ofa-general] [PATCH 17/28] IB/ipath - wait for PIO available
	interrupt
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234208.3794.75336.stgit@bauxite.internal.keyresearch.com>

From: Ralph Campbell <ralph.campbell at qlogic.com>

The send function is called when posting new send work requests.
There is no point in trying to send a packet if the QP is already
waiting for a HW send buffer, so don't clear the busy bit until the
buffer available interrupt happens.

Signed-off-by: Ralph Campbell <ralph.campbell at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_ruc.c   |    6 ++----
 drivers/infiniband/hw/ipath/ipath_verbs.c |    1 +
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_ruc.c b/drivers/infiniband/hw/ipath/ipath_ruc.c
index 103dea0..7d09f5b 100644
--- a/drivers/infiniband/hw/ipath/ipath_ruc.c
+++ b/drivers/infiniband/hw/ipath/ipath_ruc.c
@@ -504,11 +504,9 @@ void ipath_no_bufs_available(struct ipath_qp *qp, struct ipath_ibdev *dev)
 	 * could be called.  If we are still in the tasklet function,
 	 * tasklet_hi_schedule() will not call us until the next time
 	 * tasklet_hi_schedule() is called.
-	 * We clear the tasklet flag now since we are committing to return
-	 * from the tasklet function.
+	 * We leave the busy flag set so that another post send doesn't
+	 * try to put the same QP on the piowait list again.
 	 */
-	clear_bit(IPATH_S_BUSY, &qp->s_busy);
-	tasklet_unlock(&qp->s_task);
 	want_buffer(dev->dd);
 	dev->n_piowait++;
 }
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
index b92006a..68952be 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
@@ -949,6 +949,7 @@ int ipath_ib_piobufavail(struct ipath_ibdev *dev)
 		qp = list_entry(dev->piowait.next, struct ipath_qp,
 				piowait);
 		list_del_init(&qp->piowait);
+		clear_bit(IPATH_S_BUSY, &qp->s_busy);
 		tasklet_hi_schedule(&qp->s_task);
 	}
 	spin_unlock_irqrestore(&dev->pending_lock, flags);


From arthur.jones at qlogic.com  Tue Jun 19 16:42:15 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:42:15 -0700
Subject: [ofa-general] [PATCH 18/28] IB/ipath - Possible data corruption if
	multiple SGEs used for receive
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234214.3794.13513.stgit@bauxite.internal.keyresearch.com>

From: Ralph Campbell <ralph.campbell at qlogic.com>

The code to copy data from the receive queue buffers to the IB
SGEs checks only the memory region/page length when copying data,
not the SGE length. This could overwrite parts of the user's memory that
were not intended to be written. It can only happen if multiple SGEs
are used to describe a receive buffer, which almost never happens
in practice.
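
A minimal sketch of the double clamp, assuming a simplified SGE with
only the three fields used here. The struct and the function
copy_to_sge are hypothetical stand-ins, not the driver's
ipath_copy_sge().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical stand-in for the driver's SGE state. */
struct sge {
	uint8_t *vaddr;		/* current write position */
	uint32_t length;	/* bytes left in the current memory segment */
	uint32_t sge_length;	/* bytes left in the SGE as a whole */
};

/* Copy at most `length` bytes into one SGE; return the bytes copied. */
uint32_t copy_to_sge(struct sge *sge, const void *data, uint32_t length)
{
	uint32_t len;

	assert(sge != NULL && data != NULL);
	len = sge->length;
	if (len > length)
		len = length;
	/* The extra clamp: never run past the end of the SGE itself. */
	if (len > sge->sge_length)
		len = sge->sge_length;
	memcpy(sge->vaddr, data, len);
	sge->vaddr += len;
	sge->length -= len;
	sge->sge_length -= len;
	return len;
}
```

With only the first clamp, a 10 byte copy into an SGE that has 4 bytes
left but sits in a 16 byte segment would write 6 bytes too many.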

Signed-off-by: Ralph Campbell <ralph.campbell at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_ruc.c   |    2 ++
 drivers/infiniband/hw/ipath/ipath_ud.c    |    2 ++
 drivers/infiniband/hw/ipath/ipath_verbs.c |    8 ++++++--
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_ruc.c b/drivers/infiniband/hw/ipath/ipath_ruc.c
index 7d09f5b..1a5afaf 100644
--- a/drivers/infiniband/hw/ipath/ipath_ruc.c
+++ b/drivers/infiniband/hw/ipath/ipath_ruc.c
@@ -397,6 +397,8 @@ again:
 
 		if (len > sge->length)
 			len = sge->length;
+		if (len > sge->sge_length)
+			len = sge->sge_length;
 		BUG_ON(len == 0);
 		ipath_copy_sge(&qp->r_sge, sge->vaddr, len);
 		sge->vaddr += len;
diff --git a/drivers/infiniband/hw/ipath/ipath_ud.c b/drivers/infiniband/hw/ipath/ipath_ud.c
index eee54c7..26171e5 100644
--- a/drivers/infiniband/hw/ipath/ipath_ud.c
+++ b/drivers/infiniband/hw/ipath/ipath_ud.c
@@ -232,6 +232,8 @@ static void ipath_ud_loopback(struct ipath_qp *sqp,
 
 		if (len > length)
 			len = length;
+		if (len > sge->sge_length)
+			len = sge->sge_length;
 		BUG_ON(len == 0);
 		ipath_copy_sge(&rsge, sge->vaddr, len);
 		sge->vaddr += len;
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
index 68952be..6753f7d 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
@@ -164,9 +164,11 @@ void ipath_copy_sge(struct ipath_sge_state *ss, void *data, u32 length)
 	while (length) {
 		u32 len = sge->length;
 
-		BUG_ON(len == 0);
 		if (len > length)
 			len = length;
+		if (len > sge->sge_length)
+			len = sge->sge_length;
+		BUG_ON(len == 0);
 		memcpy(sge->vaddr, data, len);
 		sge->vaddr += len;
 		sge->length -= len;
@@ -202,9 +204,11 @@ void ipath_skip_sge(struct ipath_sge_state *ss, u32 length)
 	while (length) {
 		u32 len = sge->length;
 
-		BUG_ON(len == 0);
 		if (len > length)
 			len = length;
+		if (len > sge->sge_length)
+			len = sge->sge_length;
+		BUG_ON(len == 0);
 		sge->vaddr += len;
 		sge->length -= len;
 		sge->sge_length -= len;


From arthur.jones at qlogic.com  Tue Jun 19 16:42:21 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:42:21 -0700
Subject: [ofa-general] [PATCH 19/28] IB/ipath - Duplicate RDMA reads can
	cause responder to NAK inappropriately
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234220.3794.7662.stgit@bauxite.internal.keyresearch.com>

From: Ralph Campbell <ralph.campbell at qlogic.com>

A duplicate RDMA read request can fool the responder into NAKing
a new RDMA read request because the responder wasn't keeping track
of whether the queue of RDMA read requests had been sent at least once.
For example, the requester sends four 2K byte RDMA read requests,
times out, and resends the first, then sees the four responses, then
sends a fifth RDMA read or atomic operation. The responder sees the
four requests, sends four responses, sees the resent first request,
rewinds the queue, then sees the fifth request but thinks the queue
is full and that the requester is invalidly sending a fifth new
request.

Signed-off-by: Ralph Campbell <ralph.campbell at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_rc.c    |   38 +++++++++++++++++++++++++----
 drivers/infiniband/hw/ipath/ipath_verbs.h |    1 +
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_rc.c b/drivers/infiniband/hw/ipath/ipath_rc.c
index 014d811..9e71239 100644
--- a/drivers/infiniband/hw/ipath/ipath_rc.c
+++ b/drivers/infiniband/hw/ipath/ipath_rc.c
@@ -125,8 +125,10 @@ static int ipath_make_rc_ack(struct ipath_qp *qp,
 			if (len > pmtu) {
 				len = pmtu;
 				qp->s_ack_state = OP(RDMA_READ_RESPONSE_FIRST);
-			} else
+			} else {
 				qp->s_ack_state = OP(RDMA_READ_RESPONSE_ONLY);
+				e->sent = 1;
+			}
 			ohdr->u.aeth = ipath_compute_aeth(qp);
 			hwords++;
 			qp->s_ack_rdma_psn = e->psn;
@@ -143,6 +145,7 @@ static int ipath_make_rc_ack(struct ipath_qp *qp,
 				cpu_to_be32(e->atomic_data);
 			hwords += sizeof(ohdr->u.at) / sizeof(u32);
 			bth2 = e->psn;
+			e->sent = 1;
 		}
 		bth0 = qp->s_ack_state << 24;
 		break;
@@ -158,6 +161,7 @@ static int ipath_make_rc_ack(struct ipath_qp *qp,
 			ohdr->u.aeth = ipath_compute_aeth(qp);
 			hwords++;
 			qp->s_ack_state = OP(RDMA_READ_RESPONSE_LAST);
+			qp->s_ack_queue[qp->s_tail_ack_queue].sent = 1;
 		}
 		bth0 = qp->s_ack_state << 24;
 		bth2 = qp->s_ack_rdma_psn++ & IPATH_PSN_MASK;
@@ -1479,6 +1483,22 @@ static void ipath_rc_error(struct ipath_qp *qp, enum ib_wc_status err)
 	spin_unlock_irqrestore(&qp->s_lock, flags);
 }
 
+static inline void ipath_update_ack_queue(struct ipath_qp *qp, unsigned n)
+{
+	unsigned long flags;
+	unsigned next;
+
+	next = n + 1;
+	if (next > IPATH_MAX_RDMA_ATOMIC)
+		next = 0;
+	spin_lock_irqsave(&qp->s_lock, flags);
+	if (n == qp->s_tail_ack_queue) {
+		qp->s_tail_ack_queue = next;
+		qp->s_ack_state = OP(ACKNOWLEDGE);
+	}
+	spin_unlock_irqrestore(&qp->s_lock, flags);
+}
+
 /**
  * ipath_rc_rcv - process an incoming RC packet
  * @dev: the device this packet came in on
@@ -1741,8 +1761,11 @@ void ipath_rc_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr,
 		next = qp->r_head_ack_queue + 1;
 		if (next > IPATH_MAX_RDMA_ATOMIC)
 			next = 0;
-		if (unlikely(next == qp->s_tail_ack_queue))
-			goto nack_inv;
+		if (unlikely(next == qp->s_tail_ack_queue)) {
+			if (!qp->s_ack_queue[next].sent)
+				goto nack_inv;
+			ipath_update_ack_queue(qp, next);
+		}
 		e = &qp->s_ack_queue[qp->r_head_ack_queue];
 		/* RETH comes after BTH */
 		if (!header_in_data)
@@ -1777,6 +1800,7 @@ void ipath_rc_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr,
 			e->rdma_sge.sge.sge_length = 0;
 		}
 		e->opcode = opcode;
+		e->sent = 0;
 		e->psn = psn;
 		/*
 		 * We need to increment the MSN here instead of when we
@@ -1812,8 +1836,11 @@ void ipath_rc_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr,
 		next = qp->r_head_ack_queue + 1;
 		if (next > IPATH_MAX_RDMA_ATOMIC)
 			next = 0;
-		if (unlikely(next == qp->s_tail_ack_queue))
-			goto nack_inv;
+		if (unlikely(next == qp->s_tail_ack_queue)) {
+			if (!qp->s_ack_queue[next].sent)
+				goto nack_inv;
+			ipath_update_ack_queue(qp, next);
+		}
 		if (!header_in_data)
 			ateth = &ohdr->u.atomic_eth;
 		else
@@ -1838,6 +1865,7 @@ void ipath_rc_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr,
 				      be64_to_cpu(ateth->compare_data),
 				      sdata);
 		e->opcode = opcode;
+		e->sent = 0;
 		e->psn = psn & IPATH_PSN_MASK;
 		qp->r_msn++;
 		qp->r_psn++;
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h
index 088b837..458f822 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h
@@ -321,6 +321,7 @@ struct ipath_sge_state {
  */
 struct ipath_ack_entry {
 	u8 opcode;
+	u8 sent;
 	u32 psn;
 	union {
 		struct ipath_sge_state rdma_sge;


From arthur.jones at qlogic.com  Tue Jun 19 16:42:27 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:42:27 -0700
Subject: [ofa-general] [PATCH 20/28] IB/ipath - Correct checking of swminor
	version field when using subports
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234226.3794.45007.stgit@bauxite.internal.keyresearch.com>

From: Mark Debbage <mark.debbage at qlogic.com>

When subports are required to run a program, this patch checks that the
driver and the user-space library have compatible subport
implementations. This is achieved through checks on the swminor version
field built into the driver and user-space library. Bad combinations are
reported through syslog and result in an error when opening the port.
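
For illustration, the same rules can be written with the driver version
passed in as parameters rather than taken from compile-time constants.
The function names below are made up for this sketch. The packing (major
in the upper 16 bits, minor in the lower 16 bits) follows how
spu_userversion is decoded in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Major version lives in the upper 16 bits of the packed word. */
unsigned version_major(uint32_t v)
{
	return v >> 16;
}

/* Minor version lives in the lower 16 bits of the packed word. */
unsigned version_minor(uint32_t v)
{
	return v & 0xffff;
}

/* Return 1 if driver and library subport implementations can be mixed. */
int subports_compatible(unsigned drv_major, unsigned drv_minor,
			uint32_t userversion)
{
	unsigned umajor = version_major(userversion);
	unsigned uminor = version_minor(userversion);

	assert(drv_minor <= 0xffff);
	if (drv_major != umajor)
		return 0;	/* no promise across major versions */
	if (drv_major != 1)
		return 0;	/* no promises yet for other majors */
	if (drv_minor < 3)
		return 0;	/* driver has no subport implementation */
	if (drv_minor == 3)
		return uminor == 3;	/* 3 only works with itself */
	return uminor >= 4;	/* 4 and later are expected to mix */
}
```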

Signed-off-by: Mark Debbage <mark.debbage at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_file_ops.c |   64 ++++++++++++++++++++++----
 1 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c
index 931802b..fc83f40 100644
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c
@@ -1403,6 +1403,38 @@ bail:
 	return pollflag;
 }
 
+static int ipath_supports_subports(int user_swmajor, int user_swminor)
+{
+	/* no subport implementation prior to software version 1.3 */
+	return (user_swmajor > 1) || (user_swminor >= 3);
+}
+
+static int ipath_compatible_subports(int user_swmajor, int user_swminor)
+{
+	/* this code is written long-hand for clarity */
+	if (IPATH_USER_SWMAJOR != user_swmajor) {
+		/* no promise of compatibility if major mismatch */
+		return 0;
+	}
+	if (IPATH_USER_SWMAJOR == 1) {
+		switch (IPATH_USER_SWMINOR) {
+		case 0:
+		case 1:
+		case 2:
+			/* no subport implementation so cannot be compatible */
+			return 0;
+		case 3:
+			/* 3 is only compatible with itself */
+			return user_swminor == 3;
+		default:
+			/* >= 4 are compatible (or are expected to be) */
+			return user_swminor >= 4;
+		}
+	}
+	/* make no promises yet for future major versions */
+	return 0;
+}
+
 static int init_subports(struct ipath_devdata *dd,
 			 struct ipath_portdata *pd,
 			 const struct ipath_user_info *uinfo)
@@ -1418,14 +1450,26 @@ static int init_subports(struct ipath_devdata *dd,
 	if (uinfo->spu_subport_cnt <= 1)
 		goto bail;
 
-	/* Old user binaries don't know about new subport implementation */
-	if ((uinfo->spu_userversion & 0xffff) != IPATH_USER_SWMINOR) {
+	/* Self-consistency check for ipath_compatible_subports() */
+	if (ipath_supports_subports(IPATH_USER_SWMAJOR, IPATH_USER_SWMINOR) &&
+	    !ipath_compatible_subports(IPATH_USER_SWMAJOR,
+				       IPATH_USER_SWMINOR)) {
 		dev_info(&dd->pcidev->dev,
-			 "Mismatched user minor version (%d) and driver "
-                         "minor version (%d) while port sharing. Ensure "
+			 "Inconsistent ipath_compatible_subports()\n");
+		goto bail;
+	}
+
+	/* Check for subport compatibility */
+	if (!ipath_compatible_subports(uinfo->spu_userversion >> 16,
+				       uinfo->spu_userversion & 0xffff)) {
+		dev_info(&dd->pcidev->dev,
+			 "Mismatched user version (%d.%d) and driver "
+			 "version (%d.%d) while port sharing. Ensure "
                          "that driver and library are from the same "
                          "release.\n",
+			 (int) (uinfo->spu_userversion >> 16),
                          (int) (uinfo->spu_userversion & 0xffff),
+			 IPATH_USER_SWMAJOR,
 	                 IPATH_USER_SWMINOR);
 		goto bail;
 	}
@@ -1729,14 +1773,13 @@ static int ipath_open(struct inode *in, struct file *fp)
 	return fp->private_data ? 0 : -ENOMEM;
 }
 
-
 /* Get port early, so can set affinity prior to memory allocation */
 static int ipath_assign_port(struct file *fp,
 			      const struct ipath_user_info *uinfo)
 {
 	int ret;
 	int i_minor;
-	unsigned swminor;
+	unsigned swmajor, swminor;
 
 	/* Check to be sure we haven't already initialized this file */
 	if (port_fp(fp)) {
@@ -1745,7 +1788,8 @@ static int ipath_assign_port(struct file *fp,
 	}
 
 	/* for now, if major version is different, bail */
-	if ((uinfo->spu_userversion >> 16) != IPATH_USER_SWMAJOR) {
+	swmajor = uinfo->spu_userversion >> 16;
+	if (swmajor != IPATH_USER_SWMAJOR) {
 		ipath_dbg("User major version %d not same as driver "
 			  "major %d\n", uinfo->spu_userversion >> 16,
 			  IPATH_USER_SWMAJOR);
@@ -1760,7 +1804,8 @@ static int ipath_assign_port(struct file *fp,
 
 	mutex_lock(&ipath_mutex);
 
-	if (swminor == IPATH_USER_SWMINOR && uinfo->spu_subport_cnt &&
+	if (ipath_compatible_subports(swmajor, swminor) &&
+	    uinfo->spu_subport_cnt &&
 	    (ret = find_shared_port(fp, uinfo))) {
 		mutex_unlock(&ipath_mutex);
 		if (ret > 0)
@@ -2024,7 +2069,8 @@ static int ipath_port_info(struct ipath_portdata *pd, u16 subport,
 	info.port = pd->port_port;
 	info.subport = subport;
 	/* Don't return new fields if old library opened the port. */
-	if ((pd->userversion & 0xffff) == IPATH_USER_SWMINOR) {
+	if (ipath_supports_subports(pd->userversion >> 16,
+				    pd->userversion & 0xffff)) {
 		/* Number of user ports available for this device. */
 		info.num_ports = pd->port_dd->ipath_cfgports - 1;
 		info.num_subports = pd->port_subport_cnt;


From arthur.jones at qlogic.com  Tue Jun 19 16:42:34 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:42:34 -0700
Subject: [ofa-general] [PATCH 21/28] IB/ipath - Consistent handling for one
	subport
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234232.3794.65280.stgit@bauxite.internal.keyresearch.com>

From: Mark Debbage <mark.debbage at qlogic.com>

Previously the driver and user-space code handled the case of 1 subport
somewhat inconsistently. The new interpretation of this situation is
that if one subport is requested, the driver turns on the subport
mechanism and arranges for the port to be "shared" by one process. In
normal use the user-space library does not use this configuration and
instead arranges for the port not to be shared at all. This particular
idiom can be useful for testing purposes.

Signed-off-by: Mark Debbage <mark.debbage at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_file_ops.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c
index fc83f40..a474796 100644
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c
@@ -1444,10 +1444,10 @@ static int init_subports(struct ipath_devdata *dd,
 	size_t size;
 
 	/*
-	 * If the user is requesting zero or one port,
+	 * If the user is requesting zero subports,
 	 * skip the subport allocation.
 	 */
-	if (uinfo->spu_subport_cnt <= 1)
+	if (uinfo->spu_subport_cnt <= 0)
 		goto bail;
 
 	/* Self-consistency check for ipath_compatible_subports() */


From arthur.jones at qlogic.com  Tue Jun 19 16:42:41 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:42:41 -0700
Subject: [ofa-general] [PATCH 22/28] IB/ipath - Add capability to modify PBC
	word
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234240.3794.882.stgit@bauxite.internal.keyresearch.com>

From: Michael Albaugh <Michael.Albaugh at qlogic.com>

During compliance testing and when debugging some interconnect issues,
it is very useful to be able to send malformed packets, without having
the device signal them as malformed (drop, or terminate with EBP). The
hardware supports this, but the driver "diagnostic packet" interface did
not.

Extend capability to send specific malformed packets for testing.
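
Since the legacy and extended structs can have the same size, the
driver has to guess which one it received. The sketch below mirrors the
two layouts with fixed-width types (uint32_t/uint64_t standing in for
__u32/__u64) and repeats the heuristic from the patch. The names
diag_pkt, diag_xpkt and looks_legacy are made up for this example, and
the offsets in the comments assume an ABI where 64-bit integers are
8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Legacy layout: padding follows `unit` and `len` on 8-byte-aligned ABIs. */
struct diag_pkt {
	uint32_t unit;
	uint64_t data;
	uint32_t len;
};

/* Extended layout: fields ordered so that no padding is needed. */
struct diag_xpkt {
	uint64_t data;		/* overlays legacy unit + padding */
	uint64_t pbc_wd;	/* overlays legacy data */
	uint32_t unit;		/* overlays legacy len */
	uint32_t len;
};

/*
 * Heuristic from the patch: a legacy packet read as the extended struct
 * shows a large "unit" (really the old len) and a tiny "data" pointer
 * (really the old unit number).
 */
int looks_legacy(const struct diag_xpkt *dp)
{
	assert(dp != NULL);
	return dp->unit >= 20 && dp->data < 512;
}
```

On such an ABI both structs are 24 bytes, which is why a size check
alone cannot tell them apart.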

Signed-off-by: Michael Albaugh <Michael.Albaugh at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_common.h |   19 +++++++++++++-
 drivers/infiniband/hw/ipath/ipath_diag.c   |   39 ++++++++++++++++++++++++----
 2 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_common.h b/drivers/infiniband/hw/ipath/ipath_common.h
index 12e1349..f70788c 100644
--- a/drivers/infiniband/hw/ipath/ipath_common.h
+++ b/drivers/infiniband/hw/ipath/ipath_common.h
@@ -501,13 +501,30 @@ struct __ipath_sendpkt {
 	struct ipath_iovec sps_iov[4];
 };
 
-/* Passed into diag data special file's ->write method. */
+/*
+ * diagnostics can send a packet by "writing" one of the following
+ * two structs to diag data special file
+ * The first is the legacy version for backward compatibility
+ */
 struct ipath_diag_pkt {
 	__u32 unit;
 	__u64 data;
 	__u32 len;
 };
 
+/* The second diag_pkt struct is the expanded version that allows
+ * more control over the packet, specifically, by allowing a custom
+ * pbc (+ extra) qword, so that special modes and deliberate
+ * changes to CRCs can be used. The elements were also re-ordered
+ * for better alignment and to avoid padding issues.
+ */
+struct ipath_diag_xpkt {
+	__u64 data;
+	__u64 pbc_wd;
+	__u32 unit;
+	__u32 len;
+};
+
 /*
  * Data layout in I2C flash (for GUID, etc.)
  * All fields are little-endian binary unless otherwise stated
diff --git a/drivers/infiniband/hw/ipath/ipath_diag.c b/drivers/infiniband/hw/ipath/ipath_diag.c
index 63e8368..aab21c1 100644
--- a/drivers/infiniband/hw/ipath/ipath_diag.c
+++ b/drivers/infiniband/hw/ipath/ipath_diag.c
@@ -323,13 +323,14 @@ static ssize_t ipath_diagpkt_write(struct file *fp,
 {
 	u32 __iomem *piobuf;
 	u32 plen, clen, pbufn;
-	struct ipath_diag_pkt dp;
+	struct ipath_diag_pkt odp;
+	struct ipath_diag_xpkt dp;
 	u32 *tmpbuf = NULL;
 	struct ipath_devdata *dd;
 	ssize_t ret = 0;
 	u64 val;
 
-	if (count < sizeof(dp)) {
+	if (count != sizeof(dp)) {
 		ret = -EINVAL;
 		goto bail;
 	}
@@ -339,6 +340,29 @@ static ssize_t ipath_diagpkt_write(struct file *fp,
 		goto bail;
 	}
 
+	/*
+	 * Due to padding/alignment issues (lessened with new struct)
+	 * the old and new structs are the same length. We need to
+	 * disambiguate them, which we can do because odp.len has never
+	 * been less than the total of LRH+BTH+DETH so far, while
+	 * dp.unit (same offset) unit is unlikely to get that high.
+	 * Similarly, dp.data, the pointer to user at the same offset
+	 * as odp.unit, is almost certainly at least one (512byte)page
+	 * "above" NULL. The if-block below can be omitted if compatibility
+	 * between a new driver and older diagnostic code is unimportant.
+	 * compatibility the other direction (new diags, old driver) is
+	 * handled in the diagnostic code, with a warning.
+	 */
+	if (dp.unit >= 20 && dp.data < 512) {
+		/* very probable version mismatch. Fix it up */
+		memcpy(&odp, &dp, sizeof(odp));
+		/* We got a legacy dp, copy elements to dp */
+		dp.unit = odp.unit;
+		dp.data = odp.data;
+		dp.len = odp.len;
+		dp.pbc_wd = 0; /* Indicate we need to compute PBC wd */
+	}
+
 	/* send count must be an exact number of dwords */
 	if (dp.len & 3) {
 		ret = -EINVAL;
@@ -371,9 +395,10 @@ static ssize_t ipath_diagpkt_write(struct file *fp,
 		ret = -ENODEV;
 		goto bail;
 	}
+	/* Check link state, but not if we have custom PBC */
 	val = dd->ipath_lastibcstat & IPATH_IBSTATE_MASK;
-	if (val != IPATH_IBSTATE_INIT && val != IPATH_IBSTATE_ARM &&
-	    val != IPATH_IBSTATE_ACTIVE) {
+	if (!dp.pbc_wd && val != IPATH_IBSTATE_INIT &&
+		val != IPATH_IBSTATE_ARM && val != IPATH_IBSTATE_ACTIVE) {
 		ipath_cdbg(VERBOSE, "unit %u not ready (state %llx)\n",
 			   dd->ipath_unit, (unsigned long long) val);
 		ret = -EINVAL;
@@ -419,9 +444,13 @@ static ssize_t ipath_diagpkt_write(struct file *fp,
 		ipath_cdbg(VERBOSE, "unit %u 0x%x+1w pio%d\n",
 			   dd->ipath_unit, plen - 1, pbufn);
 
+	if (dp.pbc_wd == 0)
+		/* Legacy operation, use computed pbc_wd */
+		dp.pbc_wd = plen;
+
 	/* we have to flush after the PBC for correctness on some cpus
 	 * or WC buffer can be written out of order */
-	writeq(plen, piobuf);
+	writeq(dp.pbc_wd, piobuf);
 	ipath_flush_wc();
 	/* copy all by the trigger word, then flush, so it's written
 	 * to chip before trigger word, then write trigger word, then


From arthur.jones at qlogic.com  Tue Jun 19 16:42:47 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:42:47 -0700
Subject: [ofa-general] [PATCH 23/28] IB/ipath - send ACK invalid where
	appropriate
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234246.3794.44838.stgit@bauxite.internal.keyresearch.com>

From: Robert Walsh <robert.walsh at qlogic.com>

The IB specification ch. 9.9.3 table 58 says that a QP which isn't
set up for the operation should return a NAK invalid request.

Signed-off-by: Robert Walsh <robert.walsh at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_rc.c  |   13 +++++++------
 drivers/infiniband/hw/ipath/ipath_ruc.c |   22 ++++++++++++++++++----
 2 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_rc.c b/drivers/infiniband/hw/ipath/ipath_rc.c
index 9e71239..6423d9e 100644
--- a/drivers/infiniband/hw/ipath/ipath_rc.c
+++ b/drivers/infiniband/hw/ipath/ipath_rc.c
@@ -1711,6 +1711,9 @@ void ipath_rc_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr,
 	case OP(RDMA_WRITE_FIRST):
 	case OP(RDMA_WRITE_ONLY):
 	case OP(RDMA_WRITE_ONLY_WITH_IMMEDIATE):
+		if (unlikely(!(qp->qp_access_flags &
+			       IB_ACCESS_REMOTE_WRITE)))
+			goto nack_inv;
 		/* consume RWQE */
 		/* RETH comes after BTH */
 		if (!header_in_data)
@@ -1740,9 +1743,6 @@ void ipath_rc_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr,
 			qp->r_sge.sge.length = 0;
 			qp->r_sge.sge.sge_length = 0;
 		}
-		if (unlikely(!(qp->qp_access_flags &
-			       IB_ACCESS_REMOTE_WRITE)))
-			goto nack_acc;
 		if (opcode == OP(RDMA_WRITE_FIRST))
 			goto send_middle;
 		else if (opcode == OP(RDMA_WRITE_ONLY))
@@ -1756,8 +1756,9 @@ void ipath_rc_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr,
 		u32 len;
 		u8 next;
 
-		if (unlikely(!(qp->qp_access_flags & IB_ACCESS_REMOTE_READ)))
-			goto nack_acc;
+		if (unlikely(!(qp->qp_access_flags &
+			       IB_ACCESS_REMOTE_READ)))
+			goto nack_inv;
 		next = qp->r_head_ack_queue + 1;
 		if (next > IPATH_MAX_RDMA_ATOMIC)
 			next = 0;
@@ -1832,7 +1833,7 @@ void ipath_rc_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr,
 
 		if (unlikely(!(qp->qp_access_flags &
 			       IB_ACCESS_REMOTE_ATOMIC)))
-			goto nack_acc;
+			goto nack_inv;
 		next = qp->r_head_ack_queue + 1;
 		if (next > IPATH_MAX_RDMA_ATOMIC)
 			next = 0;
diff --git a/drivers/infiniband/hw/ipath/ipath_ruc.c b/drivers/infiniband/hw/ipath/ipath_ruc.c
index 1a5afaf..c44e015 100644
--- a/drivers/infiniband/hw/ipath/ipath_ruc.c
+++ b/drivers/infiniband/hw/ipath/ipath_ruc.c
@@ -320,12 +320,22 @@ again:
 		break;
 
 	case IB_WR_RDMA_WRITE_WITH_IMM:
+		if (unlikely(!(qp->qp_access_flags &
+			       IB_ACCESS_REMOTE_WRITE))) {
+			wc.status = IB_WC_REM_INV_REQ_ERR;
+			goto err;
+		}
 		wc.wc_flags = IB_WC_WITH_IMM;
 		wc.imm_data = wqe->wr.imm_data;
 		if (!ipath_get_rwqe(qp, 1))
 			goto rnr_nak;
 		/* FALLTHROUGH */
 	case IB_WR_RDMA_WRITE:
+		if (unlikely(!(qp->qp_access_flags &
+			       IB_ACCESS_REMOTE_WRITE))) {
+			wc.status = IB_WC_REM_INV_REQ_ERR;
+			goto err;
+		}
 		if (wqe->length == 0)
 			break;
 		if (unlikely(!ipath_rkey_ok(qp, &qp->r_sge, wqe->length,
@@ -355,8 +365,10 @@ again:
 
 	case IB_WR_RDMA_READ:
 		if (unlikely(!(qp->qp_access_flags &
-			       IB_ACCESS_REMOTE_READ)))
-			goto acc_err;
+			       IB_ACCESS_REMOTE_READ))) {
+			wc.status = IB_WC_REM_INV_REQ_ERR;
+			goto err;
+		}
 		if (unlikely(!ipath_rkey_ok(qp, &sqp->s_sge, wqe->length,
 					    wqe->wr.wr.rdma.remote_addr,
 					    wqe->wr.wr.rdma.rkey,
@@ -370,8 +382,10 @@ again:
 	case IB_WR_ATOMIC_CMP_AND_SWP:
 	case IB_WR_ATOMIC_FETCH_AND_ADD:
 		if (unlikely(!(qp->qp_access_flags &
-			       IB_ACCESS_REMOTE_ATOMIC)))
-			goto acc_err;
+			       IB_ACCESS_REMOTE_ATOMIC))) {
+			wc.status = IB_WC_REM_INV_REQ_ERR;
+			goto err;
+		}
 		if (unlikely(!ipath_rkey_ok(qp, &qp->r_sge, sizeof(u64),
 					    wqe->wr.wr.atomic.remote_addr,
 					    wqe->wr.wr.atomic.rkey,


From arthur.jones at qlogic.com  Tue Jun 19 16:42:52 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:42:52 -0700
Subject: [ofa-general] [PATCH 24/28] IB/ipath - ipath_poll fixups and
	enhancements
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234252.3794.18229.stgit@bauxite.internal.keyresearch.com>

From: Robert Walsh <robert.walsh at qlogic.com>

Fix ipath_poll and enhance it so that we can poll for either urgent or
regular packets and be notified when a header queue overflows.

Signed-off-by: Robert Walsh <robert.walsh at qlogic.com>
---
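
Note for readers of the archive: the handoff between the "waiting" bits
in port_flag and the "what happened" bits in int_flag can be sketched as
a small single-threaded model.  This is an illustration under stated
assumptions, not the driver code: struct port_state, signal_event() and
poll_urgent() are hypothetical names, the bit numbers are made up, plain
bit arithmetic stands in for the kernel's atomic set_bit()/clear_bit(),
and only POLLIN and POLLERR are reported.

```c
#include <poll.h>

/* Hypothetical bit numbers; the real IPATH_PORT_WAITING_* values differ. */
enum { WAITING_URG = 0, WAITING_OVERFLOW = 1 };

struct port_state {
	unsigned long port_flag; /* events the poller asked to be told about */
	unsigned long int_flag;  /* events the interrupt side has observed */
};

/* Return whether the bit was set, and clear it either way. */
static int test_and_clear(unsigned long *word, int bit)
{
	unsigned long mask = 1UL << bit;
	int was_set = (*word & mask) != 0;

	*word &= ~mask;
	return was_set;
}

/* Interrupt side: hand an event over only if the poller armed it. */
static int signal_event(struct port_state *ps, int bit)
{
	if (!test_and_clear(&ps->port_flag, bit))
		return 0;
	ps->int_flag |= 1UL << bit;
	return 1; /* the driver would wake the wait queue at this point */
}

/* Poll side: report and consume pending events, otherwise arm and wait. */
static short poll_urgent(struct port_state *ps, int want_overflow)
{
	short events = 0;

	if (test_and_clear(&ps->int_flag, WAITING_OVERFLOW))
		events |= POLLERR;
	if (test_and_clear(&ps->int_flag, WAITING_URG))
		events |= POLLIN;

	if (!events) {
		ps->port_flag |= 1UL << WAITING_URG;
		if (want_overflow)
			ps->port_flag |= 1UL << WAITING_OVERFLOW;
		/* the driver would call poll_wait() at this point */
	}
	return events;
}
```

Keeping the two words separate means an event that arrives between two
poll calls is remembered in int_flag until the next call consumes it,
rather than being inferred from whether the waiting bit is still set.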

 drivers/infiniband/hw/ipath/ipath_common.h   |   11 ++
 drivers/infiniband/hw/ipath/ipath_file_ops.c |  125 +++++++++++++++++---------
 drivers/infiniband/hw/ipath/ipath_intr.c     |   38 ++++++--
 drivers/infiniband/hw/ipath/ipath_kernel.h   |    8 ++
 4 files changed, 131 insertions(+), 51 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_common.h b/drivers/infiniband/hw/ipath/ipath_common.h
index f70788c..b4b786d 100644
--- a/drivers/infiniband/hw/ipath/ipath_common.h
+++ b/drivers/infiniband/hw/ipath/ipath_common.h
@@ -431,8 +431,15 @@ struct ipath_user_info {
 #define IPATH_CMD_UNUSED_1	25
 #define IPATH_CMD_UNUSED_2	26
 #define IPATH_CMD_PIOAVAILUPD	27	/* force an update of PIOAvail reg */
+#define IPATH_CMD_POLL_TYPE	28	/* set the kind of polling we want */
 
-#define IPATH_CMD_MAX		27
+#define IPATH_CMD_MAX		28
+
+/*
+ * Poll types
+ */
+#define IPATH_POLL_TYPE_URGENT	 0x01
+#define IPATH_POLL_TYPE_OVERFLOW 0x02
 
 struct ipath_port_info {
 	__u32 num_active;	/* number of active units */
@@ -473,6 +480,8 @@ struct ipath_cmd {
 		__u16 part_key;
 		/* user address of __u32 bitmask of active slaves */
 		__u64 slave_mask_addr;
+		/* type of polling we want */
+		__u16 poll_type;
 	} cmd;
 };
 
diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c
index a474796..33ab0d6 100644
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c
@@ -1341,65 +1341,98 @@ bail:
 	return ret;
 }
 
-static unsigned int ipath_poll(struct file *fp,
-			       struct poll_table_struct *pt)
+static unsigned int ipath_poll_urgent(struct ipath_portdata *pd,
+				      struct file *fp,
+				      struct poll_table_struct *pt)
 {
-	struct ipath_portdata *pd;
-	u32 head, tail;
-	int bit;
 	unsigned pollflag = 0;
 	struct ipath_devdata *dd;
 
-	pd = port_fp(fp);
-	if (!pd)
-		goto bail;
 	dd = pd->port_dd;
 
-	bit = pd->port_port + INFINIPATH_R_INTRAVAIL_SHIFT;
-	set_bit(bit, &dd->ipath_rcvctrl);
+	if (test_bit(IPATH_PORT_WAITING_OVERFLOW, &pd->int_flag)) {
+		pollflag |= POLLERR;
+		clear_bit(IPATH_PORT_WAITING_OVERFLOW, &pd->int_flag);
+	}
 
-	/*
-	 * Before blocking, make sure that head is still == tail,
-	 * reading from the chip, so we can be sure the interrupt
-	 * enable has made it to the chip.  If not equal, disable
-	 * interrupt again and return immediately.  This avoids races,
-	 * and the overhead of the chip read doesn't matter much at
-	 * this point, since we are waiting for something anyway.
-	 */
+	if (test_bit(IPATH_PORT_WAITING_URG, &pd->int_flag)) {
+		pollflag |= POLLIN | POLLRDNORM;
+		clear_bit(IPATH_PORT_WAITING_URG, &pd->int_flag);
+	}
 
-	ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl,
-			 dd->ipath_rcvctrl);
+	if (!pollflag) {
+		set_bit(IPATH_PORT_WAITING_URG, &pd->port_flag);
+		if (pd->poll_type & IPATH_POLL_TYPE_OVERFLOW)
+			set_bit(IPATH_PORT_WAITING_OVERFLOW,
+				&pd->port_flag);
+
+		poll_wait(fp, &pd->port_wait, pt);
+	}
+
+	return pollflag;
+}
+
+static unsigned int ipath_poll_next(struct ipath_portdata *pd,
+				    struct file *fp,
+				    struct poll_table_struct *pt)
+{
+	u32 head, tail;
+	unsigned pollflag = 0;
+	struct ipath_devdata *dd;
+
+	dd = pd->port_dd;
 
 	head = ipath_read_ureg32(dd, ur_rcvhdrhead, pd->port_port);
-	tail = ipath_read_ureg32(dd, ur_rcvhdrtail, pd->port_port);
+	tail = *(volatile u64 *)pd->port_rcvhdrtail_kvaddr;
+
+	if (test_bit(IPATH_PORT_WAITING_OVERFLOW, &pd->int_flag)) {
+		pollflag |= POLLERR;
+		clear_bit(IPATH_PORT_WAITING_OVERFLOW, &pd->int_flag);
+	}
 
-	if (tail == head) {
+	if (tail != head ||
+	    test_bit(IPATH_PORT_WAITING_RCV, &pd->int_flag)) {
+		pollflag |= POLLIN | POLLRDNORM;
+		clear_bit(IPATH_PORT_WAITING_RCV, &pd->int_flag);
+	}
+
+	if (!pollflag) {
 		set_bit(IPATH_PORT_WAITING_RCV, &pd->port_flag);
+		if (pd->poll_type & IPATH_POLL_TYPE_OVERFLOW)
+			set_bit(IPATH_PORT_WAITING_OVERFLOW,
+				&pd->port_flag);
+
+		set_bit(pd->port_port + INFINIPATH_R_INTRAVAIL_SHIFT,
+			&dd->ipath_rcvctrl);
+
+		ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl,
+				 dd->ipath_rcvctrl);
+
 		if (dd->ipath_rhdrhead_intr_off) /* arm rcv interrupt */
-			(void)ipath_write_ureg(dd, ur_rcvhdrhead,
-					       dd->ipath_rhdrhead_intr_off
-					       | head, pd->port_port);
-		poll_wait(fp, &pd->port_wait, pt);
+			ipath_write_ureg(dd, ur_rcvhdrhead,
+					 dd->ipath_rhdrhead_intr_off | head,
+					 pd->port_port);
 
-		if (test_bit(IPATH_PORT_WAITING_RCV, &pd->port_flag)) {
-			/* timed out, no packets received */
-			clear_bit(IPATH_PORT_WAITING_RCV, &pd->port_flag);
-			pd->port_rcvwait_to++;
-		}
-		else
-			pollflag = POLLIN | POLLRDNORM;
-	}
-	else {
-		/* it's already happened; don't do wait_event overhead */
-		pollflag = POLLIN | POLLRDNORM;
-		pd->port_rcvnowait++;
+		poll_wait(fp, &pd->port_wait, pt);
 	}
 
-	clear_bit(bit, &dd->ipath_rcvctrl);
-	ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl,
-			 dd->ipath_rcvctrl);
+	return pollflag;
+}
+
+static unsigned int ipath_poll(struct file *fp,
+			       struct poll_table_struct *pt)
+{
+	struct ipath_portdata *pd;
+	unsigned pollflag;
+
+	pd = port_fp(fp);
+	if (!pd)
+		pollflag = 0;
+	else if (pd->poll_type & IPATH_POLL_TYPE_URGENT)
+		pollflag = ipath_poll_urgent(pd, fp, pt);
+	else
+		pollflag = ipath_poll_next(pd, fp, pt);
 
-bail:
 	return pollflag;
 }
 
@@ -2173,6 +2206,11 @@ static ssize_t ipath_write(struct file *fp, const char __user *data,
 		src = NULL;
 		dest = NULL;
 		break;
+	case IPATH_CMD_POLL_TYPE:
+		copy = sizeof(cmd.cmd.poll_type);
+		dest = &cmd.cmd.poll_type;
+		src = &ucmd->cmd.poll_type;
+		break;
 	default:
 		ret = -EINVAL;
 		goto bail;
@@ -2245,6 +2283,9 @@ static ssize_t ipath_write(struct file *fp, const char __user *data,
 	case IPATH_CMD_PIOAVAILUPD:
 		ret = ipath_force_pio_avail_update(pd->port_dd);
 		break;
+	case IPATH_CMD_POLL_TYPE:
+		pd->poll_type = cmd.cmd.poll_type;
+		break;
 	}
 
 	if (ret >= 0)
diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c
index 948091f..f8aac8e 100644
--- a/drivers/infiniband/hw/ipath/ipath_intr.c
+++ b/drivers/infiniband/hw/ipath/ipath_intr.c
@@ -680,6 +680,17 @@ static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs)
 					chkerrpkts = 1;
 				dd->ipath_lastrcvhdrqtails[i] = tl;
 				pd->port_hdrqfull++;
+				if (test_bit(IPATH_PORT_WAITING_OVERFLOW,
+					     &pd->port_flag)) {
+					clear_bit(
+					  IPATH_PORT_WAITING_OVERFLOW,
+					  &pd->port_flag);
+					set_bit(
+					  IPATH_PORT_WAITING_OVERFLOW,
+					  &pd->int_flag);
+					wake_up_interruptible(
+					  &pd->port_wait);
+				}
 			}
 		}
 	}
@@ -877,14 +888,25 @@ static void handle_urcv(struct ipath_devdata *dd, u32 istat)
 		   dd->ipath_i_rcvurg_mask);
 	for (i = 1; i < dd->ipath_cfgports; i++) {
 		struct ipath_portdata *pd = dd->ipath_pd[i];
-		if (portr & (1 << i) && pd && pd->port_cnt &&
-			test_bit(IPATH_PORT_WAITING_RCV, &pd->port_flag)) {
-			clear_bit(IPATH_PORT_WAITING_RCV,
-				  &pd->port_flag);
-			clear_bit(i + INFINIPATH_R_INTRAVAIL_SHIFT,
-				  &dd->ipath_rcvctrl);
-			wake_up_interruptible(&pd->port_wait);
-			rcvdint = 1;
+		if (portr & (1 << i) && pd && pd->port_cnt) {
+			if (test_bit(IPATH_PORT_WAITING_RCV,
+				     &pd->port_flag)) {
+				clear_bit(IPATH_PORT_WAITING_RCV,
+					  &pd->port_flag);
+				set_bit(IPATH_PORT_WAITING_RCV,
+					&pd->int_flag);
+				clear_bit(i + INFINIPATH_R_INTRAVAIL_SHIFT,
+					  &dd->ipath_rcvctrl);
+				wake_up_interruptible(&pd->port_wait);
+				rcvdint = 1;
+			} else if (test_bit(IPATH_PORT_WAITING_URG,
+					    &pd->port_flag)) {
+				clear_bit(IPATH_PORT_WAITING_URG,
+					  &pd->port_flag);
+				set_bit(IPATH_PORT_WAITING_URG,
+					&pd->int_flag);
+				wake_up_interruptible(&pd->port_wait);
+			}
 		}
 	}
 	if (rcvdint) {
diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
index 2e85aec..034c283 100644
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h
@@ -127,6 +127,8 @@ struct ipath_portdata {
 	u32 port_tidcursor;
 	/* next expected TID to check */
 	unsigned long port_flag;
+	/* what happened */
+	unsigned long int_flag;
 	/* WAIT_RCV that timed out, no interrupt */
 	u32 port_rcvwait_to;
 	/* WAIT_PIO that timed out, no interrupt */
@@ -155,6 +157,8 @@ struct ipath_portdata {
 	u32 userversion;
 	/* Bitmask of active slaves */
 	u32 active_slaves;
+	/* Type of packets or conditions we want to poll for */
+	u16 poll_type;
 };
 
 struct sk_buff;
@@ -754,6 +758,10 @@ int ipath_set_rx_pol_inv(struct ipath_devdata *dd, u8 new_pol_inv);
 #define IPATH_PORT_WAITING_PIO   3
 		/* master has not finished initializing */
 #define IPATH_PORT_MASTER_UNINIT 4
+		/* waiting for an urgent packet to arrive */
+#define IPATH_PORT_WAITING_URG 5
+		/* waiting for a header overflow */
+#define IPATH_PORT_WAITING_OVERFLOW 6
 
 /* free up any allocated data at closes */
 void ipath_free_data(struct ipath_portdata *dd);


From arthur.jones at qlogic.com  Tue Jun 19 16:42:58 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:42:58 -0700
Subject: [ofa-general] [PATCH 25/28] IB/ipath - clean send flags properly on
	QP reset.
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234257.3794.52010.stgit@bauxite.internal.keyresearch.com>

From: Robert Walsh <robert.walsh at qlogic.com>

Signed-off-by: Robert Walsh <robert.walsh at qlogic.com>
Signed-off-by: Ralph Campbell <ralph.campbell at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_qp.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_qp.c b/drivers/infiniband/hw/ipath/ipath_qp.c
index 9e07abb..bfd39c9 100644
--- a/drivers/infiniband/hw/ipath/ipath_qp.c
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c
@@ -336,7 +336,7 @@ static void ipath_reset_qp(struct ipath_qp *qp)
 	qp->qkey = 0;
 	qp->qp_access_flags = 0;
 	qp->s_busy = 0;
-	qp->s_flags &= ~IPATH_S_SIGNAL_REQ_WR;
+	qp->s_flags &= IPATH_S_SIGNAL_REQ_WR;
 	qp->s_hdrwords = 0;
 	qp->s_psn = 0;
 	qp->r_psn = 0;


From arthur.jones at qlogic.com  Tue Jun 19 16:43:04 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:43:04 -0700
Subject: [ofa-general] [PATCH 26/28] IB/ipath - print warning if LID not
	acquired and link ACTIVE within one minute
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234303.3794.75856.stgit@bauxite.internal.keyresearch.com>

From: Robert Walsh <robert.walsh at qlogic.com>

Signed-off-by: Robert Walsh <robert.walsh at qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_driver.c |   45 ++++++++++++++++++++++++++++
 drivers/infiniband/hw/ipath/ipath_kernel.h |    3 ++
 2 files changed, 48 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
index 8b61179..1d2369b 100644
--- a/drivers/infiniband/hw/ipath/ipath_driver.c
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -104,6 +104,13 @@ static int __devinit ipath_init_one(struct pci_dev *,
 #define PCI_DEVICE_ID_INFINIPATH_HT 0xd
 #define PCI_DEVICE_ID_INFINIPATH_PE800 0x10
 
+/*
+ * Number of seconds before we complain about not getting a LID
+ * assignment.
+ */
+
+#define LID_TIMEOUT 60
+
 static const struct pci_device_id ipath_pci_tbl[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_PATHSCALE, PCI_DEVICE_ID_INFINIPATH_HT) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_PATHSCALE, PCI_DEVICE_ID_INFINIPATH_PE800) },
@@ -119,6 +126,32 @@ static struct pci_driver ipath_driver = {
 	.id_table = ipath_pci_tbl,
 };
 
+static void check_link_status(struct work_struct *work)
+{
+	struct ipath_devdata *dd = container_of(work, struct ipath_devdata,
+						link_work);
+
+	/*
+	 * If we're in the NOCABLE state, try again in another minute.
+	 */
+
+	if (*dd->ipath_statusp & IPATH_STATUS_IB_NOCABLE) {
+		schedule_delayed_work(&dd->link_work, HZ * LID_TIMEOUT);
+		return;
+	}
+
+	/*
+	 * If we don't have a LID, let the user know and don't bother
+	 * checking again.
+	 */
+
+	if (dd->ipath_lid == 0)
+		dev_info(&dd->pcidev->dev,
+			 "We don't have a LID yet (no subnet manager?)\n");
+	else if (!(*dd->ipath_statusp & IPATH_STATUS_IB_READY))
+		dev_info(&dd->pcidev->dev,
+			 "LID assigned, but IB link is not ACTIVE\n");
+}
 
 static inline void read_bars(struct ipath_devdata *dd, struct pci_dev *dev,
 			     u32 *bar0, u32 *bar1)
@@ -187,6 +220,8 @@ static struct ipath_devdata *ipath_alloc_devdata(struct pci_dev *pdev)
 	dd->pcidev = pdev;
 	pci_set_drvdata(pdev, dd);
 
+	INIT_DELAYED_WORK(&dd->link_work, check_link_status);
+
 	list_add(&dd->ipath_list, &ipath_dev_list);
 
 bail_unlock:
@@ -511,6 +546,9 @@ static int __devinit ipath_init_one(struct pci_dev *pdev,
 	ipath_diag_add(dd);
 	ipath_register_ib_device(dd);
 
+	/* Check that we have a LID in LID_TIMEOUT seconds. */
+	schedule_delayed_work(&dd->link_work, HZ * LID_TIMEOUT);
+
 	goto bail;
 
 bail_irqsetup:
@@ -638,6 +676,9 @@ static void __devexit ipath_remove_one(struct pci_dev *pdev)
 	 */
 	ipath_shutdown_device(dd);
 
+	cancel_delayed_work(&dd->link_work);
+	flush_scheduled_work();
+
 	if (dd->verbs_dev)
 		ipath_unregister_ib_device(dd->verbs_dev);
 
@@ -1840,6 +1881,10 @@ int ipath_set_lid(struct ipath_devdata *dd, u32 arg, u8 lmc)
 	dd->ipath_lid = arg;
 	dd->ipath_lmc = lmc;
 
+	ipath_layer_lid_changed(dd);
+
+	dev_info(&dd->pcidev->dev, "We got a lid: 0x%x\n", arg);
+
 	return 0;
 }
 
diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
index 034c283..f261af1 100644
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h
@@ -574,6 +574,9 @@ struct ipath_devdata {
 	u32 ipath_overrun_thresh_errs;
 	u32 ipath_lli_errs;
 
+	/* Link status check work */
+	struct delayed_work link_work;
+
 	/*
 	 * Not all devices managed by a driver instance are the same
 	 * type, so these fields must be per-device.


From arthur.jones at qlogic.com  Tue Jun 19 16:43:10 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:43:10 -0700
Subject: [ofa-general] [PATCH 27/28] IB/ipath - when we check for LID
	availability, check for lack of interrupts too.
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234309.3794.784.stgit@bauxite.internal.keyresearch.com>

All too often, interrupts do not get enabled for our card due
to BIOS misconfiguration and other issues.  This patch checks
for that condition at startup, when we check for LID availability,
and warns the user.

Signed-off-by: Arthur Jones <arthur.jones at qlogic.com>
---
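
Note for readers of the archive: the two pieces of logic in this patch,
a counter that saturates instead of wrapping and the order in which the
conditions are reported, can be sketched on their own as below.  This is
an illustration, not the driver code; saturating_inc(), classify() and
the enum are hypothetical names.

```c
#include <stdint.h>

/*
 * Increment a 32-bit counter but stop at the maximum instead of
 * wrapping, so that zero keeps meaning "no interrupt was ever seen".
 */
static uint32_t saturating_inc(uint32_t counter)
{
	return counter == UINT32_MAX ? counter : counter + 1;
}

/* What the delayed link check would report, most basic problem first. */
enum link_report {
	REPORT_NO_INTERRUPTS,
	REPORT_NO_LID,
	REPORT_NOT_ACTIVE,
	REPORT_OK,
};

/*
 * A missing interrupt is tested before the LID because, without
 * interrupts, the absence of a LID is only a symptom.
 */
static enum link_report classify(uint32_t int_counter, uint32_t lid,
				 int link_ready)
{
	if (int_counter == 0)
		return REPORT_NO_INTERRUPTS;
	if (lid == 0)
		return REPORT_NO_LID;
	if (!link_ready)
		return REPORT_NOT_ACTIVE;
	return REPORT_OK;
}
```

Without the saturation check, a counter that wrapped back to zero at the
moment of the check would be misreported as "no interrupts".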

 drivers/infiniband/hw/ipath/ipath_driver.c |    8 +++++---
 drivers/infiniband/hw/ipath/ipath_intr.c   |    3 +++
 drivers/infiniband/hw/ipath/ipath_kernel.h |    3 +++
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
index 1d2369b..825ed4d 100644
--- a/drivers/infiniband/hw/ipath/ipath_driver.c
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -141,11 +141,13 @@ static void check_link_status(struct work_struct *work)
 	}
 
 	/*
-	 * If we don't have a LID, let the user know and don't bother
-	 * checking again.
+	 * If we don't have a LID or interrupts, let the user know and
+	 * don't bother checking again.
 	 */
 
-	if (dd->ipath_lid == 0)
+	if (dd->ipath_int_counter == 0)
+		dev_err(&dd->pcidev->dev, "No interrupts detected.\n");
+	else if (dd->ipath_lid == 0)
 		dev_info(&dd->pcidev->dev,
 			 "We don't have a LID yet (no subnet manager?)\n");
 	else if (!(*dd->ipath_statusp & IPATH_STATUS_IB_READY))
diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c
index f8aac8e..ced591d 100644
--- a/drivers/infiniband/hw/ipath/ipath_intr.c
+++ b/drivers/infiniband/hw/ipath/ipath_intr.c
@@ -932,6 +932,9 @@ irqreturn_t ipath_intr(int irq, void *data)
 
 	ipath_stats.sps_ints++;
 
+	if (dd->ipath_int_counter != (u32) -1)
+		dd->ipath_int_counter++;
+
 	if (!(dd->ipath_flags & IPATH_PRESENT)) {
 		/*
 		 * This return value is not great, but we do not want the
diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
index f261af1..381c97e 100644
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h
@@ -297,6 +297,9 @@ struct ipath_devdata {
 	u32 ipath_lastport_piobuf;
 	/* is a stats timer active */
 	u32 ipath_stats_timer_active;
+	atomic_t ipath_rewrite_timer_active;
+	/* number of interrupts for this device -- saturates... */
+	u32 ipath_int_counter;
 	/* dwords sent read from counter */
 	u32 ipath_lastsword;
 	/* dwords received read from counter */


From arthur.jones at qlogic.com  Tue Jun 19 16:43:16 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 19 Jun 2007 16:43:16 -0700
Subject: [ofa-general] [PATCH 28/28] IB/ipath - update copyright dates
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <20070619234315.3794.72264.stgit@bauxite.internal.keyresearch.com>

From: John Gregor <john.gregor at qlogic.com>

Now that it's June, it's about time to update
the copyright notices of files that have changed.

Signed-off-by: John Gregor <john.gregor at qlogic.com>
---

 drivers/infiniband/hw/ipath/ipath_cq.c          |    2 +-
 drivers/infiniband/hw/ipath/ipath_debug.h       |    2 +-
 drivers/infiniband/hw/ipath/ipath_diag.c        |    2 +-
 drivers/infiniband/hw/ipath/ipath_driver.c      |    2 +-
 drivers/infiniband/hw/ipath/ipath_eeprom.c      |    2 +-
 drivers/infiniband/hw/ipath/ipath_fs.c          |    2 +-
 drivers/infiniband/hw/ipath/ipath_iba6110.c     |    2 +-
 drivers/infiniband/hw/ipath/ipath_iba6120.c     |    2 +-
 drivers/infiniband/hw/ipath/ipath_init_chip.c   |    2 +-
 drivers/infiniband/hw/ipath/ipath_intr.c        |    2 +-
 drivers/infiniband/hw/ipath/ipath_kernel.h      |    2 +-
 drivers/infiniband/hw/ipath/ipath_keys.c        |    2 +-
 drivers/infiniband/hw/ipath/ipath_layer.c       |    2 +-
 drivers/infiniband/hw/ipath/ipath_layer.h       |    2 +-
 drivers/infiniband/hw/ipath/ipath_mad.c         |    2 +-
 drivers/infiniband/hw/ipath/ipath_mmap.c        |    2 +-
 drivers/infiniband/hw/ipath/ipath_mr.c          |    2 +-
 drivers/infiniband/hw/ipath/ipath_qp.c          |    2 +-
 drivers/infiniband/hw/ipath/ipath_rc.c          |    2 +-
 drivers/infiniband/hw/ipath/ipath_registers.h   |    2 +-
 drivers/infiniband/hw/ipath/ipath_ruc.c         |    2 +-
 drivers/infiniband/hw/ipath/ipath_srq.c         |    2 +-
 drivers/infiniband/hw/ipath/ipath_stats.c       |    2 +-
 drivers/infiniband/hw/ipath/ipath_sysfs.c       |    2 +-
 drivers/infiniband/hw/ipath/ipath_uc.c          |    2 +-
 drivers/infiniband/hw/ipath/ipath_ud.c          |    2 +-
 drivers/infiniband/hw/ipath/ipath_user_pages.c  |    2 +-
 drivers/infiniband/hw/ipath/ipath_verbs.c       |    2 +-
 drivers/infiniband/hw/ipath/ipath_verbs.h       |    2 +-
 drivers/infiniband/hw/ipath/ipath_verbs_mcast.c |    2 +-
 drivers/infiniband/hw/ipath/ipath_wc_ppc64.c    |    2 +-
 drivers/infiniband/hw/ipath/ipath_wc_x86_64.c   |    2 +-
 32 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_cq.c b/drivers/infiniband/hw/ipath/ipath_cq.c
index 8a2a774..8b4673b 100644
--- a/drivers/infiniband/hw/ipath/ipath_cq.c
+++ b/drivers/infiniband/hw/ipath/ipath_cq.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_debug.h b/drivers/infiniband/hw/ipath/ipath_debug.h
index 42bfbdb..19c56e6 100644
--- a/drivers/infiniband/hw/ipath/ipath_debug.h
+++ b/drivers/infiniband/hw/ipath/ipath_debug.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_diag.c b/drivers/infiniband/hw/ipath/ipath_diag.c
index aab21c1..a698f19 100644
--- a/drivers/infiniband/hw/ipath/ipath_diag.c
+++ b/drivers/infiniband/hw/ipath/ipath_diag.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
index 825ed4d..7893a5a 100644
--- a/drivers/infiniband/hw/ipath/ipath_driver.c
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_eeprom.c b/drivers/infiniband/hw/ipath/ipath_eeprom.c
index 9be1b9a..6b91479 100644
--- a/drivers/infiniband/hw/ipath/ipath_eeprom.c
+++ b/drivers/infiniband/hw/ipath/ipath_eeprom.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_fs.c b/drivers/infiniband/hw/ipath/ipath_fs.c
index 40cf1bc..2e689b9 100644
--- a/drivers/infiniband/hw/ipath/ipath_fs.c
+++ b/drivers/infiniband/hw/ipath/ipath_fs.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_iba6110.c b/drivers/infiniband/hw/ipath/ipath_iba6110.c
index 34d159a..87b18e9 100644
--- a/drivers/infiniband/hw/ipath/ipath_iba6110.c
+++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_iba6120.c b/drivers/infiniband/hw/ipath/ipath_iba6120.c
index 0c34555..e67e4a8 100644
--- a/drivers/infiniband/hw/ipath/ipath_iba6120.c
+++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c
index 5193d69..1b1af34 100644
--- a/drivers/infiniband/hw/ipath/ipath_init_chip.c
+++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c
index ced591d..ca3e877 100644
--- a/drivers/infiniband/hw/ipath/ipath_intr.c
+++ b/drivers/infiniband/hw/ipath/ipath_intr.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
index 381c97e..d374c87 100644
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h
@@ -1,7 +1,7 @@
 #ifndef _IPATH_KERNEL_H
 #define _IPATH_KERNEL_H
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_keys.c b/drivers/infiniband/hw/ipath/ipath_keys.c
index dd487c1..85a4aef 100644
--- a/drivers/infiniband/hw/ipath/ipath_keys.c
+++ b/drivers/infiniband/hw/ipath/ipath_keys.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_layer.c b/drivers/infiniband/hw/ipath/ipath_layer.c
index 05a1d2b..82616b7 100644
--- a/drivers/infiniband/hw/ipath/ipath_layer.c
+++ b/drivers/infiniband/hw/ipath/ipath_layer.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_layer.h b/drivers/infiniband/hw/ipath/ipath_layer.h
index 3854a4e..415709c 100644
--- a/drivers/infiniband/hw/ipath/ipath_layer.h
+++ b/drivers/infiniband/hw/ipath/ipath_layer.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_mad.c b/drivers/infiniband/hw/ipath/ipath_mad.c
index 2e9e161..2aaa029 100644
--- a/drivers/infiniband/hw/ipath/ipath_mad.c
+++ b/drivers/infiniband/hw/ipath/ipath_mad.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_mmap.c b/drivers/infiniband/hw/ipath/ipath_mmap.c
index 937bc33..fa830e2 100644
--- a/drivers/infiniband/hw/ipath/ipath_mmap.c
+++ b/drivers/infiniband/hw/ipath/ipath_mmap.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/ipath/ipath_mr.c b/drivers/infiniband/hw/ipath/ipath_mr.c
index bdeef8d..e442470 100644
--- a/drivers/infiniband/hw/ipath/ipath_mr.c
+++ b/drivers/infiniband/hw/ipath/ipath_mr.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_qp.c b/drivers/infiniband/hw/ipath/ipath_qp.c
index bfd39c9..d317b81 100644
--- a/drivers/infiniband/hw/ipath/ipath_qp.c
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_rc.c b/drivers/infiniband/hw/ipath/ipath_rc.c
index 6423d9e..46744ea 100644
--- a/drivers/infiniband/hw/ipath/ipath_rc.c
+++ b/drivers/infiniband/hw/ipath/ipath_rc.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_registers.h b/drivers/infiniband/hw/ipath/ipath_registers.h
index c182bcd..708eba3 100644
--- a/drivers/infiniband/hw/ipath/ipath_registers.h
+++ b/drivers/infiniband/hw/ipath/ipath_registers.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_ruc.c b/drivers/infiniband/hw/ipath/ipath_ruc.c
index c44e015..38d1d9b 100644
--- a/drivers/infiniband/hw/ipath/ipath_ruc.c
+++ b/drivers/infiniband/hw/ipath/ipath_ruc.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_srq.c b/drivers/infiniband/hw/ipath/ipath_srq.c
index 4b4214e..83d2569 100644
--- a/drivers/infiniband/hw/ipath/ipath_srq.c
+++ b/drivers/infiniband/hw/ipath/ipath_srq.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_stats.c b/drivers/infiniband/hw/ipath/ipath_stats.c
index 2955f36..73ed17d 100644
--- a/drivers/infiniband/hw/ipath/ipath_stats.c
+++ b/drivers/infiniband/hw/ipath/ipath_stats.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_sysfs.c b/drivers/infiniband/hw/ipath/ipath_sysfs.c
index ab34d3e..16238cd 100644
--- a/drivers/infiniband/hw/ipath/ipath_sysfs.c
+++ b/drivers/infiniband/hw/ipath/ipath_sysfs.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_uc.c b/drivers/infiniband/hw/ipath/ipath_uc.c
index 243d7c6..8380fbc 100644
--- a/drivers/infiniband/hw/ipath/ipath_uc.c
+++ b/drivers/infiniband/hw/ipath/ipath_uc.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_ud.c b/drivers/infiniband/hw/ipath/ipath_ud.c
index 26171e5..c22920b 100644
--- a/drivers/infiniband/hw/ipath/ipath_ud.c
+++ b/drivers/infiniband/hw/ipath/ipath_ud.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_user_pages.c b/drivers/infiniband/hw/ipath/ipath_user_pages.c
index 8536aeb..27034d3 100644
--- a/drivers/infiniband/hw/ipath/ipath_user_pages.c
+++ b/drivers/infiniband/hw/ipath/ipath_user_pages.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
index 6753f7d..66b8287 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h
index 458f822..f3d1f2c 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c b/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c
index dd691cf..9e5abf9 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/hw/ipath/ipath_wc_ppc64.c b/drivers/infiniband/hw/ipath/ipath_wc_ppc64.c
index 0095bb7..1d7bd82 100644
--- a/drivers/infiniband/hw/ipath/ipath_wc_ppc64.c
+++ b/drivers/infiniband/hw/ipath/ipath_wc_ppc64.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
index 9f409fd..3428acb 100644
--- a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
+++ b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two


From Frank.Leers at Sun.COM  Tue Jun 19 17:00:14 2007
From: Frank.Leers at Sun.COM (Frank Leers)
Date: Tue, 19 Jun 2007 17:00:14 -0700
Subject: [ofa-general] don't want to rebuild all rpm's from install.sh
Message-ID: <1182297614.1774.30.camel@localhost>

If I understand the Installation Guide doc correctly I should be able to
just install rpm's using the install.sh script without rebuilding the
rpm's.  I have built the rpm's successfully and installed them on a node
in my cluster via an NFS mount.  I'd now like to install the rest of my
nodes using './install.sh -c <> -net <>' but this results in a rebuild
of the rpm's all over again.  

I'm obviously missing something here, although another section of the
doc mentions building once and then installing the resultant rpm's on
all other nodes via standard tools in parallel - 'pdsh ...rpm -ivd ...'

Can I rerun install.sh to simply install rpm's and configure ipoib etc.
on the rest of my nodes somehow without rebuilding?

thanks,

-frank


From sweitzen at cisco.com  Tue Jun 19 20:05:34 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Tue, 19 Jun 2007 20:05:34 -0700
Subject: [ofa-general] don't want to rebuild all rpm's from install.sh
In-Reply-To: <1182297614.1774.30.camel@localhost>
References: <1182297614.1774.30.camel@localhost>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA303B78C6B@xmb-sjc-216.amer.cisco.com>

Once you build your rpms on one node, you can just install them with
"rpm" on the other nodes instead of "install.sh".
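
A minimal dry-run sketch of that approach, assuming the built rpms sit in a
directory that every node can reach (for example over the NFS mount mentioned
in the question). The directory path, the node names, and the helper
install_cmd are hypothetical and are not part of OFED; the script only prints
the commands it would run.

```shell
#!/bin/sh
# Dry-run sketch: print one "rpm" install command per remaining node.
# RPM_DIR and NODES are hypothetical placeholders, not OFED defaults.
RPM_DIR=${RPM_DIR:-/mnt/ofed/RPMS}
NODES=${NODES:-"node02 node03"}

# install_cmd NODE DIR
# Prints the command that would install every rpm found in DIR on NODE.
# "rpm -Uvh" installs a package, or upgrades it if already present.
install_cmd() {
    printf 'ssh %s rpm -Uvh %s/*.rpm\n' "$1" "$2"
}

for node in $NODES; do
    install_cmd "$node" "$RPM_DIR"
done
```

Installing with rpm alone only lays down the packages. Whatever configuration
install.sh applies through its -net option (ipoib and so on) would still have
to be handled separately on each node.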

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

> -----Original Message-----
> From: general-bounces at lists.openfabrics.org 
> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of 
> Frank Leers
> Sent: Tuesday, June 19, 2007 5:00 PM
> To: general at lists.openfabrics.org
> Subject: [ofa-general] don't want to rebuild all rpm's from install.sh
> 
> If I understand the Installation Guide doc correctly I should 
> be able to
> just install rpm's using the install.sh script without rebuilding the
> rpm's.  I have built the rpm's successfully and installed 
> them on a node
> in my cluster via an NFS mount.  I'd now like to install the 
> rest of my
> nodes using './install.sh -c <> -net <>' but this results in a rebuild
> of the rpm's all over again.  
> 
> I'm obviously missing something here, although another section of the
> doc mentions building once and then installing the resultant rpm's on
> all other nodes via standard tools in parallel - 'pdsh ...rpm 
> -ivd ...'
> 
> Can I rerun install.sh to simply install rpm's and configure 
> ipoib etc.
> on the rest of my nodes somehow without rebuilding?
> 
> thanks,
> 
> -frank
> 


From vlad at dev.mellanox.co.il  Tue Jun 19 23:34:38 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Wed, 20 Jun 2007 09:34:38 +0300
Subject: [ofa-general] don't want to rebuild all rpm's from install.sh
In-Reply-To: <1182297614.1774.30.camel@localhost>
References: <1182297614.1774.30.camel@localhost>
Message-ID: <4678CA7E.9090200@dev.mellanox.co.il>

Frank Leers wrote:
> If I understand the Installation Guide doc correctly I should be able to
> just install rpm's using the install.sh script without rebuilding the
> rpm's.  I have built the rpm's successfully and installed them on a node
> in my cluster via an NFS mount.  I'd now like to install the rest of my
> nodes using './install.sh -c <> -net <>' but this results in a rebuild
> of the rpm's all over again.  
> 
Yes,
It should work this way if all of the nodes have the same Arch/OS/kernel.
Can you send me the ofed.conf file (that you use after '-c' parameter), 
the output of the './install.sh -c <> -net <>' command and 
Arch/OS/kernel of your nodes.

Thanks,
Vladimir


From yangdong at ncic.ac.cn  Wed Jun 20 00:14:45 2007
From: yangdong at ncic.ac.cn (ncic)
Date: Wed, 20 Jun 2007 15:14:45 +0800
Subject: [ofa-general] why networked file systems (e.g. nfs, pvfs,
 etc.) supported IB by using the access layer (linux kernel ib ops)
Message-ID: <4678D3E5.706@ncic.ac.cn>

Why didn't they support IB with SDP?


From kliteyn at dev.mellanox.co.il  Wed Jun 20 00:42:59 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Wed, 20 Jun 2007 10:42:59 +0300
Subject: [ofa-general] [PATCH] osm: cosmetics in ftree - added get_guid
 functions for switch and hca
Message-ID: <4678DA83.2050700@dev.mellanox.co.il>

Hi Hal,

Cosmetic code changes in fat-tree:
added get_guid_ho (host byte order) and get_guid_no (network byte order)
functions for switches and HCAs

-- Yevgeny

Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
---
  opensm/opensm/osm_ucast_ftree.c |   77 +++++++++++++++++++++++++++++----------
  1 files changed, 58 insertions(+), 19 deletions(-)

diff --git a/opensm/opensm/osm_ucast_ftree.c b/opensm/opensm/osm_ucast_ftree.c
index 1ead199..1ae8b29 100644
--- a/opensm/opensm/osm_ucast_ftree.c
+++ b/opensm/opensm/osm_ucast_ftree.c
@@ -640,6 +640,26 @@ __osm_ftree_sw_destroy(

  /***************************************************/

+static uint64_t
+__osm_ftree_sw_get_guid_no(
+   IN  ftree_sw_t * p_sw)
+{
+   if (!p_sw)
+      return 0;
+   return osm_node_get_node_guid(p_sw->p_osm_sw->p_node);
+}
+
+/***************************************************/
+
+static uint64_t
+__osm_ftree_sw_get_guid_ho(
+   IN  ftree_sw_t * p_sw)
+{
+   return cl_ntoh64(__osm_ftree_sw_get_guid_no(p_sw));
+}
+
+/***************************************************/
+
  static void
  __osm_ftree_sw_dump(
     IN  ftree_fabric_t * p_ftree,
@@ -657,7 +677,7 @@ __osm_ftree_sw_dump(
             "__osm_ftree_sw_dump: "
             "Switch index: %s, GUID: 0x%016" PRIx64 ", Ports: %u DOWN, %u UP\n",
            __osm_ftree_tuple_to_str(p_sw->tuple),
-          cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
+          __osm_ftree_sw_get_guid_ho(p_sw),
            p_sw->down_port_groups_num,
            p_sw->up_port_groups_num);

@@ -835,6 +855,26 @@ __osm_ftree_hca_destroy(

  /***************************************************/

+static uint64_t
+__osm_ftree_hca_get_guid_no(
+   IN  ftree_hca_t * p_hca)
+{
+   if (!p_hca)
+      return 0;
+   return osm_node_get_node_guid(p_hca->p_osm_node);
+}
+
+/***************************************************/
+
+static uint64_t
+__osm_ftree_hca_get_guid_ho(
+   IN  ftree_hca_t * p_hca)
+{
+   return cl_ntoh64(__osm_ftree_hca_get_guid_no(p_hca));
+}
+
+/***************************************************/
+
  static void
  __osm_ftree_hca_dump(
     IN  ftree_fabric_t * p_ftree,
@@ -851,7 +891,7 @@ __osm_ftree_hca_dump(
     osm_log(&p_ftree->p_osm->log, OSM_LOG_DEBUG,
             "__osm_ftree_hca_dump: "
             "CA GUID: 0x%016" PRIx64 ", Ports: %u UP\n",
-          cl_ntoh64(osm_node_get_node_guid(p_hca->p_osm_node)),
+          __osm_ftree_hca_get_guid_ho(p_hca),
            p_hca->up_port_groups_num);

     for( i = 0; i < p_hca->up_port_groups_num; i++ )
@@ -1214,7 +1254,7 @@ __osm_ftree_fabric_dump_general_info(
                 osm_log(&p_ftree->p_osm->log, OSM_LOG_VERBOSE,
                         "__osm_ftree_fabric_dump_general_info: "
                         "      GUID: 0x%016" PRIx64 ", LID: 0x%x, Index %s\n",
-                       cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
+                       __osm_ftree_sw_get_guid_ho(p_sw),
                         cl_ntoh16(p_sw->base_lid),
                         __osm_ftree_tuple_to_str(p_sw->tuple));
        }
@@ -1227,8 +1267,7 @@ __osm_ftree_fabric_dump_general_info(
              osm_log(&p_ftree->p_osm->log, OSM_LOG_VERBOSE,
                      "__osm_ftree_fabric_dump_general_info: "
                      "      GUID: 0x%016" PRIx64 ", LID: 0x%x, Index %s\n",
-                    cl_ntoh64(osm_node_get_node_guid(
-                              p_ftree->leaf_switches[i]->p_osm_sw->p_node)),
+                    __osm_ftree_sw_get_guid_ho(p_ftree->leaf_switches[i]),
                      cl_ntoh16(p_ftree->leaf_switches[i]->base_lid),
                      __osm_ftree_tuple_to_str(p_ftree->leaf_switches[i]->tuple));
        }
@@ -1442,7 +1481,7 @@ __osm_ftree_fabric_make_indexing(
             p_sw->rank,
             __osm_ftree_tuple_to_str(p_sw->tuple),
             cl_ntoh16(p_sw->base_lid),
-           cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)));
+           __osm_ftree_sw_get_guid_ho(p_sw));

     /*
      * Now run BFS and assign indexes to all switches
@@ -1617,11 +1656,11 @@ __osm_ftree_fabric_validate_topology(
                      "ERR AB09: Different number of upward port groups on switches:\n"
                      "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u groups\n"
                      "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u groups\n",
-                    cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)),
+                    __osm_ftree_sw_get_guid_ho(reference_sw_arr[p_sw->rank]),
                      cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid),
                      __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple),
                      reference_sw_arr[p_sw->rank]->up_port_groups_num,
-                    cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
+                    __osm_ftree_sw_get_guid_ho(p_sw),
                      cl_ntoh16(p_sw->base_lid),
                      __osm_ftree_tuple_to_str(p_sw->tuple),
                      p_sw->up_port_groups_num);
@@ -1638,11 +1677,11 @@ __osm_ftree_fabric_validate_topology(
                      "ERR AB0A: Different number of downward port groups on switches:\n"
                      "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u port groups\n"
                      "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u port groups\n",
-                    cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)),
+                    __osm_ftree_sw_get_guid_ho(reference_sw_arr[p_sw->rank]),
                      cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid),
                      __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple),
                      reference_sw_arr[p_sw->rank]->down_port_groups_num,
-                    cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
+                    __osm_ftree_sw_get_guid_ho(p_sw),
                      cl_ntoh16(p_sw->base_lid),
                      __osm_ftree_tuple_to_str(p_sw->tuple),
                      p_sw->down_port_groups_num);
@@ -1663,11 +1702,11 @@ __osm_ftree_fabric_validate_topology(
                             "ERR AB0B: Different number of ports in an upward port group on switches:\n"
                             "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n"
                             "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n",
-                           cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)),
+                           __osm_ftree_sw_get_guid_ho(reference_sw_arr[p_sw->rank]),
                             cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid),
                             __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple),
                             cl_ptr_vector_get_size(&p_ref_group->ports),
-                           cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
+                           __osm_ftree_sw_get_guid_ho(p_sw),
                             cl_ntoh16(p_sw->base_lid),
                             __osm_ftree_tuple_to_str(p_sw->tuple),
                             cl_ptr_vector_get_size(&p_group->ports));
@@ -1691,11 +1730,11 @@ __osm_ftree_fabric_validate_topology(
                             "ERR AB0C: Different number of ports in an downward port group on switches:\n"
                             "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n"
                             "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n",
-                           cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)),
+                           __osm_ftree_sw_get_guid_ho(reference_sw_arr[p_sw->rank]),
                             cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid),
                             __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple),
                             cl_ptr_vector_get_size(&p_ref_group->ports),
-                           cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
+                           __osm_ftree_sw_get_guid_ho(p_sw),
                             cl_ntoh16(p_sw->base_lid),
                             __osm_ftree_tuple_to_str(p_sw->tuple),
                             cl_ptr_vector_get_size(&p_group->ports));
@@ -2508,7 +2547,7 @@ __osm_ftree_rank_leaf_switches(
                      "__osm_ftree_rank_leaf_switches: ERR AB0F: "
                      "CA conected directly to another CA: "
                      "0x%016" PRIx64 " <---> 0x%016" PRIx64 "\n",
-                    cl_ntoh64(osm_node_get_node_guid(p_hca->p_osm_node)),
+                    __osm_ftree_hca_get_guid_ho(p_hca),
                      cl_ntoh64(osm_node_get_node_guid(p_remote_osm_node)));
              res = -1;
              goto Exit;
@@ -2548,8 +2587,8 @@ __osm_ftree_rank_leaf_switches(
                "                                            - CA guid    : 0x%016" PRIx64 "\n"
                "                                            - Switch guid: 0x%016" PRIx64 "\n"
                "                                            - Switch LID : 0x%x\n",
-              cl_ntoh64(osm_node_get_node_guid(p_hca->p_osm_node)),
-              cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
+              __osm_ftree_hca_get_guid_ho(p_hca),
+              __osm_ftree_sw_get_guid_ho(p_sw),
                cl_ntoh16(p_sw->base_lid));
        cl_list_insert_tail(p_ranking_bfs_list,
                            &__osm_ftree_sw_tbl_element_create(p_sw)->map_item);
@@ -2740,10 +2779,10 @@ __osm_ftree_fabric_construct_sw_ports(
                         "       GUID 0x%016" PRIx64 ", LID 0x%x, rank %u\n",
                         p_sw->rank,
                         p_remote_sw->rank,
-                       cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
+                       __osm_ftree_sw_get_guid_ho(p_sw),
                         cl_ntoh16(p_sw->base_lid),
                         p_sw->rank,
-                       cl_ntoh64(osm_node_get_node_guid(p_remote_sw->p_osm_sw->p_node)),
+                       __osm_ftree_sw_get_guid_ho(p_remote_sw),
                         cl_ntoh16(p_remote_sw->base_lid),
                         p_remote_sw->rank);
                 res = -1;
-- 
1.5.1.4


From erezz at voltaire.com  Wed Jun 20 02:19:02 2007
From: erezz at voltaire.com (Erez Zilber)
Date: Wed, 20 Jun 2007 12:19:02 +0300
Subject: [ofa-general] [PATCH 1/2] IB/iser: add open-iscsi over iSER
	support for RHAS4 in OFED scripts
In-Reply-To: <4641D32D.6030505@voltaire.com>
References: <4641D295.5060907@voltaire.com> <4641D32D.6030505@voltaire.com>
Message-ID: <4678F106.9090508@voltaire.com>

Erez Zilber wrote:

> Add support for open-iscsi over iSER in RHAS4 in OFED's scripts.
>
> Signed-off-by: Erez Zilber <erezz at voltaire.com>
> ---
>  build.sh     |    2 +-
>  build_env.sh |    4 ++--
>  install.sh   |    2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/build.sh b/build.sh
> index d54c55d..be2d1e6 100755
> --- a/build.sh
> +++ b/build.sh
> @@ -344,7 +344,7 @@ open-iscsi()
>              SuSE)
>  	    ex "$MV -f ${RPM_DIR}/RPMS/$build_arch/${OPEN_ISCSI_SUSE_NAME}-${OPEN_ISCSI_VERSION}.${build_arch}.rpm $RPMS"
>  	    ;;
> -            redhat5)
> +            redhat|redhat5)
>  	    ex "$MV -f ${RPM_DIR}/RPMS/$build_arch/${OPEN_ISCSI_REDHAT_NAME}-${OPEN_ISCSI_VERSION}.${build_arch}.rpm $RPMS"
>              ;;
>  	    *)
> diff --git a/build_env.sh b/build_env.sh
> index 6e65b21..49821b4 100644
> --- a/build_env.sh
> +++ b/build_env.sh
> @@ -135,7 +135,7 @@ IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES
>  # Iser
>  # Currently iSER is supported only on SLES10 & RHEL5
>  case ${K_VER} in
> -        2.6.16.*-*-*|2.6.*.el5)
> +        2.6.16.*-*-*|2.6.*.el5|2.6.9-*.EL*)
>          IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES} ib_iser"
>          ;;
>  esac
> @@ -1998,7 +1998,7 @@ set_package_deps()
>                      ib_iser)
>  			# Currently iSER is supported only on SLES10 & RHEL5
>                          case ${K_VER} in
> -                        2.6.16.*-*-*|2.6.*.el5)
> +                        2.6.16.*-*-*|2.6.*.el5|2.6.9-*.EL*)
>                              OFA_KERNEL_PACKAGES=$(echo "$OFA_KERNEL_PACKAGES ib_verbs ${ll_driver} ib_iser" | tr -s ' ' '\n' | sort -n | uniq)
>                              OFA_PACKAGES=$(echo "$OFA_PACKAGES kernel-ib" | tr -s ' ' '\n' | sort -n | uniq)
>                              EXTRA_PACKAGES=$(echo "$EXTRA_PACKAGES open-iscsi" | tr -s ' ' '\n' | sort -rn | uniq)
> diff --git a/install.sh b/install.sh
> index f9ed6da..dadc144 100755
> --- a/install.sh
> +++ b/install.sh
> @@ -990,7 +990,7 @@ #    fi    
>                  err_echo "${OPEN_ISCSI_SUSE_NAME}-${OPEN_ISCSI_VERSION}.${build_arch}.rpm not found under ${RPMS}."
>              fi
>              ;;
> -            redhat5)
> +            redhat|redhat5)
>  	    if [ -f ${RPMS}/${OPEN_ISCSI_REDHAT_NAME}-${OPEN_ISCSI_VERSION}.${build_arch}.rpm ]; then
>                  ex "$RPM -Uhv --oldpackage ${RPMS}/${OPEN_ISCSI_REDHAT_NAME}-${OPEN_ISCSI_VERSION}.${build_arch}.rpm"
>              else
>   

Vlad,

It seems that commit 553e284ffb2f380dc8d1451bfb3ad40165f04112 in
ofed_1_2_scripts.git is different from the patch that I submitted. For
example:

My patch:

@@ -135,7 +135,7 @@ IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES
 # Iser
 # Currently iSER is supported only on SLES10 & RHEL5
 case ${K_VER} in
-        2.6.16.*-*-*|2.6.*.el5)
+        2.6.16.*-*-*|2.6.*.el5|2.6.9-*.EL*)
         IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES} ib_iser"
         ;;
 esac


patch applied in ofed_1_2_scripts.git:
@@ -135,7 +135,7 @@ IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES
 # Iser
 # Currently iSER is supported only on SLES10 & RHEL5
 case ${K_VER} in
-        2.6.16.*-*-*|2.6.*.el5)
+        2.6.16.*-*-*|2.6.*.el5|2.6.9-[3-5]*.EL*) <-- this line is different
         IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES} ib_iser"
         ;;
 esac

Why is that?

Erez


From vlad at dev.mellanox.co.il  Wed Jun 20 02:30:02 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Wed, 20 Jun 2007 12:30:02 +0300
Subject: [ofa-general] [PATCH 1/2] IB/iser: add open-iscsi over
	iSER	support for RHAS4 in OFED scripts
In-Reply-To: <4678F106.9090508@voltaire.com>
References: <4641D295.5060907@voltaire.com> <4641D32D.6030505@voltaire.com>
	<4678F106.9090508@voltaire.com>
Message-ID: <4678F39A.1030305@dev.mellanox.co.il>

> Vlad,
> 
> It seems that commit 553e284ffb2f380dc8d1451bfb3ad40165f04112 in
> ofed_1_2_scripts.git is different from the patch that I submitted. For
> example:
> 
> My patch:
> 
> @@ -135,7 +135,7 @@ IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES
>  # Iser
>  # Currently iSER is supported only on SLES10 & RHEL5
>  case ${K_VER} in
> -        2.6.16.*-*-*|2.6.*.el5)
> +        2.6.16.*-*-*|2.6.*.el5|2.6.9-*.EL*)
>          IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES} ib_iser"
>          ;;
>  esac
> 
> 
> patch applied in ofed_1_2_scripts.git:
> @@ -135,7 +135,7 @@ IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES
>  # Iser
>  # Currently iSER is supported only on SLES10 & RHEL5
>  case ${K_VER} in
> -        2.6.16.*-*-*|2.6.*.el5)
> +        2.6.16.*-*-*|2.6.*.el5|2.6.9-[3-5]*.EL*) <-- this line is different
>          IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES} ib_iser"
>          ;;
>  esac
> 
> Why is that?
> 
> Erez

You have added backport patches for RHEL4.0 U3, U4, U5.
2.6.9-*.EL* matches also U2. So, installation fails on RHEL 4.0 U2 with 
your patch.
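
The difference between the two patterns can be checked with a short shell
sketch. The kernel strings are the 2.6.9 entries from the daily build list in
this archive; reading 2.6.9-22 as U2 and 34/42/55 as U3/U4/U5 is an
assumption, and the helper names matches_broad and matches_narrow are
illustrative only, not taken from build_env.sh.

```shell
#!/bin/sh
# Compare the originally submitted pattern with the one that was committed.

# Pattern from the submitted patch: any 2.6.9-<build>.EL* kernel.
matches_broad() {
    case "$1" in
        2.6.9-*.EL*) echo yes ;;
        *)           echo no ;;
    esac
}

# Pattern as committed: the build number must start with 3, 4 or 5.
matches_narrow() {
    case "$1" in
        2.6.9-[3-5]*.EL*) echo yes ;;
        *)                echo no ;;
    esac
}

for k in 2.6.9-22.ELsmp 2.6.9-34.ELsmp 2.6.9-42.ELsmp 2.6.9-55.ELsmp; do
    echo "$k broad=$(matches_broad "$k") narrow=$(matches_narrow "$k")"
done
```

Only 2.6.9-22.ELsmp is rejected by the narrower pattern. Note that the bracket
tests just the first digit of the build number, so a string such as
2.6.9-5.EL would also pass it.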

Vladimir


From vlad at lists.openfabrics.org  Wed Jun 20 02:45:00 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Wed, 20 Jun 2007 02:45:00 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070620-0200 daily build status
Message-ID: <20070620094501.3CC52E6087B@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:   --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.16
Passed on x86_64 with linux-2.6.20
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.17
Passed on ppc64 with linux-2.6.12
Passed on x86_64 with linux-2.6.16
Passed on ia64 with linux-2.6.19
Passed on x86_64 with linux-2.6.12
Passed on powerpc with linux-2.6.19
Passed on ppc64 with linux-2.6.17
Passed on powerpc with linux-2.6.18
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.14
Passed on ppc64 with linux-2.6.14
Passed on x86_64 with linux-2.6.13
Passed on ppc64 with linux-2.6.18
Passed on powerpc with linux-2.6.13
Passed on powerpc with linux-2.6.12
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.21.1
Passed on ppc64 with linux-2.6.19
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.15
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on ppc64 with linux-2.6.16
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.13
Passed on powerpc with linux-2.6.16
Passed on ia64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.18-8.el5
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:


From erezz at voltaire.com  Wed Jun 20 04:33:59 2007
From: erezz at voltaire.com (Erez Zilber)
Date: Wed, 20 Jun 2007 14:33:59 +0300
Subject: [ofa-general] [PATCH 1/2] IB/iser: add open-iscsi over
	iSER	support for RHAS4 in OFED scripts
In-Reply-To: <4678F39A.1030305@dev.mellanox.co.il>
References: <4641D295.5060907@voltaire.com> <4641D32D.6030505@voltaire.com>
	<4678F106.9090508@voltaire.com>
	<4678F39A.1030305@dev.mellanox.co.il>
Message-ID: <467910A7.70001@voltaire.com>

Vladimir Sokolovsky wrote:

>> Vlad,
>>
>> It seems that commit 553e284ffb2f380dc8d1451bfb3ad40165f04112 in
>> ofed_1_2_scripts.git is different from the patch that I submitted. For
>> example:
>>
>> My patch:
>>
>> @@ -135,7 +135,7 @@ IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES
>>  # Iser
>>  # Currently iSER is supported only on SLES10 & RHEL5
>>  case ${K_VER} in
>> -        2.6.16.*-*-*|2.6.*.el5)
>> +        2.6.16.*-*-*|2.6.*.el5|2.6.9-*.EL*)
>>          IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES} ib_iser"
>>          ;;
>>  esac
>>
>>
>> patch applied in ofed_1_2_scripts.git:
>> @@ -135,7 +135,7 @@ IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES
>>  # Iser
>>  # Currently iSER is supported only on SLES10 & RHEL5
>>  case ${K_VER} in
>> -        2.6.16.*-*-*|2.6.*.el5)
>> +        2.6.16.*-*-*|2.6.*.el5|2.6.9-[3-5]*.EL*) <-- this line is
>> different
>>          IB_KERNEL_PACKAGES="${IB_KERNEL_PACKAGES} ib_iser"
>>          ;;
>>  esac
>>
>> Why is that?
>>
>> Erez
>
> You have added backport patches for RHEL4.0 U3, U4, U5.
> 2.6.9-*.EL* matches also U2. So, installation fails on RHEL 4.0 U2
> with your patch.
>
> Vladimir

You are right and I agree with your fix. Next time, just let me know if
you don't apply a patch as is.
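
For anyone following along, the difference between the two patterns can be checked outside the install script. The sketch below is only an illustration, not part of ofed_1_2_scripts.git: it uses POSIX fnmatch(), which follows the same glob rules as a shell case pattern, and the helper name iser_supported() is made up. The kernel strings in the note after it come from the daily build list; treating 2.6.9-22 as the U2 kernel is an assumption.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Patterns copied from the case statement as applied in the scripts tree. */
static const char *const iser_patterns[] = {
	"2.6.16.*-*-*",
	"2.6.*.el5",
	"2.6.9-[3-5]*.EL*",
};

/* Hypothetical helper: returns 1 if k_ver matches any pattern, else 0. */
int iser_supported(const char *k_ver)
{
	size_t i;

	for (i = 0; i < sizeof iser_patterns / sizeof iser_patterns[0]; i++)
		if (fnmatch(iser_patterns[i], k_ver, 0) == 0)
			return 1;
	return 0;
}
```

With these patterns, iser_supported("2.6.9-22.ELsmp") is 0 while the 2.6.9-34, 2.6.9-42 and 2.6.9-55 ELsmp kernels give 1. The broader pattern 2.6.9-*.EL* would also match the 2.6.9-22 kernel, which is the failure Vladimir describes.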

Thanks,
Erez


From todd.rimmer at qlogic.com  Wed Jun 20 05:54:51 2007
From: todd.rimmer at qlogic.com (Todd Rimmer)
Date: Wed, 20 Jun 2007 07:54:51 -0500
Subject: [ofa-general] Patches to complib
In-Reply-To: <1182290419.15653.242651.camel@hal.voltaire.com>
Message-ID: <4FB1BCCAE6CAED44A1DC005B1DE06119291112@EPEXCH2.qlogic.org>

Hal,

Attached is a diff with 2 fixes to complib.  The first is one I sent you
yesterday (reset count in qmap on remove_all).  The second corrects the
same bug in fleximap.

Patches are against main branch, however this code is the same in OFED
1.2 as well.
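
Since the attachment does not show up inline, here is a rough sketch of the kind of bug being described. It is not the complib code: the toy_map types and functions are invented for illustration. The point is only that a remove_all which unlinks every item must also reset the cached count, otherwise the count keeps reporting the old size.

```c
#include <stddef.h>

/* Hypothetical toy container, standing in for a map with a cached count. */
struct toy_item {
	struct toy_item *next;
};

struct toy_map {
	struct toy_item *head;
	size_t count;
};

void toy_map_insert(struct toy_map *m, struct toy_item *it)
{
	it->next = m->head;
	m->head = it;
	m->count++;
}

void toy_map_remove_all(struct toy_map *m)
{
	m->head = NULL;
	m->count = 0;	/* the step that is easy to forget */
}

size_t toy_map_count(const struct toy_map *m)
{
	return m->count;
}
```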

Todd Rimmer
Chief Architect 
QLogic System Interconnect Group
Voice: 610-233-4852     Fax: 610-233-4777
Todd.Rimmer at QLogic.com  www.QLogic.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cl_maps_count.diff
Type: application/octet-stream
Size: 1006 bytes
Desc: cl_maps_count.diff
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070620/f6297fa9/attachment.obj>

From halr at voltaire.com  Wed Jun 20 06:33:13 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Jun 2007 09:33:13 -0400
Subject: [ofa-general] Re: Patches to complib
In-Reply-To: <4FB1BCCAE6CAED44A1DC005B1DE06119291112@EPEXCH2.qlogic.org>
References: <4FB1BCCAE6CAED44A1DC005B1DE06119291112@EPEXCH2.qlogic.org>
Message-ID: <1182346392.15653.305841.camel@hal.voltaire.com>

Todd,

On Wed, 2007-06-20 at 08:54, Todd Rimmer wrote:
> Hal,
> 
> Attached is a diff with 2 fixes to complib.  The first is one I sent you
> yesterday (reset count in qmap on remove_all).  The second corrects the
> same bug in fleximap.
> 
> Patches are against main branch, however this code is the same in OFED
> 1.2 as well.

These patches appear to be against ofed_1_2 but they did apply to
master. This may cause an issue in the future but perhaps not for
complib changes.

Thanks. Applied (to master only).

In the future, please also include your S-O-B line:

Signed-off-by: Todd Rimmer <todd.rimmer at qlogic.com>

Also, patches are supposed to be submitted as inline text rather than
attachments.

-- Hal

> Todd Rimmer
> Chief Architect 
> QLogic System Interconnect Group
> Voice: 610-233-4852     Fax: 610-233-4777
> Todd.Rimmer at QLogic.com  www.QLogic.com


From HNGUYEN at de.ibm.com  Wed Jun 20 06:38:11 2007
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Wed, 20 Jun 2007 15:38:11 +0200
Subject: [ofa-general] Re: [ewg] Anouncement: OFED 1.2 rc6 is avilable
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9015636A9@mtlexch01.mtl.com>
Message-ID: <OFE2709483.B5746D0D-ONC1257300.0044F6E0-C1257300.004AE5B3@de.ibm.com>

Hello Tziporet!
In the attached release notes I see under "1.2 Supported Platforms and 
Operating Systems" this:
- RedHat EL5: 2.6.9-42.ELsmp
which should be 2.6.18-8.el5 according to my "uname -r" on a rhel5 system.

Mit freundlichen Gruessen/Kind Regards
Hoang-Nam Nguyen


IBM Deutschland Entwicklung GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschaeftsfuehrung: Herbert Kircher
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294


ewg-bounces at lists.openfabrics.org wrote on 19.06.2007 16:47:43:

> 
> Hi, 
> 
> OFED 1.2-RC6 is available on
> http://www.openfabrics.org/builds/ofed-1.2/ 
> File: OFED-1.2-rc6.tgz 
> To get BUILD_ID run ofed_info 
> 
> Please report any issues in bugzilla https://bugs.openfabrics.org/
> 
> The GA release is expected this Friday (June 22)
> 
> I attach the OFED RN - please review and send me comments to the final
> release
> 
> Thanks,
> Tziporet
> 
> ========================================================================
> 
> Release information: 
> 
> OS support: 
> Novell: 
>     - SLES 9.0 SP3 
>     - SLES10 
>     - SLES10 SP1 RC5
> Redhat: 
>     - Redhat EL4 up3, up4 and up5 
>     - Redhat EL5 
> kernel.org: 
>     - 2.6.20 
>     - 2.6.19 
> 
> Note: Kernel 2.6.21, Fedora C6 and SuSE Pro 10 are not part of the
> official list. 
> We keep the backport patches for these OSes and make sure OFED compiles
> and loads properly, but will not do a full QA cycle.
> 
> Systems: 
>     * x86_64 
>     * x86 
>     * ia64 
>     * ppc64 
> 
> Main changes from OFED-1.1-rc5: 
> ===============================
> 1. Fixed 6 bugs (see attached for fixed issues)
> 
> See bugzilla for all open issues. 
> 
> Tasks that should be completed for the GA release: 
> 1. Complete all documentation (release notes, README, etc.) 
> 2. Run all QA tests on all platforms
> [attachment "rc6_fixed_bugs.csv" deleted by Hoang-Nam 
> Nguyen/Germany/IBM] [attachment "OFED_release_notes.txt" deleted by 
> Hoang-Nam Nguyen/Germany/IBM] 
_______________________________________________
> ewg mailing list
> ewg at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg

From todd.rimmer at qlogic.com  Wed Jun 20 07:25:00 2007
From: todd.rimmer at qlogic.com (Todd Rimmer)
Date: Wed, 20 Jun 2007 09:25:00 -0500
Subject: [ofa-general] RE: Patches to complib
In-Reply-To: <1182346392.15653.305841.camel@hal.voltaire.com>
Message-ID: <4FB1BCCAE6CAED44A1DC005B1DE06119291131@EPEXCH2.qlogic.org>

This adds get_next functions to the various maps (flexi, quick and map).


get_next searches for the 1st entry whose key is > the key specified.

As such get_next provides for searches where an exact key is not known,
or the map may be changing between searches (and hence the key of a
previously fetched entry may no longer be in the map).
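
To illustrate the "strictly greater" semantics without the tree code, here is a small sketch over a sorted array instead of a map. The name toy_get_next and the array representation are assumptions made for the example; only the comparison rule mirrors the patch below.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for get_next on a sorted array of keys:
 * returns the index of the first key strictly greater than 'key',
 * or n if no such key exists (the analogue of the map end).
 */
size_t toy_get_next(const uint64_t *keys, size_t n, uint64_t key)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (key < keys[mid])
			hi = mid;	/* candidate; keep looking to the left */
		else
			lo = mid + 1;	/* equal or smaller; go right */
	}
	return lo;
}
```

A caller that remembers only the last key it processed can resume with toy_get_next(keys, n, last_key), and the lookup still works if the entry for last_key has been removed in the meantime.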

This patch was generated against OFED 1.2, however I have diffed the
affected files and the files in the master are identical.

Signed-off-by: Todd Rimmer <todd.rimmer at qlogic.com>

diff -r -c orig2/osm/complib/cl_map.c fixed/osm/complib/cl_map.c
*** orig2/osm/complib/cl_map.c	Wed Jun 20 08:57:45 2007
--- fixed/osm/complib/cl_map.c	Wed Jun 20 09:41:55 2007
***************
*** 268,273 ****
--- 268,300 ----
  	return( p_item );
  }
  
+ cl_map_item_t*
+ cl_qmap_get_next(
+ 	IN	const cl_qmap_t* const	p_map,
+ 	IN	const uint64_t			key )
+ {
+ 	cl_map_item_t	*p_item;
+ 	cl_map_item_t	*p_item_found;
+ 
+ 	CL_ASSERT( p_map );
+ 	CL_ASSERT( p_map->state == CL_INITIALIZED );
+ 
+ 	p_item = __cl_map_root( p_map );
+ 	p_item_found = (cl_map_item_t*)&p_map->nil;
+ 
+ 	while( p_item != &p_map->nil )
+ 	{
+ 		if( key < p_item->key ){
+ 			p_item_found = p_item;
+ 			p_item = p_item->p_left;
+ 		}else{
+ 			p_item = p_item->p_right;
+ 		}
+ 	}
+     
+ 	return( p_item_found );
+ }
+ 
  void
  cl_qmap_apply_func(
  	IN	const cl_qmap_t* const	p_map,
***************
*** 832,837 ****
--- 859,881 ----
  	return( cl_qmap_obj( PARENT_STRUCT( p_item, cl_map_obj_t, item )
) );
  }
  
+ void*
+ cl_map_get_next(
+ 	IN	const cl_map_t* const	p_map,
+ 	IN	const uint64_t			key )
+ {
+ 	cl_map_item_t	*p_item;
+ 
+ 	CL_ASSERT( p_map );
+ 
+ 	p_item = cl_qmap_get_next( &p_map->qmap, key );
+ 
+ 	if( p_item == cl_qmap_end( &p_map->qmap ) )
+ 		return( NULL );
+ 
+ 	return( cl_qmap_obj( PARENT_STRUCT( p_item, cl_map_obj_t, item )
) );
+ }
+ 
  void
  cl_map_remove_item(
  	IN	cl_map_t* const			p_map,
***************
*** 1279,1284 ****
--- 1323,1358 ----
  	return( p_item );
  }
  
+ cl_fmap_item_t*
+ cl_fmap_get_next(
+ 	IN	const cl_fmap_t* const	p_map,
+ 	IN	const void* const		p_key )
+ {
+ 	cl_fmap_item_t	*p_item;
+ 	cl_fmap_item_t	*p_item_found;
+ 	intn_t			cmp;
+ 
+ 	CL_ASSERT( p_map );
+ 	CL_ASSERT( p_map->state == CL_INITIALIZED );
+ 
+ 	p_item = __cl_fmap_root( p_map );
+ 	p_item_found = (cl_fmap_item_t*)&p_map->nil;
+ 
+ 	while( p_item != &p_map->nil )
+ 	{
+ 		cmp = p_map->pfn_compare( p_key, p_item->p_key );
+ 
+ 		if( cmp < 0 ){
+ 			p_item_found = p_item;
+ 			p_item = p_item->p_left;	/* too small */
+ 		}else{
+ 			p_item = p_item->p_right;	/* too big or
match */
+ 		}
+ 	}
+ 
+ 	return( p_item_found );
+ }
+ 
  void
  cl_fmap_apply_func(
  	IN	const cl_fmap_t* const	p_map,
diff -r -c orig2/osm/include/complib/cl_fleximap.h
fixed/osm/include/complib/cl_fleximap.h
*** orig2/osm/include/complib/cl_fleximap.h	Wed Jun 20 08:57:45 2007
--- fixed/osm/include/complib/cl_fleximap.h	Wed Jun 20 09:30:30 2007
***************
*** 100,106 ****
  *
  *	Manipulation:
  *		cl_fmap_insert, cl_fmap_get, cl_fmap_remove_item,
cl_fmap_remove,
! *		cl_fmap_remove_all, cl_fmap_merge, cl_fmap_delta
  *
  *	Search:
  *		cl_fmap_apply_func
--- 100,106 ----
  *
  *	Manipulation:
  *		cl_fmap_insert, cl_fmap_get, cl_fmap_remove_item,
cl_fmap_remove,
! *		cl_fmap_remove_all, cl_fmap_merge, cl_fmap_delta,
cl_fmap_get_next
  *
  *	Search:
  *		cl_fmap_apply_func
***************
*** 672,678 ****
  *	cl_fmap_get does not remove the item from the flexi map.
  *
  * SEE ALSO
! *	Flexi Map, cl_fmap_remove
  *********/
  
  /****f* Component Library: Flexi Map/cl_fmap_remove_item
--- 672,714 ----
  *	cl_fmap_get does not remove the item from the flexi map.
  *
  * SEE ALSO
! *	Flexi Map, cl_fmap_remove, cl_fmap_get_next
! *********/
! 
! /****f* Component Library: Flexi Map/cl_fmap_get_next
! * NAME
! *	cl_fmap_get_next
! *
! * DESCRIPTION
! *	The cl_fmap_get_next function returns the first map item
associated with a
! *	key > the key specified.
! *
! * SYNOPSIS
! */
! cl_fmap_item_t*
! cl_fmap_get_next(
! 	IN	const cl_fmap_t* const	p_map,
! 	IN	const void* const		p_key );
! /*
! * PARAMETERS
! *	p_map
! *		[in] Pointer to a cl_fmap_t structure from which to
retrieve the
! *		item with the specified key.
! *
! *	p_key
! *		[in] Pointer to a key value used to search for the
desired map item.
! *
! * RETURN VALUES
! *	Pointer to the first map item with a key > the  desired key
value.
! *
! *	Pointer to the map end if there was no item with a key > the
desired key
! *	value stored in the flexi map.
! *
! * NOTES
! *	cl_fmap_get_next does not remove the item from the flexi map.
! *
! * SEE ALSO
! *	Flexi Map, cl_fmap_remove, cl_fmap_get
  *********/
  
  /****f* Component Library: Flexi Map/cl_fmap_remove_item
diff -r -c orig2/osm/include/complib/cl_map.h
fixed/osm/include/complib/cl_map.h
*** orig2/osm/include/complib/cl_map.h	Wed Jun 20 08:57:45 2007
--- fixed/osm/include/complib/cl_map.h	Wed Jun 20 09:30:51 2007
***************
*** 96,102 ****
  *
  *	Manipulation
  *		cl_map_insert, cl_map_get, cl_map_remove_item,
cl_map_remove,
! *		cl_map_remove_all, cl_map_merge, cl_map_delta
  *
  *	Attributes:
  *		cl_map_count, cl_is_map_empty, cl_is_map_inited
--- 96,102 ----
  *
  *	Manipulation
  *		cl_map_insert, cl_map_get, cl_map_remove_item,
cl_map_remove,
! *		cl_map_remove_all, cl_map_merge, cl_map_delta,
cl_map_get_next
  *
  *	Attributes:
  *		cl_map_count, cl_is_map_empty, cl_is_map_inited
***************
*** 628,634 ****
  *	cl_map_get does not remove the item from the map.
  *
  * SEE ALSO
! *	Map, cl_map_remove
  *********/
  
  /****f* Component Library: Map/cl_map_remove_item
--- 628,670 ----
  *	cl_map_get does not remove the item from the map.
  *
  * SEE ALSO
! *	Map, cl_map_remove, cl_map_get_next
! *********/
! 
! /****f* Component Library: Map/cl_map_get_next
! * NAME
! *	cl_map_get_next
! *
! * DESCRIPTION
! *	The cl_map_get_next function returns the first object
associated with a
! *	key > the key specified.
! *
! * SYNOPSIS
! */
! void*
! cl_map_get_next(
! 	IN	const cl_map_t* const	p_map,
! 	IN	const uint64_t			key );
! /*
! * PARAMETERS
! *	p_map
! *		[in] Pointer to a map from which to retrieve the object
with
! *		the specified key.
! *
! *	key
! *		[in] Key value used to search for the desired object.
! *
! * RETURN VALUES
! *	Pointer to the first object with a key > the desired key value.
! *
! *	NULL if there was no item with a key > the desired key
! *	value stored in the map.
! *
! * NOTES
! *	cl_map_get_next does not remove the item from the map.
! *
! * SEE ALSO
! *	Map, cl_map_remove, cl_map_get
  *********/
  
  /****f* Component Library: Map/cl_map_remove_item
diff -r -c orig2/osm/include/complib/cl_qmap.h
fixed/osm/include/complib/cl_qmap.h
*** orig2/osm/include/complib/cl_qmap.h	Wed Jun 20 08:57:45 2007
--- fixed/osm/include/complib/cl_qmap.h	Wed Jun 20 09:43:19 2007
***************
*** 98,104 ****
  *
  *	Manipulation:
  *		cl_qmap_insert, cl_qmap_get, cl_qmap_remove_item,
cl_qmap_remove,
! *		cl_qmap_remove_all, cl_qmap_merge, cl_qmap_delta
  *
  *	Search:
  *		cl_qmap_apply_func
--- 98,104 ----
  *
  *	Manipulation:
  *		cl_qmap_insert, cl_qmap_get, cl_qmap_remove_item,
cl_qmap_remove,
! *		cl_qmap_remove_all, cl_qmap_merge, cl_qmap_delta,
cl_qmap_get_next
  *
  *	Search:
  *		cl_qmap_apply_func
***************
*** 749,755 ****
  *	cl_qmap_get does not remove the item from the quick map.
  *
  * SEE ALSO
! *	Quick Map, cl_qmap_remove
  *********/
  
  /****f* Component Library: Quick Map/cl_qmap_remove_item
--- 749,791 ----
  *	cl_qmap_get does not remove the item from the quick map.
  *
  * SEE ALSO
! *	Quick Map, cl_qmap_get_next, cl_qmap_remove
! *********/
! 
! /****f* Component Library: Quick Map/cl_qmap_get_next
! * NAME
! *	cl_qmap_get_next
! *
! * DESCRIPTION
! *	The cl_qmap_get_next function returns the first map item
associated with a
! *	key > the key specified.
! *
! * SYNOPSIS
! */
! cl_map_item_t*
! cl_qmap_get_next(
! 	IN	const cl_qmap_t* const	p_map,
! 	IN	const uint64_t			key );
! /*
! * PARAMETERS
! *	p_map
! *		[in] Pointer to a cl_qmap_t structure from which to
retrieve the
! *		first item with a key > the specified key.
! *
! *	key
! *		[in] Key value used to search for the desired map item.
! *
! * RETURN VALUES
! *	Pointer to the first map item with a key > the desired key
value.
! *
! *	Pointer to the map end if there was no item with a key > the
desired key
! *	value stored in the quick map.
! *
! * NOTES
! *	cl_qmap_get_next does not remove the item from the quick map.
! *
! * SEE ALSO
! *	Quick Map, cl_qmap_get, cl_qmap_remove
  *********/
  
  /****f* Component Library: Quick Map/cl_qmap_remove_item

Todd Rimmer
Chief Architect 
QLogic System Interconnect Group
Voice: 610-233-4852     Fax: 610-233-4777
Todd.Rimmer at QLogic.com  www.QLogic.com


From halr at voltaire.com  Wed Jun 20 07:50:23 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Jun 2007 10:50:23 -0400
Subject: [ofa-general] RE: Patches to complib
In-Reply-To: <4FB1BCCAE6CAED44A1DC005B1DE06119291131@EPEXCH2.qlogic.org>
References: <4FB1BCCAE6CAED44A1DC005B1DE06119291131@EPEXCH2.qlogic.org>
Message-ID: <1182351021.15653.310989.camel@hal.voltaire.com>

On Wed, 2007-06-20 at 10:25, Todd Rimmer wrote:
> This adds get_next functions to the various maps (flexi, quick and map).
> 
> 
> get_next searches for the 1st entry whose key is > the key specified.
> 
> As such get_next provides for searches where an exact key is not known,
> or the map may be changing between searches (and hence the key of a
> previously fetched entry may no longer be in the map).

Looks like a nice functionality addition.

> This patch was generated against OFED 1.2, however I have diffed the
> affected files and the files in the master are identical.
> 
> Signed-off-by: Todd Rimmer <todd.rimmer at qlogic.com>

Your mailer may be munging this patch:

|diff -r -c orig2/osm/complib/cl_map.c fixed/osm/complib/cl_map.c
|*** orig2/osm/complib/cl_map.c Wed Jun 20 08:57:45 2007
|--- fixed/osm/complib/cl_map.c Wed Jun 20 09:41:55 2007
--------------------------
File to patch: complib/cl_map.c
patching file complib/cl_map.c
patch: **** malformed patch at line 95: ) );

-- Hal



From todd.rimmer at qlogic.com  Wed Jun 20 07:55:45 2007
From: todd.rimmer at qlogic.com (Todd Rimmer)
Date: Wed, 20 Jun 2007 09:55:45 -0500
Subject: [ofa-general] RE: Patches to complib
In-Reply-To: <1182351021.15653.310989.camel@hal.voltaire.com>
Message-ID: <4FB1BCCAE6CAED44A1DC005B1DE0611929113F@EPEXCH2.qlogic.org>

> From: Hal Rosenstock [mailto:halr at voltaire.com]
> Your mailer may be munging this patch:

Here it is as an attachment, I'll try to see in the future if other
mailer options correct the munging issue.

Todd Rimmer
Chief Architect 
QLogic System Interconnect Group
Voice: 610-233-4852     Fax: 610-233-4777
Todd.Rimmer at QLogic.com  www.QLogic.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cl_maps_getnext.diff
Type: application/octet-stream
Size: 8176 bytes
Desc: cl_maps_getnext.diff
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070620/b25c6478/attachment.obj>

From isaac at clusterfs.com  Wed Jun 20 08:01:59 2007
From: isaac at clusterfs.com (Isaac Huang)
Date: Wed, 20 Jun 2007 23:01:59 +0800
Subject: [ofa-general] a possible bug in
	drivers/infiniband/hw/mthca/mthca_qp.c
Message-ID: <20070620150159.GA5628@clusterfs.com>

Hi, I don't understand the code, but this doesn't look right:

static void to_ib_ah_attr(struct mthca_dev *dev, struct ib_ah_attr *ib_ah_attr,
                                struct mthca_qp_path *path)
{
	memset(ib_ah_attr, 0, sizeof *path);


I think it shall be 'sizeof *ib_ah_attr' instead. Please CC me - I'm
not on this list.


Thanks,
Isaac


From rdreier at cisco.com  Wed Jun 20 08:06:58 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 20 Jun 2007 08:06:58 -0700
Subject: [ofa-general] a possible bug in
	drivers/infiniband/hw/mthca/mthca_qp.c
In-Reply-To: <20070620150159.GA5628@clusterfs.com> (Isaac Huang's message of
	"Wed, 20 Jun 2007 23:01:59 +0800")
References: <20070620150159.GA5628@clusterfs.com>
Message-ID: <adalkeeacml.fsf@cisco.com>

 > static void to_ib_ah_attr(struct mthca_dev *dev, struct ib_ah_attr *ib_ah_attr,
 >                                 struct mthca_qp_path *path)
 > {
 > 	memset(ib_ah_attr, 0, sizeof *path);
 > 
 > I think it shall be 'sizeof *ib_ah_attr' instead. Please CC me - I'm
 > not on this list.

Yes, you're right, but what source are you looking at?  The fix went
into the kernel with commit 99d4f22e in 2.6.21-rc1, back in February.
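
For reference, the general idiom looks like the sketch below. The struct layouts and the function name are invented for illustration and are not the mthca definitions; the only point is that the size passed to memset comes from the pointer being cleared, so it cannot drift to the size of some other object.

```c
#include <string.h>

/* Hypothetical stand-ins for the two structures; the sizes differ on purpose. */
struct toy_path {
	unsigned char raw[8];
};

struct toy_attr {
	unsigned char dlid;
	unsigned char sl;
	unsigned char pad[30];
};

void toy_to_attr(struct toy_attr *attr, const struct toy_path *path)
{
	/* Size is taken from the destination, not from the source. */
	memset(attr, 0, sizeof *attr);

	attr->dlid = path->raw[0];
	attr->sl = path->raw[1];
}
```

With sizeof *path here, only the first 8 bytes of the 32-byte destination would be cleared; in the opposite case, where the source type is the larger one, the memset would write past the end of the destination.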

 - R.


From Thomas.Talpey at netapp.com  Wed Jun 20 08:21:46 2007
From: Thomas.Talpey at netapp.com (Talpey, Thomas)
Date: Wed, 20 Jun 2007 11:21:46 -0400
Subject: [ofa-general] why netwoked file system(e.g. nfs, pvfs,
	etc.) supported IB by using access layer (linux kernel ib ops)
In-Reply-To: <4678D3E5.706@ncic.ac.cn>
References: <4678D3E5.706@ncic.ac.cn>
Message-ID: <EXNANE01FRaqbC8wSA100000ba0@exnane01.hq.netapp.com>

At 03:14 AM 6/20/2007, ncic wrote:
>why didn't they support ib with sdp?

There are two main answers.

The first is licensing. SDP licensing is wrapped up in a Microsoft
intellectual property issue, which has prevented its inclusion in some
kernels, including Linux. So upper layers cannot depend on its
presence.

The second, speaking for NFS at least, is performance. SDP relies
heavily on additional setup exchanges and RDMA Read for transparency,
and these negatively impact performance. With minimal additional work,
the same unmodified upper layer NFS filesystem code can use native
RDMA exchanges via the RPC layer and achieve truly excellent
performance. Check out Helen Chen's presentation from the recent
Sonoma workshop.
<http://www.openfabrics.org/archives/spring2007sonoma/Tuesday%20May%201/Helen%20Chen%20NFS%20over%20RDMA%20-%20IB%20and%20iWARP-5.pdf>

In the NFS case, the protocol is on a standards track and published in
the IETF (I'm the primary author). I'm hopeful that the edits I'm currently
preparing for publication will be finalized around the July meeting. We also
have complete implementations of both client and server in both Linux and
OpenSolaris.

For transparent mode, don't discount ordinary sockets over a connected
mode IPoIB approach. It performs very well and provides a fully
transparent solution to all upper layers. RDMA is better though, because
it (greatly) reduces overhead.

Tom.


From isaac at clusterfs.com  Wed Jun 20 08:40:47 2007
From: isaac at clusterfs.com (Isaac Huang)
Date: Wed, 20 Jun 2007 23:40:47 +0800
Subject: [ofa-general] a possible bug in
	drivers/infiniband/hw/mthca/mthca_qp.c
In-Reply-To: <adalkeeacml.fsf@cisco.com>
References: <20070620150159.GA5628@clusterfs.com> <adalkeeacml.fsf@cisco.com>
Message-ID: <20070620154047.GB5628@clusterfs.com>

On Wed, Jun 20, 2007 at 08:06:58AM -0700, Roland Dreier wrote:
>  > static void to_ib_ah_attr(struct mthca_dev *dev, struct ib_ah_attr *ib_ah_attr,
>  >                                 struct mthca_qp_path *path)
>  > {
>  > 	memset(ib_ah_attr, 0, sizeof *path);
>  > 
>  > I think it shall be 'sizeof *ib_ah_attr' instead. Please CC me - I'm
>  > not on this list.
> 
> Yes, you're right, but what source are you looking at?  The fix went
> into the kernel with commit 99d4f22e in 2.6.21-rc1, back in February.
> 

I stumbled upon that in OFED 1.1, then looked somewhere in the midst
of the openfabrics git trees; maybe I checked the wrong branch. Sorry.

Isaac


From halr at voltaire.com  Wed Jun 20 09:12:10 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Jun 2007 12:12:10 -0400
Subject: [ofa-general] RE: Patches to complib
In-Reply-To: <4FB1BCCAE6CAED44A1DC005B1DE0611929113F@EPEXCH2.qlogic.org>
References: <4FB1BCCAE6CAED44A1DC005B1DE0611929113F@EPEXCH2.qlogic.org>
Message-ID: <1182355925.15653.316439.camel@hal.voltaire.com>

On Wed, 2007-06-20 at 10:55, Todd Rimmer wrote:
> > From: Hal Rosenstock [mailto:halr at voltaire.com]
> > Your mailer may be munging this patch:
> 
> Here it is as an attachment, I'll try to see in the future if other
> mailer options correct the munging issue.

Yes, that works better so it was your mailer.

I also added your new get_next map functions to global symbols in the
complib map.

Thanks. Applied (to master only).

-- Hal

> Todd Rimmer
> Chief Architect 
> QLogic System Interconnect Group
> Voice: 610-233-4852     Fax: 610-233-4777
> Todd.Rimmer at QLogic.com  www.QLogic.com


From sashak at voltaire.com  Wed Jun 20 09:06:15 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 20 Jun 2007 19:06:15 +0300
Subject: [ofa-general] RE: Patches to complib
In-Reply-To: <4FB1BCCAE6CAED44A1DC005B1DE06119291131@EPEXCH2.qlogic.org>
References: <4FB1BCCAE6CAED44A1DC005B1DE06119291131@EPEXCH2.qlogic.org>
Message-ID: <1182355575.30285.18.camel@localhost>

Hi Todd,

On Wed, 2007-06-20 at 09:25 -0500, Todd Rimmer wrote:
> This adds get_next functions to the various maps (flexi, quick and map).
> 
> 
> get_next searches for the 1st entry whose key is > the key specified.

What about cleaner names? Maybe something like get_next_higher() or just
get_higher()?

> As such get_next provides for searches where an exact key is not known,
> or the map may be changing between searches (and hence the key of a
> previously fetched entry may no longer be in the map).

Just wondering, where are those new functions supposed to be used?

Sasha

> 
> This patch was generated against OFED 1.2, however I have diffed the
> affected files and the files in the master are identical.
> 
> Signed-off-by: Todd Rimmer <todd.rimmer at qlogic.com>
> 
> diff -r -c orig2/osm/complib/cl_map.c fixed/osm/complib/cl_map.c
> *** orig2/osm/complib/cl_map.c	Wed Jun 20 08:57:45 2007
> --- fixed/osm/complib/cl_map.c	Wed Jun 20 09:41:55 2007
> ***************
> *** 268,273 ****
> --- 268,300 ----
>   	return( p_item );
>   }
>   
> + cl_map_item_t*
> + cl_qmap_get_next(
> + 	IN	const cl_qmap_t* const	p_map,
> + 	IN	const uint64_t			key )
> + {
> + 	cl_map_item_t	*p_item;
> + 	cl_map_item_t	*p_item_found;
> + 
> + 	CL_ASSERT( p_map );
> + 	CL_ASSERT( p_map->state == CL_INITIALIZED );
> + 
> + 	p_item = __cl_map_root( p_map );
> + 	p_item_found = (cl_map_item_t*)&p_map->nil;
> + 
> + 	while( p_item != &p_map->nil )
> + 	{
> + 		if( key < p_item->key ){
> + 			p_item_found = p_item;
> + 			p_item = p_item->p_left;
> + 		}else{
> + 			p_item = p_item->p_right;
> + 		}
> + 	}
> +     
> + 	return( p_item_found );
> + }
> + 
>   void
>   cl_qmap_apply_func(
>   	IN	const cl_qmap_t* const	p_map,
> ***************
> *** 832,837 ****
> --- 859,881 ----
>   	return( cl_qmap_obj( PARENT_STRUCT( p_item, cl_map_obj_t, item )
> ) );
>   }
>   
> + void*
> + cl_map_get_next(
> + 	IN	const cl_map_t* const	p_map,
> + 	IN	const uint64_t			key )
> + {
> + 	cl_map_item_t	*p_item;
> + 
> + 	CL_ASSERT( p_map );
> + 
> + 	p_item = cl_qmap_get_next( &p_map->qmap, key );
> + 
> + 	if( p_item == cl_qmap_end( &p_map->qmap ) )
> + 		return( NULL );
> + 
> + 	return( cl_qmap_obj( PARENT_STRUCT( p_item, cl_map_obj_t, item )
> ) );
> + }
> + 
>   void
>   cl_map_remove_item(
>   	IN	cl_map_t* const			p_map,
> ***************
> *** 1279,1284 ****
> --- 1323,1358 ----
>   	return( p_item );
>   }
>   
> + cl_fmap_item_t*
> + cl_fmap_get_next(
> + 	IN	const cl_fmap_t* const	p_map,
> + 	IN	const void* const		p_key )
> + {
> + 	cl_fmap_item_t	*p_item;
> + 	cl_fmap_item_t	*p_item_found;
> + 	intn_t			cmp;
> + 
> + 	CL_ASSERT( p_map );
> + 	CL_ASSERT( p_map->state == CL_INITIALIZED );
> + 
> + 	p_item = __cl_fmap_root( p_map );
> + 	p_item_found = (cl_fmap_item_t*)&p_map->nil;
> + 
> + 	while( p_item != &p_map->nil )
> + 	{
> + 		cmp = p_map->pfn_compare( p_key, p_item->p_key );
> + 
> + 		if( cmp < 0 ){
> + 			p_item_found = p_item;
> + 			p_item = p_item->p_left;	/* too small */
> + 		}else{
> + 			p_item = p_item->p_right;	/* too big or
> match */
> + 		}
> + 	}
> + 
> + 	return( p_item_found );
> + }
> + 
>   void
>   cl_fmap_apply_func(
>   	IN	const cl_fmap_t* const	p_map,
> diff -r -c orig2/osm/include/complib/cl_fleximap.h
> fixed/osm/include/complib/cl_fleximap.h
> *** orig2/osm/include/complib/cl_fleximap.h	Wed Jun 20 08:57:45 2007
> --- fixed/osm/include/complib/cl_fleximap.h	Wed Jun 20 09:30:30 2007
> ***************
> *** 100,106 ****
>   *
>   *	Manipulation:
>   *		cl_fmap_insert, cl_fmap_get, cl_fmap_remove_item,
> cl_fmap_remove,
> ! *		cl_fmap_remove_all, cl_fmap_merge, cl_fmap_delta
>   *
>   *	Search:
>   *		cl_fmap_apply_func
> --- 100,106 ----
>   *
>   *	Manipulation:
>   *		cl_fmap_insert, cl_fmap_get, cl_fmap_remove_item,
> cl_fmap_remove,
> ! *		cl_fmap_remove_all, cl_fmap_merge, cl_fmap_delta,
> cl_fmap_get_next
>   *
>   *	Search:
>   *		cl_fmap_apply_func
> ***************
> *** 672,678 ****
>   *	cl_fmap_get does not remove the item from the flexi map.
>   *
>   * SEE ALSO
> ! *	Flexi Map, cl_fmap_remove
>   *********/
>   
>   /****f* Component Library: Flexi Map/cl_fmap_remove_item
> --- 672,714 ----
>   *	cl_fmap_get does not remove the item from the flexi map.
>   *
>   * SEE ALSO
> ! *	Flexi Map, cl_fmap_remove, cl_fmap_get_next
> ! *********/
> ! 
> ! /****f* Component Library: Flexi Map/cl_fmap_get_next
> ! * NAME
> ! *	cl_fmap_get_next
> ! *
> ! * DESCRIPTION
> ! *	The cl_fmap_get_next function returns the first map item
> associated with a
> ! *	key > the key specified.
> ! *
> ! * SYNOPSIS
> ! */
> ! cl_fmap_item_t*
> ! cl_fmap_get_next(
> ! 	IN	const cl_fmap_t* const	p_map,
> ! 	IN	const void* const		p_key );
> ! /*
> ! * PARAMETERS
> ! *	p_map
> ! *		[in] Pointer to a cl_fmap_t structure from which to
> retrieve the
> ! *		item with the specified key.
> ! *
> ! *	p_key
> ! *		[in] Pointer to a key value used to search for the
> desired map item.
> ! *
> ! * RETURN VALUES
> ! *	Pointer to the first map item with a key > the  desired key
> value.
> ! *
> ! *	Pointer to the map end if there was no item with a key > the
> desired key
> ! *	value stored in the flexi map.
> ! *
> ! * NOTES
> ! *	cl_fmap_get_next does not remove the item from the flexi map.
> ! *
> ! * SEE ALSO
> ! *	Flexi Map, cl_fmap_remove, cl_fmap_get
>   *********/
>   
>   /****f* Component Library: Flexi Map/cl_fmap_remove_item
> diff -r -c orig2/osm/include/complib/cl_map.h
> fixed/osm/include/complib/cl_map.h
> *** orig2/osm/include/complib/cl_map.h	Wed Jun 20 08:57:45 2007
> --- fixed/osm/include/complib/cl_map.h	Wed Jun 20 09:30:51 2007
> ***************
> *** 96,102 ****
>   *
>   *	Manipulation
>   *		cl_map_insert, cl_map_get, cl_map_remove_item,
> cl_map_remove,
> ! *		cl_map_remove_all, cl_map_merge, cl_map_delta
>   *
>   *	Attributes:
>   *		cl_map_count, cl_is_map_empty, cl_is_map_inited
> --- 96,102 ----
>   *
>   *	Manipulation
>   *		cl_map_insert, cl_map_get, cl_map_remove_item,
> cl_map_remove,
> ! *		cl_map_remove_all, cl_map_merge, cl_map_delta,
> cl_map_get_next
>   *
>   *	Attributes:
>   *		cl_map_count, cl_is_map_empty, cl_is_map_inited
> ***************
> *** 628,634 ****
>   *	cl_map_get does not remove the item from the map.
>   *
>   * SEE ALSO
> ! *	Map, cl_map_remove
>   *********/
>   
>   /****f* Component Library: Map/cl_map_remove_item
> --- 628,670 ----
>   *	cl_map_get does not remove the item from the map.
>   *
>   * SEE ALSO
> ! *	Map, cl_map_remove, cl_map_get_next
> ! *********/
> ! 
> ! /****f* Component Library: Map/cl_map_get_next
> ! * NAME
> ! *	cl_map_get_next
> ! *
> ! * DESCRIPTION
> ! *	The cl_qmap_get_next function returns the first object
> associated with a
> ! *	key > the key specified.
> ! *
> ! * SYNOPSIS
> ! */
> ! void*
> ! cl_map_get_next(
> ! 	IN	const cl_map_t* const	p_map,
> ! 	IN	const uint64_t			key );
> ! /*
> ! * PARAMETERS
> ! *	p_map
> ! *		[in] Pointer to a map from which to retrieve the object
> with
> ! *		the specified key.
> ! *
> ! *	key
> ! *		[in] Key value used to search for the desired object.
> ! *
> ! * RETURN VALUES
> ! *	Pointer to the first object with a key > the desired key value.
> ! *
> ! *	NULL if there was no item with a key > the desired key
> ! *	value stored in the map.
> ! *
> ! * NOTES
> ! *	cl_map_get does not remove the item from the map.
> ! *
> ! * SEE ALSO
> ! *	Map, cl_map_remove, cl_map_get
>   *********/
>   
>   /****f* Component Library: Map/cl_map_remove_item
> diff -r -c orig2/osm/include/complib/cl_qmap.h
> fixed/osm/include/complib/cl_qmap.h
> *** orig2/osm/include/complib/cl_qmap.h	Wed Jun 20 08:57:45 2007
> --- fixed/osm/include/complib/cl_qmap.h	Wed Jun 20 09:43:19 2007
> ***************
> *** 98,104 ****
>   *
>   *	Manipulation:
>   *		cl_qmap_insert, cl_qmap_get, cl_qmap_remove_item,
> cl_qmap_remove,
> ! *		cl_qmap_remove_all, cl_qmap_merge, cl_qmap_delta
>   *
>   *	Search:
>   *		cl_qmap_apply_func
> --- 98,104 ----
>   *
>   *	Manipulation:
>   *		cl_qmap_insert, cl_qmap_get, cl_qmap_remove_item,
> cl_qmap_remove,
> ! *		cl_qmap_remove_all, cl_qmap_merge, cl_qmap_delta,
> cl_qmap_get_next
>   *
>   *	Search:
>   *		cl_qmap_apply_func
> ***************
> *** 749,755 ****
>   *	cl_qmap_get does not remove the item from the quick map.
>   *
>   * SEE ALSO
> ! *	Quick Map, cl_qmap_remove
>   *********/
>   
>   /****f* Component Library: Quick Map/cl_qmap_remove_item
> --- 749,791 ----
>   *	cl_qmap_get does not remove the item from the quick map.
>   *
>   * SEE ALSO
> ! *	Quick Map, cl_qmap_get_next, cl_qmap_remove
> ! *********/
> ! 
> ! /****f* Component Library: Quick Map/cl_qmap_get_next
> ! * NAME
> ! *	cl_qmap_get_next
> ! *
> ! * DESCRIPTION
> ! *	The cl_qmap_get_next function returns the first map item
> associated with a
> ! *	key > the key specified.
> ! *
> ! * SYNOPSIS
> ! */
> ! cl_map_item_t*
> ! cl_qmap_get_next(
> ! 	IN	const cl_qmap_t* const	p_map,
> ! 	IN	const uint64_t			key );
> ! /*
> ! * PARAMETERS
> ! *	p_map
> ! *		[in] Pointer to a cl_qmap_t structure from which to
> retrieve the
> ! *		first item with a key > the specified key.
> ! *
> ! *	key
> ! *		[in] Key value used to search for the desired map item.
> ! *
> ! * RETURN VALUES
> ! *	Pointer to the first map item with a key > the desired key
> value.
> ! *
> ! *	Pointer to the map end if there was no item with a key > the
> desired key
> ! *	value stored in the quick map.
> ! *
> ! * NOTES
> ! *	cl_qmap_get_next does not remove the item from the quick map.
> ! *
> ! * SEE ALSO
> ! *	Quick Map, cl_qmap_get, cl_qmap_remove
>   *********/
>   
>   /****f* Component Library: Quick Map/cl_qmap_remove_item
> 
> Todd Rimmer
> Chief Architect 
> QLogic System Interconnect Group
> Voice: 610-233-4852     Fax: 610-233-4777
> Todd.Rimmer at QLogic.com  www.QLogic.com
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


From mst at dev.mellanox.co.il  Wed Jun 20 09:22:15 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 20 Jun 2007 19:22:15 +0300
Subject: [ofa-general] [PATCH for-2.6.22] ipoib/cm: fix interoperability when
	mtu don't match
Message-ID: <20070620162215.GF6006@mellanox.co.il>

IPoIB/CM currently rejects a connection unless the supported mtu
is >= device mtu. This breaks interoperability with implementations that
might have tweaked IPOIB_CM_MTU, and there's really no longer a reason to do so:
this is a left-over from the time when we did not tweak mtu per-connection.
Fix this by making the test as permissive as possible.

Signed-off-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>

---

Roland, this is an *obviously* safe fix and has important interoperability
implications. I think while not a crasher, it's appropriate for 2.6.22.
Do you agree?
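
The change in acceptance rule described above can be sketched in
isolation as follows. This is an illustrative userspace sketch, not the
kernel code: the function names and DEMO_ENCAP_LEN are assumptions made
for the example, with the value 4 taken from the "+ 4" in the warning
message of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed header length, matching the "+ 4" printed by the old warning. */
#define DEMO_ENCAP_LEN 4u

/* Old rule: accept only if the peer's mtu covers our device mtu plus
 * the encapsulation header. */
static bool old_rule_accepts(uint32_t peer_mtu, uint32_t dev_mtu)
{
	assert(dev_mtu <= UINT32_MAX - DEMO_ENCAP_LEN);
	return peer_mtu >= dev_mtu + DEMO_ENCAP_LEN;
}

/* New rule: reject only when no payload at all fits after the header. */
static bool new_rule_accepts(uint32_t peer_mtu)
{
	return peer_mtu > DEMO_ENCAP_LEN;
}
```

A peer that advertises a smaller connected-mode mtu than the local
device mtu is refused by the old rule and accepted by the new one; only
a peer mtu of 4 or less is still rejected.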

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index c64249f..1fe7f66 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -759,9 +759,8 @@ static int ipoib_cm_rep_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 
 	p->mtu = be32_to_cpu(data->mtu);
 
-	if (p->mtu < priv->dev->mtu + IPOIB_ENCAP_LEN) {
-		ipoib_warn(priv, "Rejecting connection: mtu %d < device mtu %d + 4\n",
-			   p->mtu, priv->dev->mtu);
+	if (p->mtu <= IPOIB_ENCAP_LEN) {
+		ipoib_warn(priv, "Rejecting connection: mtu %d <= 4\n", p->mtu);
 		return -EINVAL;
 	}
 
-- 
MST


From Frank.Leers at Sun.COM  Wed Jun 20 10:01:24 2007
From: Frank.Leers at Sun.COM (Frank Leers)
Date: Wed, 20 Jun 2007 10:01:24 -0700
Subject: [ofa-general] don't want to rebuild all rpm's from install.sh
In-Reply-To: <4678CA7E.9090200@dev.mellanox.co.il>
References: <1182297614.1774.30.camel@localhost>
	<4678CA7E.9090200@dev.mellanox.co.il>
Message-ID: <1182358884.1273.6.camel@localhost>

On Wed, 2007-06-20 at 09:34 +0300, Vladimir Sokolovsky wrote:
> Frank Leers wrote:
> > If I understand the Installation Guide doc correctly I should be able to
> > just install rpm's using the install.sh script without rebuilding the
> > rpm's.  I have built the rpm's successfully and installed them on a node
> > in my cluster via an NFS mount.  I'd now like to install the rest of my
> > nodes using './install.sh -c <> -net <>' but this results in a rebuild
> > of the rpm's all over again.  
> > 
> Yes,
> It should work this way if all of the nodes have the same Arch/OS/kernel.
> Can you send me the ofed.conf file (that you use after '-c' parameter), 
> the output of the './install.sh -c <> -net <>' command and 
> Arch/OS/kernel of your nodes.
> Arch/OS/kernel
> Thanks,
> Vladimir

Ah, I see where I was misguided then.  My build node kernel is different
than this particular compute node.  I'll need to build separately for
each Arch/OS/kernel.


Is OS differentiated between RH/CentOS/Fedora or is there a way to build
once for all three if Arch and kernel are otherwise the same?

thanks,

-frank


From vlad at dev.mellanox.co.il  Wed Jun 20 10:35:58 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Wed, 20 Jun 2007 20:35:58 +0300
Subject: [ewg] Re: [ofa-general] don't want to rebuild all rpm's from
	install.sh
In-Reply-To: <1182358884.1273.6.camel@localhost>
References: <1182297614.1774.30.camel@localhost>	<4678CA7E.9090200@dev.mellanox.co.il>
	<1182358884.1273.6.camel@localhost>
Message-ID: <4679657E.10000@dev.mellanox.co.il>

Frank Leers wrote:
> On Wed, 2007-06-20 at 09:34 +0300, Vladimir Sokolovsky wrote:
>> Frank Leers wrote:
>>> If I understand the Installation Guide doc correctly I should be able to
>>> just install rpm's using the install.sh script without rebuilding the
>>> rpm's.  I have built the rpm's successfully and installed them on a node
>>> in my cluster via an NFS mount.  I'd now like to install the rest of my
>>> nodes using './install.sh -c <> -net <>' but this results in a rebuild
>>> of the rpm's all over again.  
>>>
>> Yes,
>> It should work this way if all of the nodes have the same Arch/OS/kernel.
>> Can you send me the ofed.conf file (that you use after '-c' parameter), 
>> the output of the './install.sh -c <> -net <>' command and 
>> Arch/OS/kernel of your nodes.
>> Arch/OS/kernel
>> Thanks,
>> Vladimir
> 
> Ah, I see where I was misguided then.  My build node kernel is different
> than this particular compute node.  I'll need to build separately for
> each Arch/OS/kernel.
> 
> 
> Is OS differentiated between RH/CentOS/Fedora or is there a way to build
> once for all three if Arch and kernel are otherwise the same?
> 

OFED stores the RPMs it builds under OFED-1.2-xx/RPMS/$(rpm -qf /etc/issue).
If the kernel version and $(rpm -qf /etc/issue) are the same on
RH/CentOS/Fedora (which is probably not the case) then you can build once.
If you install the RPMs manually rather than with OFED's install.sh
script, it should work for the userspace RPMs only. The kernel-ib RPMs
must be built separately for each kernel.

Regards,
Vladimir


From becker at nas.nasa.gov  Wed Jun 20 10:44:54 2007
From: becker at nas.nasa.gov (Jeff Becker)
Date: Wed, 20 Jun 2007 10:44:54 -0700
Subject: [ofa-general] backups
Message-ID: <795c49870706201044ha36255amebd94c1b673f58f6@mail.gmail.com>

Hi. I've started backing up the git trees and the web content using
rsync. John Companies gave us a 10G NFS partition for this. I've done
two backups and there's only 800M left. Also, I haven't backed up the
daily builds yet. I was told we could get more space for one dollar
per GB per month. Depending on the budget, we should increase this
backup space. How should we proceed? Thanks.

-jeff


From rdreier at cisco.com  Wed Jun 20 10:58:34 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 20 Jun 2007 10:58:34 -0700
Subject: [ofa-general] backups
In-Reply-To: <795c49870706201044ha36255amebd94c1b673f58f6@mail.gmail.com>
	(Jeff Becker's message of "Wed, 20 Jun 2007 10:44:54 -0700")
References: <795c49870706201044ha36255amebd94c1b673f58f6@mail.gmail.com>
Message-ID: <adahcp2a4ol.fsf@cisco.com>

 > Hi. I've started backing up the git trees and the web content using
 > rsync. John Companies gave us a 10G NFS partition for this. I've done
 > two backups and there's only 800M left. Also, I haven't backed up the
 > daily builds yet. I was told we could get more space for one dollar
 > per GB per month. Depending on the budget, we should increase this
 > backup space. How should we proceed? Thanks.

Where is all the space going?  A full kernel git tree (with more than
two years of history) takes less than 150 MB of storage for me.  How
are we using up so much space?

Also, FWIW, amazon S3 is $0.15 / GB-month + $0.10 for each GB
transferred in.  Of course it's probably a lot less convenient to back
up to.

 - R.


From becker at nas.nasa.gov  Wed Jun 20 11:32:03 2007
From: becker at nas.nasa.gov (Jeff Becker)
Date: Wed, 20 Jun 2007 11:32:03 -0700
Subject: [ofa-general] backups
In-Reply-To: <adahcp2a4ol.fsf@cisco.com>
References: <795c49870706201044ha36255amebd94c1b673f58f6@mail.gmail.com>
	<adahcp2a4ol.fsf@cisco.com>
Message-ID: <795c49870706201132r1f7633f8r2cf3cb2a71edc6e0@mail.gmail.com>

I'm backing up /data/pub/scm. A quick "du -chL" shows it to be 4.2G.
Perhaps I only need to back up a subset of /data/pub/scm? Thanks.

-jeff

On 6/20/07, Roland Dreier <rdreier at cisco.com> wrote:
>  > Hi. I've started backing up the git trees and the web content using
>  > rsync. John Companies gave us a 10G NFS partition for this. I've done
>  > two backups and there's only 800M left. Also, I haven't backed up the
>  > daily builds yet. I was told we could get more space for one dollar
>  > per GB per month. Depending on the budget, we should increase this
>  > backup space. How should we proceed? Thanks.
>
> Where is all the space going?  A full kernel git tree (with more than
> two years of history) takes less than 150 MB of storage for me.  How
> are we using up so much space?
>
> Also, FWIW, amazon S3 is $0.15 / GB-month + $0.10 for each GB
> transferred in.  Of course it's probably a lot less convenient to back
> up to.
>
>  - R.
>


From todd.rimmer at qlogic.com  Wed Jun 20 12:02:26 2007
From: todd.rimmer at qlogic.com (Todd Rimmer)
Date: Wed, 20 Jun 2007 14:02:26 -0500
Subject: [ofa-general] RE: Patches to complib
In-Reply-To: <1182355575.30285.18.camel@localhost>
Message-ID: <4FB1BCCAE6CAED44A1DC005B1DE0611929118A@EPEXCH2.qlogic.org>

> From: Sasha Khapyorsky [mailto:sashak at voltaire.com]
> 
> What about cleaner names? Maybe something like get_next_higher() or just
> get_higher()?

The name comes from the fact that, for a key already in the list, it is
equivalent to:
	p = cl_qmap_get(map, key)
	p = cl_qmap_next(p)

However it also handles the case where the key was no longer in the list
or where the starting point is a key which may have never been in the
list.

This makes it very useful for situations like:
	lock list
	p = cl_qmap_head()
	process p
	k = p's key
	unlock list

	do some other stuff

	lock list
	p = cl_qmap_get_next(..., k)
	process p
	k = p's key
	unlock list

	....
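
The resumable walk outlined above can be sketched without complib at
all. The example below is only an illustration under stated assumptions:
a sorted array stands in for the map, first_greater() is a hypothetical
helper with the same "first key > the key specified" contract as the
proposed get_next functions, and the locking is left as comments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first element whose key is strictly greater
 * than `key`, or `n` if there is none. `keys` must be sorted ascending. */
static size_t first_greater(const uint64_t *keys, size_t n, uint64_t key)
{
	size_t lo = 0, hi = n;

	assert(keys != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (key < keys[mid])
			hi = mid;	/* candidate; look for a smaller one */
		else
			lo = mid + 1;	/* too small or equal; go right */
	}
	return lo;
}

/* Visit every key, re-finding the position from the last key each time.
 * Between steps a real caller could drop the lock, and entries could be
 * inserted or removed, without invalidating the walk. */
static size_t visit_all(const uint64_t *keys, size_t n, uint64_t *out)
{
	size_t count = 0, i;

	if (n == 0)
		return 0;
	out[count++] = keys[0];		/* first step: the head of the map */
	for (;;) {
		/* lock; look up by the last key seen, not by a saved pointer */
		i = first_greater(keys, n, out[count - 1]);
		if (i == n)
			break;		/* corresponds to reaching the map end */
		out[count++] = keys[i];
		/* unlock; do some other stuff */
	}
	return count;
}
```

The lookup starts from a key value rather than from a saved item
pointer, so it still works if the item that carried that key has since
been removed.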

Another example use might be a map keyed by GUIDs and a query to find
all devices from a given vendor, in which case get_next could be used to
start the search.
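
As a rough sketch of the vendor query, assume the map is keyed by 64-bit
GUIDs in host byte order and that the vendor is identified by the upper
24 bits (the EUI-64 company ID). The sorted array, the helper names and
the sample GUID values below are all hypothetical; with complib the
starting point would come from the get_next call instead of
first_above().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of the first key strictly greater than `key`, or `n` if none.
 * `keys` must be sorted ascending. */
static size_t first_above(const uint64_t *keys, size_t n, uint64_t key)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (key < keys[mid])
			hi = mid;
		else
			lo = mid + 1;
	}
	return lo;
}

/* Count the GUIDs whose upper 24 bits equal `oui`. The search starts at
 * the largest key that cannot belong to the vendor, so the first hit is
 * the vendor's lowest GUID if one exists. */
static size_t count_vendor(const uint64_t *guids, size_t n, uint32_t oui)
{
	uint64_t base = (uint64_t)oui << 40;
	size_t i, count = 0;

	/* oui == 0 would make base - 1 wrap around; not handled here. */
	assert(oui != 0 && oui <= 0xFFFFFFu);

	for (i = first_above(guids, n, base - 1);
	     i < n && (guids[i] >> 40) == oui; i++)
		count++;
	return count;
}
```

The walk stops as soon as the upper 24 bits change, so only the
vendor's own entries are visited.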

We added this capability to our internal equivalent of complib a few
years ago and found a lot of uses for it.  So I thought it would be a
simple yet powerful capability to add to OFED complib.

Todd Rimmer
Chief Architect 
QLogic System Interconnect Group
Voice: 610-233-4852     Fax: 610-233-4777
Todd.Rimmer at QLogic.com  www.QLogic.com


From rdreier at cisco.com  Wed Jun 20 13:36:36 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 20 Jun 2007 13:36:36 -0700
Subject: [ofa-general] Re: [PATCH 1 of 2] net-mlx4: Show board_id string in
	sysfs under the pci device
In-Reply-To: <200706191641.52831.jackm@dev.mellanox.co.il> (Jack Morgenstein's
	message of "Tue, 19 Jun 2007 16:41:52 +0300")
References: <200706191641.52831.jackm@dev.mellanox.co.il>
Message-ID: <adavedi8isr.fsf@cisco.com>

 > Show the board_id string in sysfs under the pci device (not under the infiniband
 > device, as with other HCAs). ConnectX will also have an enet device (which will
 > not be under the infiniband class) and users of this device must also have 
 > access to the board_id string.
 > 
 > This requires a small modification in the libibverbs example "ibv_devinfo"; the app
 > must also look under the pci device for the board_id if it does not find it 
 > directly under the infiniband device.

Maybe it would be cleaner to have the IB device create a symlink back
to the main board_id file, so we don't have to change userspace?  (I
haven't looked at how easy this would be to do)

 - R.


From rdreier at cisco.com  Wed Jun 20 13:39:30 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 20 Jun 2007 13:39:30 -0700
Subject: [ofa-general] Re: [PATCH draft,
	untested] ehca srq emulation (for IPoIB CM)
In-Reply-To: <4675EFA4.5050209@linux.vnet.ibm.com> (Pradeep Satyanarayana's
	message of "Sun, 17 Jun 2007 19:36:20 -0700")
References: <466F36C8.5010507@linux.vnet.ibm.com>
	<20070613163821.GB12277@mellanox.co.il> <adafy4v69ig.fsf@cisco.com>
	<20070613174930.GE12277@mellanox.co.il>
	<46716F3D.7050206@ichips.intel.com> <ada1wge4h4l.fsf@cisco.com>
	<20070614175030.GB29561@mellanox.co.il>
	<4671C541.4040503@linux.vnet.ibm.com>
	<20070615051846.GG2207@mellanox.co.il>
	<4672C0DC.8060308@linux.vnet.ibm.com>
	<20070616192702.GM2207@mellanox.co.il>
	<4675EFA4.5050209@linux.vnet.ibm.com>
Message-ID: <adar6o68inx.fsf@cisco.com>

 > This approach would be a regression; no guarantees that anything else
 > would be better.
 > 
 > As Bernard King-Smith said changing to a different approach (mid-stream)
 > is not the right thing to do.

Hang on -- the whole reason we're having this discussion is because
not everyone agrees with the approach you've taken.  Unfortunately,
just because you've put a lot of effort into your patch, it's still
incumbent on us to do the right thing, even if it means starting over.

I've been quite busy lately but I should have some time to look more
deeply at this in the next week or so.

 - R.


From rdreier at cisco.com  Wed Jun 20 13:40:03 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 20 Jun 2007 13:40:03 -0700
Subject: [ofa-general] Re: [PATCH draft,
	untested] ehca srq emulation (for IPoIB CM)
In-Reply-To: <OF5651D8E8.16899A82-ON852572FB.0070D903-852572FB.0073DCB4@us.ibm.com>
	(Bernard King-Smith's message of "Fri,
	15 Jun 2007 17:04:16 -0400")
References: <OF5651D8E8.16899A82-ON852572FB.0070D903-852572FB.0073DCB4@us.ibm.com>
Message-ID: <adamyyu8in0.fsf@cisco.com>

 > We are already running with the non-SRQ patch here and the results are 
 > very good. Changing to a different approach is not the right thing to do 
 > at this time.

Why not, if a different approach is better?

 - R.


From rdreier at cisco.com  Wed Jun 20 13:43:13 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 20 Jun 2007 13:43:13 -0700
Subject: [ofa-general] [PATCH] for-2.6.23 ib/umad: add partition support
In-Reply-To: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com> (Sean Hefty's
	message of "Fri, 15 Jun 2007 09:34:55 -0700")
References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com>
Message-ID: <adair9i8ihq.fsf@cisco.com>

 > -#define IB_USER_MAD_ABI_VERSION	5
 > +#define IB_USER_MAD_ABI_VERSION	6

Bummer -- we've been able to keep the ABI stable for almost 2 years
now.  I wonder if there's something clever we can do to avoid breaking
existing apps?

 - R.


From rdreier at cisco.com  Wed Jun 20 13:47:55 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 20 Jun 2007 13:47:55 -0700
Subject: [ofa-general] Re: [PATCH] IB/ipath -- changes in for-roland for
	2.6.23
In-Reply-To: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
	(Arthur Jones's message of "Tue, 19 Jun 2007 16:40:30 -0700")
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
Message-ID: <adaejk68i9w.fsf@cisco.com>

 > which is based on the kernel.org linux-2.6 tree.
 > i wasn't sure if i should spam the list with all
 > the patches, as they are avail via the git server
 > above.  how would you like that done in the future?

I definitely think all the patches need to be sent out to the list at
least once so people get a chance to review, so you did the right
thing.

But I don't see a MAINTAINERS update (it still lists Bryan O'Sullivan,
support at pathscale.com and openib.org for the ipath driver).  Also I
don't see fixes for the smp_mb__after_clear_bit bug pointed out by
BenH or the bug of setting both _PAGE_NO_CACHE and _PAGE_WRITETHRU on
powerpc pointed out by paulus.

Anyway I'll look over the rest and queue for 2.6.23 if it looks good.

 - R.


From rdreier at cisco.com  Wed Jun 20 13:55:14 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 20 Jun 2007 13:55:14 -0700
Subject: [ofa-general] Re: [PATCH 15/28] IB/ipath - add barrier before
	updating WC head in shared memory
In-Reply-To: <20070619234156.3794.26440.stgit@bauxite.internal.keyresearch.com>
	(Arthur Jones's message of "Tue, 19 Jun 2007 16:41:57 -0700")
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
	<20070619234156.3794.26440.stgit@bauxite.internal.keyresearch.com>
Message-ID: <adaabuu8hxp.fsf@cisco.com>

 >  	wc->queue[head].port_num = entry->port_num;
 > +	wmb();
 >  	wc->head = next;

Please add comments explaining these barriers... maybe something like

       /* Make sure queue entry contents are visible before head index update */

also I notice that the latest libibipathverbs git tree (which hasn't
been touched for 3 months) does not seem to have the analogous read
memory barrier when polling CQ contents.
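
To illustrate the pairing meant here, below is a userspace analogue
using C11 atomics. It is a sketch under assumptions, not the ipath or
libipathverbs code: the demo_* names and the queue layout are made up,
the release store stands in for wmb() followed by the head update, and
the acquire load stands in for the read barrier on the polling side.
There is no overflow handling.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_QUEUE_LEN 8

struct demo_wc {
	uint32_t port_num;
};

struct demo_cq {
	struct demo_wc queue[DEMO_QUEUE_LEN];
	_Atomic uint32_t head;	/* written by producer, read by consumer */
	uint32_t tail;		/* private to the consumer */
};

/* Producer: fill in the entry first, then publish it. The release store
 * makes the entry contents visible before the new head index. */
static void demo_post(struct demo_cq *cq, uint32_t port_num)
{
	uint32_t head;

	assert(cq != NULL);
	head = atomic_load_explicit(&cq->head, memory_order_relaxed);
	cq->queue[head].port_num = port_num;
	atomic_store_explicit(&cq->head, (head + 1) % DEMO_QUEUE_LEN,
			      memory_order_release);
}

/* Consumer: the acquire load is the matching read-side barrier, so the
 * entry is read only after the head update has been observed.
 * Returns 1 if an entry was consumed, 0 if the queue was empty. */
static int demo_poll(struct demo_cq *cq, struct demo_wc *out)
{
	uint32_t head;

	assert(cq != NULL && out != NULL);
	head = atomic_load_explicit(&cq->head, memory_order_acquire);
	if (cq->tail == head)
		return 0;
	*out = cq->queue[cq->tail];
	cq->tail = (cq->tail + 1) % DEMO_QUEUE_LEN;
	return 1;
}
```

Without the read-side ordering, a consumer could see the new head and
still read stale entry contents, which is the gap noted above for the
userspace library.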

 - R.


From rdreier at cisco.com  Wed Jun 20 14:00:27 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 20 Jun 2007 14:00:27 -0700
Subject: [ofa-general] Re: [PATCH 24/28] IB/ipath - ipath_poll fixups and
	enhancements
In-Reply-To: <20070619234252.3794.18229.stgit@bauxite.internal.keyresearch.com>
	(Arthur Jones's message of "Tue, 19 Jun 2007 16:42:52 -0700")
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
	<20070619234252.3794.18229.stgit@bauxite.internal.keyresearch.com>
Message-ID: <ada1wg68hp0.fsf@cisco.com>

 > +	tail = *(volatile u64 *)pd->port_rcvhdrtail_kvaddr;

Why is there a volatile here?  cf http://lwn.net/Articles/234017/
("volatile considered harmful")

 - R.


From halr at voltaire.com  Wed Jun 20 14:01:21 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Jun 2007 17:01:21 -0400
Subject: [ofa-general] Re: [PATCH 1/2] libibumad: fix partition support
In-Reply-To: <000801c7af6e$7ae0ba80$ff0da8c0@amr.corp.intel.com>
References: <000801c7af6e$7ae0ba80$ff0da8c0@amr.corp.intel.com>
Message-ID: <1182373280.15653.335513.camel@hal.voltaire.com>

On Fri, 2007-06-15 at 12:59, Sean Hefty wrote:
> Allow sending MADs on different partitions.  This requires kernel support,
> so requires an ABI bump.  This patch maintains support for the previous
> ABI.
> 
> Clarify that umad_set_pkey() takes a pkey index, and not the pkey itself.
> (Unfortunately, the call is used both ways in the management tree.)

This works well in all the combinations I tested (user_mad ABIs,
libibumad and libvendor versions).

Just two things:
1. It might be better if the ABI version 5 warning message for only
pkey_index 0 being supported comes out at umad_init time rather than
umad_set_pkey time so that the user is not swamped with these.

2. There is one pathological combination. It would be using 2.6.23 (with
the new user_mad ABI version 6), an updated libibumad would be required,
but an older libvendor (osm_vendor_ibumad.c without your one line
change). That might be the case with someone who swapped back and forth
between OFED 1.2 and master in some scenarios.

Also, this does not quite work as expected. An error was returned based
on the bad pkey index but I do see a send on the IB link (with a bad
pkey). I wouldn't have expected the latter part. Maybe this is a driver
or firmware issue. Not sure yet. I suppose there should be some
pkey_index validation (to make sure it is within the device's valid
range). Should that ultimately get added to libibumad, or should
such validation go into the user_mad kernel module?
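
The kind of range check meant here could be as small as the sketch
below. demo_check_pkey_index() is a hypothetical helper, not an existing
libibumad or user_mad function, and it assumes the caller already knows
the length of the port's P_Key table.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: accept pkey_index only if it falls inside the
 * port's P_Key table. Returns 0 if valid, -EINVAL otherwise. */
static int demo_check_pkey_index(int pkey_index, int pkey_tbl_len)
{
	if (pkey_tbl_len <= 0)
		return -EINVAL;
	if (pkey_index < 0 || pkey_index >= pkey_tbl_len)
		return -EINVAL;
	return 0;
}
```

Wherever the check ends up living, it would have to run before the send
is posted, so that a bad index never reaches the wire.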

-- Hal

> Signed-off-by: Sean Hefty <sean.hefty at intel.com>
> ---
> Additional changes are needed to retrieve the PKey and GID tables, so that
> the PKeys and GIDs can be converted to the correct index.  These will come
> in future patches.
> 
> 
>  doc/libibumad.txt                   |    2 
>  libibumad/include/infiniband/umad.h |    7 +
>  libibumad/src/umad.c                |  192 +++++++++++++++++++++++++++--------
>  3 files changed, 156 insertions(+), 45 deletions(-)
> 
> diff --git a/doc/libibumad.txt b/doc/libibumad.txt
> index 7b2b4f4..4e37e60 100644
> --- a/doc/libibumad.txt
> +++ b/doc/libibumad.txt
> @@ -336,7 +336,7 @@ the given host ordered fields. Return 0 on success, -1 on errors.
>  umad_set_pkey:
>  
>  Synopsis:
> -	int	umad_set_pkey(void *umad, int pkey);
> +	int	umad_set_pkey(void *umad, int pkey_index);
>  
>  Description: Set the pkey within the 'umad' buffer.  Return 0 on success,
>  -1 on errors.
> diff --git a/libibumad/include/infiniband/umad.h b/libibumad/include/infiniband/umad.h
> old mode 100644
> new mode 100755
> index 9020649..9369d95
> --- a/libibumad/include/infiniband/umad.h
> +++ b/libibumad/include/infiniband/umad.h
> @@ -60,6 +60,8 @@ typedef struct ib_mad_addr {
>  	uint8_t	 traffic_class;
>  	uint8_t	 gid[16];
>  	uint32_t flow_label;
> +	uint16_t pkey_index;
> +	uint8_t  reserved[6];
>  } ib_mad_addr_t;
>  
>  typedef struct ib_user_mad {
> @@ -72,7 +74,8 @@ typedef struct ib_user_mad {
>  	uint8_t  data[0];
>  } ib_user_mad_t;
>  
> -#define IB_UMAD_ABI_VERSION	5
> +#define IB_UMAD_MIN_ABI_VERSION	5
> +#define IB_UMAD_MAX_ABI_VERSION	6
>  #define IB_UMAD_ABI_DIR		"/sys/class/infiniband_mad"
>  #define IB_UMAD_ABI_FILE	"abi_version"
>  
> @@ -167,7 +170,7 @@ int	umad_set_grh_net(void *umad, void *mad_addr);
>  int	umad_set_grh(void *umad, void *mad_addr);
>  int	umad_set_addr_net(void *umad, int dlid, int dqp, int sl, int qkey);
>  int	umad_set_addr(void *umad, int dlid, int dqp, int sl, int qkey);
> -int	umad_set_pkey(void *umad, int pkey);
> +int	umad_set_pkey(void *umad, int pkey_index);
>  
>  int	umad_send(int portid, int agentid, void *umad, int length,
>  		  int timeout_ms, int retries);
> diff --git a/libibumad/src/umad.c b/libibumad/src/umad.c
> old mode 100644
> new mode 100755
> index 5f9b36b..c750fe0
> --- a/libibumad/src/umad.c
> +++ b/libibumad/src/umad.c
> @@ -69,6 +69,7 @@ int umaddebug = 0;
>  #define UMAD_DEV_NAME_SZ	32
>  #define UMAD_DEV_FILE_SZ	256
>  
> +static uint abi_version;
>  static char *def_ca_name = "mthca0";
>  static int def_ca_port = 1;
>  
> @@ -82,6 +83,31 @@ typedef struct Port {
>  
>  static Port ports[UMAD_MAX_PORTS];
>  
> +typedef struct ib_mad_addr_abi_5 {
> +	uint32_t qpn;
> +	uint32_t qkey;
> +	uint16_t lid;
> +	uint8_t	 sl;
> +	uint8_t	 path_bits;
> +	uint8_t	 grh_present;
> +	uint8_t	 gid_index;
> +	uint8_t	 hop_limit;
> +	uint8_t	 traffic_class;
> +	uint8_t	 gid[16];
> +	uint32_t flow_label;
> +} ib_mad_addr_abi_5_t;
> +
> +typedef struct ib_user_mad_abi_5 {
> +	uint32_t agent_id;
> +	uint32_t status;
> +	uint32_t timeout_ms;
> +	uint32_t retries;
> +	uint32_t length;
> +	ib_mad_addr_abi_5_t addr;
> +	uint8_t  data[0];
> +} ib_user_mad_abi_5_t;
> +
> +
>  /*************************************
>   * Port
>   */
> @@ -463,6 +489,101 @@ dev_to_umad_id(char *dev, uint port)
>  	return -1;	/* not found */
>  }
>  
> +static int
> +write_data(int fd, void *data, int size)
> +{
> +	int n;
> +
> +	n = write(fd, data, size);
> +	if (n != size) {
> +		DEBUG("write returned %d != sizeof mad data %d (%m)", n, size);
> +		if (!errno)
> +			errno = EIO;
> +		return -EIO;
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +write_abi_5(int fd, struct ib_user_mad *mad, int length)
> +{
> +	struct ib_user_mad_abi_5 *umad_5;
> +	int n;
> +
> +	n = sizeof *umad_5 + length;
> +	umad_5 = malloc(n);
> +	if (!umad_5) {
> +		errno = ENOMEM;
> +		return -ENOMEM;
> +	}
> +
> +	memcpy(umad_5, mad, sizeof *umad_5);
> +	memcpy(umad_5->data, mad->data, length);
> +
> +	n = write_data(fd, umad_5, n);
> +	free(umad_5);
> +	return n;
> +}
> +
> +static int
> +read_data(int fd, void *data, int size, int *length)
> +{
> +	struct ib_user_mad *mad = data;
> +	int n, umad_size;
> +
> +	umad_size = size - *length;
> +
> +	n = read(fd, data, size);
> +	if ((n >= 0) && (n <= size)) {
> +		DEBUG("mad received by agent %d length %d", mad->agent_id, n);
> +		if (n > umad_size)
> +			*length = n - umad_size;
> +		else
> +			*length = 0;
> +		return mad->agent_id;
> +	}
> +
> +	if (n == -EWOULDBLOCK) {
> +		if (!errno)
> +			errno = EWOULDBLOCK;
> +		return n;
> +	}
> +
> +	DEBUG("read returned %zu > sizeof mad %zu (%m)",
> +	      mad->length - umad_size, *length);
> +
> +	*length = mad->length - umad_size;
> +	if (!errno)
> +		errno = EIO;
> +	return -errno;
> +}
> +
> +static int
> +read_abi_5(int fd, void *umad, int *length)
> +{
> +	struct ib_user_mad *mad = umad;
> +	struct ib_user_mad_abi_5 *umad_5;
> +	int n;
> +
> +	n = sizeof *umad_5 + *length;
> +	umad_5 = malloc(n);
> +	if (!umad_5) {
> +		errno = ENOMEM;
> +		return -ENOMEM;
> +	}
> +
> +	n = read_data(fd, umad_5, n, length);
> +	if (n >= 0) {
> +		memcpy(mad, umad_5, sizeof *umad_5);
> +		mad->addr.pkey_index = 0;
> +		memcpy(mad->data, umad_5->data, *length);
> +	}
> +
> +	free(umad_5);
> +	return n;
> +}
> +
>  /*******************************
>   * Public interface
>   */
> @@ -470,17 +591,19 @@ dev_to_umad_id(char *dev, uint port)
>  int
>  umad_init(void)
>  {
> -	uint abi_version;
> -
>  	TRACE("umad_init");
>  	if (sys_read_uint(IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE, &abi_version) < 0) {
>  		IBWARN("can't read ABI version from %s/%s (%m): is ib_umad module loaded?",
>  			IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE);
>  		return -1;
>  	}
> -	if (abi_version != IB_UMAD_ABI_VERSION) {
> -		IBWARN("wrong ABI version: %s/%s is %d but library ABI is %d",
> -			IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE, abi_version, IB_UMAD_ABI_VERSION);
> +
> +	if (abi_version < IB_UMAD_MIN_ABI_VERSION ||
> +	    abi_version > IB_UMAD_MAX_ABI_VERSION) {
> +		IBWARN("wrong ABI version: %s/%s is %d but library ABI "
> +			"supports %d through %d",
> +			IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE, abi_version,
> +			IB_UMAD_MIN_ABI_VERSION, IB_UMAD_MAX_ABI_VERSION);
>  		return -1;
>  	}
>  	return 0;
> @@ -699,11 +822,16 @@ umad_set_grh(void *umad, void *mad_addr)
>  }
>  
>  int
> -umad_set_pkey(void *umad, int pkey)
> +umad_set_pkey(void *umad, int pkey_index)
>  {
> -#if 0
> -	mad->addr.pkey = 0;		/* FIXME - PKEY support */
> -#endif
> +	struct ib_user_mad *mad = umad;
> +
> +	if (abi_version == 5 && pkey_index != 0) {
> +		IBWARN("umad_set_pkey: ABI 5 only supports pkey_index 0\n");
> +		return -EINVAL;
> +	}
> +
> +	mad->addr.pkey_index = pkey_index;
>  	return 0;
>  }
>  
> @@ -761,15 +889,12 @@ umad_send(int portid, int agentid, void *umad, int length,
>  	if (umaddebug > 1)
>  		umad_dump(mad);
>  
> -	n = write(port->dev_fd, mad, length + sizeof *mad);
> -	if (n == length + sizeof *mad)
> -		return 0;
> +	if (abi_version == 5)
> +		n = write_abi_5(port->dev_fd, mad, length);
> +	else
> +		n = write_data(port->dev_fd, mad, sizeof *mad + length);
>  
> -	DEBUG("write returned %d != sizeof umad %zu + length %d (%m)",
> -	      n, sizeof *mad, length);
> -	if (!errno)
> -		errno = EIO;
> -	return -EIO;
> +	return n;
>  }
>  
>  static int
> @@ -793,7 +918,6 @@ dev_poll(int fd, int timeout_ms)
>  int
>  umad_recv(int portid, void *umad, int *length, int timeout_ms)
>  {
> -	struct ib_user_mad *mad = umad;
>  	Port *port;
>  	int n;
>  
> @@ -817,29 +941,13 @@ umad_recv(int portid, void *umad, int *length, int timeout_ms)
>  		return n;
>  	}
>  
> -	n = read(port->dev_fd, umad, sizeof *mad + *length);
> -	if ((n >= 0) && (n <= sizeof *mad + *length)) {
> -		DEBUG("mad received by agent %d length %d", mad->agent_id, n);
> -		if (n > sizeof *mad)
> -			*length = n - sizeof *mad;
> -		else
> -			*length = 0;
> -		return mad->agent_id;
> -	}
> -
> -	if (n == -EWOULDBLOCK) {
> -		if (!errno)
> -			errno = EWOULDBLOCK;
> -		return n;
> -	}
> -
> -	DEBUG("read returned %zu > sizeof umad %zu + length %d (%m)",
> -	      mad->length - sizeof *mad, sizeof *mad, *length);
> +	if (abi_version == 5)
> +		n = read_abi_5(port->dev_fd, umad, length);
> +	else
> +		n = read_data(port->dev_fd, umad,
> +			      sizeof(struct ib_user_mad) + *length, length);
>  
> -	*length = mad->length - sizeof *mad;
> -	if (!errno)
> -		errno = EIO;
> -	return -errno;
> +	return n;
>  }
>  
>  int
> @@ -996,10 +1104,10 @@ umad_addr_dump(ib_mad_addr_t *addr)
>  	gid_str[i*2] = 0;
>  	IBWARN("qpn %d qkey 0x%x lid 0x%x sl %d\n"
>  		"grh_present %d gid_index %d hop_limit %d traffic_class %d flow_label 0x%x\n"
> -		"Gid 0x%s",
> +		"Gid 0x%s pkey_index %d",
>  		ntohl(addr->qpn), ntohl(addr->qkey), ntohs(addr->lid), addr->sl,
>  		addr->grh_present, (int)addr->gid_index, (int)addr->hop_limit,
> -		(int)addr->traffic_class, addr->flow_label, gid_str);
> +		(int)addr->traffic_class, addr->flow_label, gid_str, addr->pkey_index);
>  }
>  
>  void
> 


From mshefty at ichips.intel.com  Wed Jun 20 14:06:12 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 20 Jun 2007 14:06:12 -0700
Subject: [ofa-general] [PATCH] for-2.6.23 ib/umad: add partition support
In-Reply-To: <adair9i8ihq.fsf@cisco.com>
References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com>
	<adair9i8ihq.fsf@cisco.com>
Message-ID: <467996C4.1060201@ichips.intel.com>

Roland Dreier wrote:
>  > -#define IB_USER_MAD_ABI_VERSION	5
>  > +#define IB_USER_MAD_ABI_VERSION	6
> 
> Bummer -- we've been able to keep the ABI stable for almost 2 years
> now.  I wonder if there's something clever we can do to avoid breaking
> existing apps?

Did you have something in mind?  (new ioctl?  re-using existing fields?)

Not all fields are used for both reads and writes.  E.g. status is 
unused on a write, and retries is unused on a read.  Storing the 
pkey_index on a read seems doable.  I think if we do anything on a 
write, we need to make an assumption that the data is currently set to 0 
by the app.
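
A minimal sketch of the field-reuse idea for reads, assuming the kernel were to report the receive pkey_index in the 'retries' field, which is unused on a read. The struct and the sketch_* helpers are hypothetical illustrations, not part of user_mad or libibumad, and only the leading header fields are shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the leading fields of the ABI 5 header. */
struct sketch_umad_hdr {
	uint32_t agent_id;
	uint32_t status;	/* unused on a write */
	uint32_t timeout_ms;
	uint32_t retries;	/* unused on a read */
	uint32_t length;
};

/* Kernel side (assumed): stash the pkey_index of a received MAD. */
static void sketch_store_pkey_index(struct sketch_umad_hdr *hdr,
				    uint16_t pkey_index)
{
	assert(hdr != NULL);
	hdr->retries = pkey_index;
}

/* User side (assumed): recover the pkey_index after a read. */
static uint16_t sketch_load_pkey_index(const struct sketch_umad_hdr *hdr)
{
	assert(hdr != NULL);
	return (uint16_t)hdr->retries;
}
```

The layout and ABI number stay unchanged in this scheme, but an old application cannot tell whether the value it sees in 'retries' is meaningful, and the write direction still depends on apps having zeroed the reused field.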

- Sean


From halr at voltaire.com  Wed Jun 20 14:11:21 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Jun 2007 17:11:21 -0400
Subject: [ofa-general] Re: [PATCH 1/2] libibumad: fix partition support
In-Reply-To: <1182373280.15653.335513.camel@hal.voltaire.com>
References: <000801c7af6e$7ae0ba80$ff0da8c0@amr.corp.intel.com>
	<1182373280.15653.335513.camel@hal.voltaire.com>
Message-ID: <1182373879.15653.336204.camel@hal.voltaire.com>

On Wed, 2007-06-20 at 17:01, Hal Rosenstock wrote:
> On Fri, 2007-06-15 at 12:59, Sean Hefty wrote:
> > Allow sending MADs on different partitions.  This requires kernel support,
> > so requires an ABI bump.  This patch maintains support for the previous
> > ABI.
> > 
> > Clarify that umad_set_pkey() takes a pkey index, and not the pkey itself.
> > (Unfortunately, the call is used both ways in the management tree.)
> 
> This works well in all the combinatorials I tested (user_mad ABIs,
> libibumad and libvendor versions).
> 
> Just two things:
> 1. It might be better if the ABI version 5 warning message for only
> pkey_index 0 being supported comes out at umad_init time rather than
> umad_set_pkey time so that the user is not swamped with these.
> 
> 2. There is one pathological combination: a 2.6.23 kernel (with the new
> user_mad ABI version 6), which requires an updated libibumad, together
> with an older libvendor (osm_vendor_ibumad.c without your one-line
> change). That might be the case with someone who swaps back and forth
> between OFED 1.2 and master in some scenarios.

This raises the question of whether your one-line change to
osm_vendor_ibumad.c should be made to the OFED 1.2 version as well.

-- Hal

> Also, this does not quite work as expected. An error was returned based
> on the bad pkey index, but I do see a send on the IB link (with a bad
> pkey). I wouldn't have expected the latter. Maybe this is a driver
> or firmware issue; I'm not sure yet. I suppose there should be some
> pkey_index validation (to make sure it is within the device's valid
> range). Should that ultimately get added to libibumad, or should
> such validation go into the user_mad kernel module?
> 
> -- Hal
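
A minimal sketch of the pkey_index range check raised above, together with the PKey-to-index lookup that the patch notes defer to future work. The sketch_* names are hypothetical; the table is assumed to have been fetched already and to be in host byte order, and the lookup assumes that only the low 15 bits identify the partition (the top bit being the membership bit).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Return 0 if pkey_index addresses an entry of a table with tbl_len
 * entries, -EINVAL otherwise. */
static int sketch_check_pkey_index(int pkey_index, int tbl_len)
{
	if (pkey_index < 0 || pkey_index >= tbl_len)
		return -EINVAL;
	return 0;
}

/* Return the index of the first entry whose low 15 bits match those of
 * pkey, or -1 if there is none.  Entries with a zero base value are
 * treated as empty and skipped. */
static int sketch_pkey_to_index(const uint16_t *tbl, int tbl_len,
				uint16_t pkey)
{
	int i;

	assert(tbl != NULL);
	for (i = 0; i < tbl_len; i++) {
		if ((tbl[i] & 0x7fff) == 0)
			continue;
		if ((tbl[i] & 0x7fff) == (pkey & 0x7fff))
			return i;
	}
	return -1;
}
```

Whether such a check belongs in libibumad or in the user_mad kernel module is exactly the open question; the sketch only shows what the check itself would amount to.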
> 
> > Signed-off-by: Sean Hefty <sean.hefty at intel.com>
> > ---
> > Additional changes are needed to retrieve the PKey and GID tables, so that
> > the PKeys and GIDs can be converted to the correct index.  These will come
> > in future patches.
> > 
> > 
> >  doc/libibumad.txt                   |    2 
> >  libibumad/include/infiniband/umad.h |    7 +
> >  libibumad/src/umad.c                |  192 +++++++++++++++++++++++++++--------
> >  3 files changed, 156 insertions(+), 45 deletions(-)
> > 
> > diff --git a/doc/libibumad.txt b/doc/libibumad.txt
> > index 7b2b4f4..4e37e60 100644
> > --- a/doc/libibumad.txt
> > +++ b/doc/libibumad.txt
> > @@ -336,7 +336,7 @@ the given host ordered fields. Return 0 on success, -1 on errors.
> >  umad_set_pkey:
> >  
> >  Synopsis:
> > -	int	umad_set_pkey(void *umad, int pkey);
> > +	int	umad_set_pkey(void *umad, int pkey_index);
> >  
> >  Description: Set the pkey within the 'umad' buffer.  Return 0 on success,
> >  -1 on errors.
> > diff --git a/libibumad/include/infiniband/umad.h b/libibumad/include/infiniband/umad.h
> > old mode 100644
> > new mode 100755
> > index 9020649..9369d95
> > --- a/libibumad/include/infiniband/umad.h
> > +++ b/libibumad/include/infiniband/umad.h
> > @@ -60,6 +60,8 @@ typedef struct ib_mad_addr {
> >  	uint8_t	 traffic_class;
> >  	uint8_t	 gid[16];
> >  	uint32_t flow_label;
> > +	uint16_t pkey_index;
> > +	uint8_t  reserved[6];
> >  } ib_mad_addr_t;
> >  
> >  typedef struct ib_user_mad {
> > @@ -72,7 +74,8 @@ typedef struct ib_user_mad {
> >  	uint8_t  data[0];
> >  } ib_user_mad_t;
> >  
> > -#define IB_UMAD_ABI_VERSION	5
> > +#define IB_UMAD_MIN_ABI_VERSION	5
> > +#define IB_UMAD_MAX_ABI_VERSION	6
> >  #define IB_UMAD_ABI_DIR		"/sys/class/infiniband_mad"
> >  #define IB_UMAD_ABI_FILE	"abi_version"
> >  
> > @@ -167,7 +170,7 @@ int	umad_set_grh_net(void *umad, void *mad_addr);
> >  int	umad_set_grh(void *umad, void *mad_addr);
> >  int	umad_set_addr_net(void *umad, int dlid, int dqp, int sl, int qkey);
> >  int	umad_set_addr(void *umad, int dlid, int dqp, int sl, int qkey);
> > -int	umad_set_pkey(void *umad, int pkey);
> > +int	umad_set_pkey(void *umad, int pkey_index);
> >  
> >  int	umad_send(int portid, int agentid, void *umad, int length,
> >  		  int timeout_ms, int retries);
> > diff --git a/libibumad/src/umad.c b/libibumad/src/umad.c
> > old mode 100644
> > new mode 100755
> > index 5f9b36b..c750fe0
> > --- a/libibumad/src/umad.c
> > +++ b/libibumad/src/umad.c
> > @@ -69,6 +69,7 @@ int umaddebug = 0;
> >  #define UMAD_DEV_NAME_SZ	32
> >  #define UMAD_DEV_FILE_SZ	256
> >  
> > +static uint abi_version;
> >  static char *def_ca_name = "mthca0";
> >  static int def_ca_port = 1;
> >  
> > @@ -82,6 +83,31 @@ typedef struct Port {
> >  
> >  static Port ports[UMAD_MAX_PORTS];
> >  
> > +typedef struct ib_mad_addr_abi_5 {
> > +	uint32_t qpn;
> > +	uint32_t qkey;
> > +	uint16_t lid;
> > +	uint8_t	 sl;
> > +	uint8_t	 path_bits;
> > +	uint8_t	 grh_present;
> > +	uint8_t	 gid_index;
> > +	uint8_t	 hop_limit;
> > +	uint8_t	 traffic_class;
> > +	uint8_t	 gid[16];
> > +	uint32_t flow_label;
> > +} ib_mad_addr_abi_5_t;
> > +
> > +typedef struct ib_user_mad_abi_5 {
> > +	uint32_t agent_id;
> > +	uint32_t status;
> > +	uint32_t timeout_ms;
> > +	uint32_t retries;
> > +	uint32_t length;
> > +	ib_mad_addr_abi_5_t addr;
> > +	uint8_t  data[0];
> > +} ib_user_mad_abi_5_t;
> > +
> > +
> >  /*************************************
> >   * Port
> >   */
> > @@ -463,6 +489,101 @@ dev_to_umad_id(char *dev, uint port)
> >  	return -1;	/* not found */
> >  }
> >  
> > +static int
> > +write_data(int fd, void *data, int size)
> > +{
> > +	int n;
> > +
> > +	n = write(fd, data, size);
> > +	if (n != size) {
> > +		DEBUG("write returned %d != sizeof mad data %d (%m)", n, size);
> > +		if (!errno)
> > +			errno = EIO;
> > +		return -EIO;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int
> > +write_abi_5(int fd, struct ib_user_mad *mad, int length)
> > +{
> > +	struct ib_user_mad_abi_5 *umad_5;
> > +	int n;
> > +
> > +	n = sizeof *umad_5 + length;
> > +	umad_5 = malloc(n);
> > +	if (!umad_5) {
> > +		errno = ENOMEM;
> > +		return -ENOMEM;
> > +	}
> > +
> > +	memcpy(umad_5, mad, sizeof *umad_5);
> > +	memcpy(umad_5->data, mad->data, length);
> > +
> > +	n = write_data(fd, umad_5, n);
> > +	free(umad_5);
> > +	return n;
> > +}
> > +
> > +static int
> > +read_data(int fd, void *data, int size, int *length)
> > +{
> > +	struct ib_user_mad *mad = data;
> > +	int n, umad_size;
> > +
> > +	umad_size = size - *length;
> > +
> > +	n = read(fd, data, size);
> > +	if ((n >= 0) && (n <= size)) {
> > +		DEBUG("mad received by agent %d length %d", mad->agent_id, n);
> > +		if (n > umad_size)
> > +			*length = n - umad_size;
> > +		else
> > +			*length = 0;
> > +		return mad->agent_id;
> > +	}
> > +
> > +	if (n == -EWOULDBLOCK) {
> > +		if (!errno)
> > +			errno = EWOULDBLOCK;
> > +		return n;
> > +	}
> > +
> > +	DEBUG("read returned %zu > sizeof mad %zu (%m)",
> > +	      mad->length - umad_size, *length);
> > +
> > +	*length = mad->length - umad_size;
> > +	if (!errno)
> > +		errno = EIO;
> > +	return -errno;
> > +}
> > +
> > +static int
> > +read_abi_5(int fd, void *umad, int *length)
> > +{
> > +	struct ib_user_mad *mad = umad;
> > +	struct ib_user_mad_abi_5 *umad_5;
> > +	int n;
> > +
> > +	n = sizeof *umad_5 + *length;
> > +	umad_5 = malloc(n);
> > +	if (!umad_5) {
> > +		errno = ENOMEM;
> > +		return -ENOMEM;
> > +	}
> > +
> > +	n = read_data(fd, umad_5, n, length);
> > +	if (n >= 0) {
> > +		memcpy(mad, umad_5, sizeof *umad_5);
> > +		mad->addr.pkey_index = 0;
> > +		memcpy(mad->data, umad_5->data, *length);
> > +	}
> > +
> > +	free(umad_5);
> > +	return n;
> > +}
> > +
> >  /*******************************
> >   * Public interface
> >   */
> > @@ -470,17 +591,19 @@ dev_to_umad_id(char *dev, uint port)
> >  int
> >  umad_init(void)
> >  {
> > -	uint abi_version;
> > -
> >  	TRACE("umad_init");
> >  	if (sys_read_uint(IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE, &abi_version) < 0) {
> >  		IBWARN("can't read ABI version from %s/%s (%m): is ib_umad module loaded?",
> >  			IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE);
> >  		return -1;
> >  	}
> > -	if (abi_version != IB_UMAD_ABI_VERSION) {
> > -		IBWARN("wrong ABI version: %s/%s is %d but library ABI is %d",
> > -			IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE, abi_version, IB_UMAD_ABI_VERSION);
> > +
> > +	if (abi_version < IB_UMAD_MIN_ABI_VERSION ||
> > +	    abi_version > IB_UMAD_MAX_ABI_VERSION) {
> > +		IBWARN("wrong ABI version: %s/%s is %d but library ABI "
> > +			"supports %d through %d",
> > +			IB_UMAD_ABI_DIR, IB_UMAD_ABI_FILE, abi_version,
> > +			IB_UMAD_MIN_ABI_VERSION, IB_UMAD_MAX_ABI_VERSION);
> >  		return -1;
> >  	}
> >  	return 0;
> > @@ -699,11 +822,16 @@ umad_set_grh(void *umad, void *mad_addr)
> >  }
> >  
> >  int
> > -umad_set_pkey(void *umad, int pkey)
> > +umad_set_pkey(void *umad, int pkey_index)
> >  {
> > -#if 0
> > -	mad->addr.pkey = 0;		/* FIXME - PKEY support */
> > -#endif
> > +	struct ib_user_mad *mad = umad;
> > +
> > +	if (abi_version == 5 && pkey_index != 0) {
> > +		IBWARN("umad_set_pkey: ABI 5 only supports pkey_index 0\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	mad->addr.pkey_index = pkey_index;
> >  	return 0;
> >  }
> >  
> > @@ -761,15 +889,12 @@ umad_send(int portid, int agentid, void *umad, int length,
> >  	if (umaddebug > 1)
> >  		umad_dump(mad);
> >  
> > -	n = write(port->dev_fd, mad, length + sizeof *mad);
> > -	if (n == length + sizeof *mad)
> > -		return 0;
> > +	if (abi_version == 5)
> > +		n = write_abi_5(port->dev_fd, mad, length);
> > +	else
> > +		n = write_data(port->dev_fd, mad, sizeof *mad + length);
> >  
> > -	DEBUG("write returned %d != sizeof umad %zu + length %d (%m)",
> > -	      n, sizeof *mad, length);
> > -	if (!errno)
> > -		errno = EIO;
> > -	return -EIO;
> > +	return n;
> >  }
> >  
> >  static int
> > @@ -793,7 +918,6 @@ dev_poll(int fd, int timeout_ms)
> >  int
> >  umad_recv(int portid, void *umad, int *length, int timeout_ms)
> >  {
> > -	struct ib_user_mad *mad = umad;
> >  	Port *port;
> >  	int n;
> >  
> > @@ -817,29 +941,13 @@ umad_recv(int portid, void *umad, int *length, int timeout_ms)
> >  		return n;
> >  	}
> >  
> > -	n = read(port->dev_fd, umad, sizeof *mad + *length);
> > -	if ((n >= 0) && (n <= sizeof *mad + *length)) {
> > -		DEBUG("mad received by agent %d length %d", mad->agent_id, n);
> > -		if (n > sizeof *mad)
> > -			*length = n - sizeof *mad;
> > -		else
> > -			*length = 0;
> > -		return mad->agent_id;
> > -	}
> > -
> > -	if (n == -EWOULDBLOCK) {
> > -		if (!errno)
> > -			errno = EWOULDBLOCK;
> > -		return n;
> > -	}
> > -
> > -	DEBUG("read returned %zu > sizeof umad %zu + length %d (%m)",
> > -	      mad->length - sizeof *mad, sizeof *mad, *length);
> > +	if (abi_version == 5)
> > +		n = read_abi_5(port->dev_fd, umad, length);
> > +	else
> > +		n = read_data(port->dev_fd, umad,
> > +			      sizeof(struct ib_user_mad) + *length, length);
> >  
> > -	*length = mad->length - sizeof *mad;
> > -	if (!errno)
> > -		errno = EIO;
> > -	return -errno;
> > +	return n;
> >  }
> >  
> >  int
> > @@ -996,10 +1104,10 @@ umad_addr_dump(ib_mad_addr_t *addr)
> >  	gid_str[i*2] = 0;
> >  	IBWARN("qpn %d qkey 0x%x lid 0x%x sl %d\n"
> >  		"grh_present %d gid_index %d hop_limit %d traffic_class %d flow_label 0x%x\n"
> > -		"Gid 0x%s",
> > +		"Gid 0x%s pkey_index %d",
> >  		ntohl(addr->qpn), ntohl(addr->qkey), ntohs(addr->lid), addr->sl,
> >  		addr->grh_present, (int)addr->gid_index, (int)addr->hop_limit,
> > -		(int)addr->traffic_class, addr->flow_label, gid_str);
> > +		(int)addr->traffic_class, addr->flow_label, gid_str, addr->pkey_index);
> >  }
> >  
> >  void
> > 
> 


From halr at voltaire.com  Wed Jun 20 14:18:38 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Jun 2007 17:18:38 -0400
Subject: [ofa-general] [PATCH] for-2.6.23 ib/umad: add partition support
In-Reply-To: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com>
References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com>
Message-ID: <1182374317.15653.336639.camel@hal.voltaire.com>

On Fri, 2007-06-15 at 12:34, Sean Hefty wrote:
> In order to support multiple partitions, user_mad needs to handle
> different pkey's.  PKeys must be specified by the user when sending
> and receiving MADs.  This bumps the ABI.
> 
> Signed-off-by: Sean Hefty <sean.hefty at intel.com>
> ---
> If there are no objections, I will queue this patch for 2.6.23, and request
> a pull when 2.6.23 is closer.
> 
>  drivers/infiniband/core/user_mad.c |    5 +++--
>  include/rdma/ib_user_mad.h         |    4 +++-
>  2 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
> index d97ded2..b0128fa 100644
> --- a/drivers/infiniband/core/user_mad.c
> +++ b/drivers/infiniband/core/user_mad.c
> @@ -228,6 +228,7 @@ static void recv_handler(struct ib_mad_agent *agent,
>         packet->mad.hdr.lid       = cpu_to_be16(mad_recv_wc->wc->slid);
>         packet->mad.hdr.sl        = mad_recv_wc->wc->sl;
>         packet->mad.hdr.path_bits = mad_recv_wc->wc->dlid_path_bits;
> +       packet->mad.hdr.pkey_index  = mad_recv_wc->wc->pkey_index;
>         packet->mad.hdr.grh_present = !!(mad_recv_wc->wc->wc_flags & IB_WC_GRH);
>         if (packet->mad.hdr.grh_present) {
>                 struct ib_ah_attr ah_attr;
> @@ -503,8 +504,8 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
>         data_len = count - sizeof (struct ib_user_mad) - hdr_len;
>         packet->msg = ib_create_send_mad(agent,
>                                          be32_to_cpu(packet->mad.hdr.qpn),
> -                                        0, rmpp_active, hdr_len,
> -                                        data_len, GFP_KERNEL);
> +                                        packet->mad.hdr.pkey_index, rmpp_active,
> +                                        hdr_len, data_len, GFP_KERNEL);
>         if (IS_ERR(packet->msg)) {
>                 ret = PTR_ERR(packet->msg);
>                 goto err_ah;
> diff --git a/include/rdma/ib_user_mad.h b/include/rdma/ib_user_mad.h
> index d66b15e..e7bf6fa 100644
> --- a/include/rdma/ib_user_mad.h
> +++ b/include/rdma/ib_user_mad.h
> @@ -43,7 +43,7 @@
>   * Increment this value if any changes that break userspace ABI
>   * compatibility are made.
>   */
> -#define IB_USER_MAD_ABI_VERSION        5
> +#define IB_USER_MAD_ABI_VERSION        6
>  
>  /*
>   * Make sure that all structs defined in this file remain laid out so
> @@ -88,6 +88,8 @@ struct ib_user_mad_hdr {
>         __u8    traffic_class;
>         __u8    gid[16];
>         __be32  flow_label;
> +       __u16   pkey_index;
> +       __u8    reserved[6];
>  };

Nit: If this approach is going ahead, should there also be a comment
added to this header file like:

 * @pkey_index - Pkey index used to determine PKey in BTH

-- Hal

>  /**
> 
> 


From wombat2 at us.ibm.com  Wed Jun 20 15:09:08 2007
From: wombat2 at us.ibm.com (Bernard King-Smith)
Date: Wed, 20 Jun 2007 18:09:08 -0400
Subject: [ofa-general] Re: [PATCH draft,
	untested] ehca srq emulation (for IPoIB CM)
In-Reply-To: <adamyyu8in0.fsf@cisco.com>
Message-ID: <OFE2D9DB0E.0AD8F1B0-ON85257300.007854EA-85257300.0079CE19@us.ibm.com>

Roland Dreier <rdreier at cisco.com> wrote on 06/20/2007 04:40:03 PM:

>  > We are already running with the non-SRQ patch here and the results are
>  > very good. Changing to a different approach is not the right thing to do
>  > at this time.
> 
> Why not, if a different approach is better?
> 
>  - R.

It is not clear yet that anything is better, but you have to go back
to the IPoIB-CM RFC 4755 that we wrote. In the spec you will see that the
approach for this driver is to have the IPoIB driver select the most 
appropriate method of connecting. If RC was not available then UD was 
used. You can extend that to UC mode as Michael proposed, as long as you 
allow selecting the most appropriate method of connection. By pushing the 
issue of SRQ or not SRQ to the driver you have broken the IPoIB-CM 
original design. Since SRQ was not a required function in the IB spec we 
never addressed that issue in the RFC along with UC. I think we can agree 
that adding UC is a good thing and follows the approach in the original 
spec. Including SRQ as one of the tests for the best possible connection 
method follows this same approach.

If you really want to start splitting up which layer has part of the 
decision on how to connect, then you need to propose a totally different 
RFC. I prefer the approach where as few as possible places are required to 
make a connection type decision. When you change the options supported, 
then you potentially have several places that you have to address the 
changes, opening up a possible maintenance headache that Pradeep 
mentioned.

I would be interested in hearing a better approach, as long as we start 
with the approach in RFC 4755. However, for now I have not seen anything 
that says supporting both SRQ and non-SRQ in the same IPoIB-CM driver has 
disastrous impact.

Regards.


Bernie King-Smith 
IBM Corporation
Server Group
Cluster System Performance 
wombat2 at us.ibm.com    (845)433-8483
Tie. 293-8483 or wombat2 on NOTES 

"We are not responsible for the world we are born into, only for the world 
we leave when we die.
So we have to accept what has gone before us and work to change the only 
thing we can,
-- The Future." William Shatner

From mst at dev.mellanox.co.il  Wed Jun 20 20:20:29 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 21 Jun 2007 06:20:29 +0300
Subject: [ofa-general] Re: Re: [PATCH draft,
	untested] ehca srq emulation (for IPoIB CM)
In-Reply-To: <OFE2D9DB0E.0AD8F1B0-ON85257300.007854EA-85257300.0079CE19@us.ibm.com>
References: <adamyyu8in0.fsf@cisco.com>
	<OFE2D9DB0E.0AD8F1B0-ON85257300.007854EA-85257300.0079CE19@us.ibm.com>
Message-ID: <20070621032029.GE8868@mellanox.co.il>

> Since SRQ is not a required function in the IB spec we never addressed that
> issue in the RFC along with UC.
>
> ...

Since SRQ is almost transparent wire-protocol-wise, the RFC probably does not
have to say anything about it. But I wonder why you say this about UC, which is
explicitly documented in the spec.

> If you really want to start splitting up which layer has part of the decision
> on how to connect, then you need to propose a totally different RFC.
>
> ...

I hear an architect speaking :)
You seem to use the term layer in the OSI model sense, while Roland is just
speaking about code organisation.  We haven't stopped developing ipoib, so
duplicating the controlling logic is a problem for us, both performance- and
maintenance-wise.  The proposed solution is to abstract the SRQ/non-SRQ issue
out by implementing a set of functions that can work on top of either an SRQ
or a pool of QPs.
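
A minimal sketch of that kind of abstraction, assuming a small table of function pointers selected once at setup time. All sketch_* names are hypothetical and the counters merely stand in for real receive queues; this is not the ipoib code.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_CONN 4

struct sketch_rx;

/* Operations the common receive path needs from either backend. */
struct sketch_rx_ops {
	int (*post_recv)(struct sketch_rx *rx, int conn_id);
};

struct sketch_rx {
	const struct sketch_rx_ops *ops;
	int shared_posted;			/* buffers on the one shared queue */
	int per_conn_posted[SKETCH_MAX_CONN];	/* buffers on each connection's queue */
};

/* SRQ backend: every connection draws from the same queue. */
static int sketch_srq_post(struct sketch_rx *rx, int conn_id)
{
	(void)conn_id;
	rx->shared_posted++;
	return 0;
}

/* Non-SRQ backend: each connection has its own receive queue. */
static int sketch_qp_post(struct sketch_rx *rx, int conn_id)
{
	if (conn_id < 0 || conn_id >= SKETCH_MAX_CONN)
		return -1;
	rx->per_conn_posted[conn_id]++;
	return 0;
}

static const struct sketch_rx_ops sketch_srq_ops = { sketch_srq_post };
static const struct sketch_rx_ops sketch_qp_ops = { sketch_qp_post };

/* Common code: no SRQ/non-SRQ branch here, only a call through ops. */
static int sketch_refill(struct sketch_rx *rx, int conn_id)
{
	assert(rx != NULL && rx->ops != NULL);
	return rx->ops->post_recv(rx, conn_id);
}
```

The controlling logic is then written once against sketch_refill(), and supporting another receive scheme means adding another ops table rather than another copy of the logic.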

-- 
MST


From mst at dev.mellanox.co.il  Wed Jun 20 20:38:54 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 21 Jun 2007 06:38:54 +0300
Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition support
In-Reply-To: <467996C4.1060201@ichips.intel.com>
References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com>
	<adair9i8ihq.fsf@cisco.com> <467996C4.1060201@ichips.intel.com>
Message-ID: <20070621033854.GF8868@mellanox.co.il>

> Quoting Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [PATCH] for-2.6.23 ib/umad: add partition support
> 
> Roland Dreier wrote:
> > > -#define IB_USER_MAD_ABI_VERSION	5
> > > +#define IB_USER_MAD_ABI_VERSION	6
> >
> >Bummer -- we've been able to keep the ABI stable for almost 2 years
> >now.  I wonder if there's something clever we can do to avoid breaking
> >existing apps?
> 
> Did you have something in mind?  (new ioctl?  re-using existing fields?)
> 
> Not all fields are used for both reads and writes.  E.g. status is 
> unused on a write, and retries is unused on a read.

We made the mistake of not validating the offset field; otherwise we could
have used it, too. As it is, I think apps just use "write", so
there's a useless byte counter in that field.

But if we do one of these things, the app does not get any indication that the
pkey is ignored, isn't that right?

> Storing the 
> pkey_index on a read seems doable.  I think if we do anything on a 
> write, we need to make an assumption that the data is currently set to 0 
> by the app.

Suggestion:
We currently have:
        if (count < sizeof (struct ib_user_mad) + IB_MGMT_RMPP_HDR)
                return -EINVAL;


So we can have short writes set per-open-file properties such as pkey:
just be sure to validate the offset too for these so we can reuse
offsets other than 0 in the future.

This assumes an open file descriptor per-pkey, so the proposed API
extension umad_set_pkey would have to be changed to be per-port rather
than per-mad. But I think this is a better API, too: most apps
likely work within a single partition.
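
A minimal sketch of the short-write idea above, written as ordinary user-space C
so it can be compiled on its own. It is not code from any posted patch: the
names sketch_write, struct file_state, FULL_MAD_MIN_LEN and PROP_OFFSET_PKEY
are hypothetical, the 64-byte threshold merely stands in for
sizeof (struct ib_user_mad) + IB_MGMT_RMPP_HDR, and the choice of a 16-bit
pkey index at offset 0 is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; the real sizes and layout are not modelled here. */
#define FULL_MAD_MIN_LEN 64u  /* stands in for sizeof(struct ib_user_mad) + IB_MGMT_RMPP_HDR */
#define PROP_OFFSET_PKEY 0    /* the only property offset defined in this sketch */

struct file_state {
	uint16_t pkey_index;  /* per-open-file property */
};

/*
 * Returns the number of bytes consumed, or a negative errno value.
 * Writes shorter than a full MAD are treated as property updates.
 */
long sketch_write(struct file_state *st, const void *buf, size_t count, long offset)
{
	assert(st != NULL && buf != NULL);

	if (count >= FULL_MAD_MIN_LEN)
		return (long)count;  /* normal MAD send path, not modelled here */

	/* Short write: reject unknown offsets so they stay free for future use. */
	if (offset != PROP_OFFSET_PKEY)
		return -EINVAL;
	if (count != sizeof(st->pkey_index))
		return -EINVAL;

	memcpy(&st->pkey_index, buf, sizeof(st->pkey_index));
	return (long)count;
}
```

Rejecting every offset other than the defined one is what keeps the remaining
offsets available later; an old kernel that lacks this path would still fail
such a write with -EINVAL, which gives the app the indication asked about above.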


-- 
MST


From mst at dev.mellanox.co.il  Wed Jun 20 21:19:05 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 21 Jun 2007 07:19:05 +0300
Subject: [ofa-general] Re: [PATCH 1/2] libibumad: fix partition support
In-Reply-To: <1182373280.15653.335513.camel@hal.voltaire.com>
References: <000801c7af6e$7ae0ba80$ff0da8c0@amr.corp.intel.com>
	<1182373280.15653.335513.camel@hal.voltaire.com>
Message-ID: <20070621041905.GG8868@mellanox.co.il>

> 1. It might be better if the ABI version 5 warning message for only
> pkey_index 0 being supported comes out at umad_init time rather than
> umad_set_pkey time so that the user is not swamped with these.

The reason you need the message is because you made it a void, right?
How about umad_set_pkey getting a port and returning success status?

-- 
MST


From sean.hefty at intel.com  Wed Jun 20 22:48:40 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Wed, 20 Jun 2007 22:48:40 -0700
Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition
	support
In-Reply-To: <20070621033854.GF8868@mellanox.co.il>
Message-ID: <000001c7b3c7$d2fdca20$a3cc180a@amr.corp.intel.com>

>This assumes an open file descriptor per-pkey, so the proposed API
>extension umad_set_pkey would have to be changed to be per-port rather
>than per-mad. But I think this is a better API, too: most apps
>likely work within a single partition.

I don't think this is true for apps that use the userspace MAD interface (e.g.
opensm).

Beyond that, this approach doesn't work for receiving MADs on different PKeys.

- Sean


From sean.hefty at intel.com  Wed Jun 20 22:52:29 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Wed, 20 Jun 2007 22:52:29 -0700
Subject: [ofa-general] RE: [PATCH 1/2] libibumad: fix partition support
In-Reply-To: <20070621041905.GG8868@mellanox.co.il>
Message-ID: <000101c7b3c8$5afa6960$a3cc180a@amr.corp.intel.com>

>> 1. It might be better if the ABI version 5 warning message for only
>> pkey_index 0 being supported comes out at umad_init time rather than
>> umad_set_pkey time so that the user is not swamped with these.
>
>The reason you need the message is because you made it a void, right?
>How about umad_set_pkey getting a port and returning success status?

umad_set_pkey returns an int.  With ABI 5, the call does nothing, always returns
success, and the callers ignore the return value.  The proposed change displays
a warning and returns a failure, but the callers still ignore the return value.
We can remove the warning message, but it was the warning message that clued me
in on the fact that the pkey was being set incorrectly...
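
To illustrate the point about ignored return values, here is a small
self-contained sketch. The functions sketch_set_pkey and sketch_prepare_send
are hypothetical stand-ins, not the libibumad implementation, and the rule
"ABI below 6 accepts only pkey_index 0" is an assumption based on the warning
text discussed in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the library call; not the libibumad code. */
int sketch_set_pkey(int abi_version, int pkey_index, int *stored_index)
{
	assert(stored_index != NULL);

	if (abi_version < 6 && pkey_index != 0)
		return -EINVAL;  /* assumed: ABI 5 can only express pkey_index 0 */
	*stored_index = pkey_index;
	return 0;
}

/* A caller that propagates the failure instead of ignoring it. */
int sketch_prepare_send(int abi_version, int pkey_index, int *stored_index)
{
	int ret = sketch_set_pkey(abi_version, pkey_index, stored_index);

	if (ret)
		return ret;      /* do not go on to send with the wrong pkey */
	return 0;
}
```

With the return value checked at the call site, the failure is visible without
relying on a warning message being printed.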

- Sean


From xma at us.ibm.com  Wed Jun 20 23:09:09 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Wed, 20 Jun 2007 23:09:09 -0700
Subject: [ofa-general] Re: [PATCH draft,
	untested] ehca srq emulation (for IPoIB CM)
In-Reply-To: <adar6o68inx.fsf@cisco.com>
Message-ID: <OF85ED587A.2F0C55F3-ON87257301.002129DA-88257301.00271B5B@us.ibm.com>


Hello Roland, Michael,

> I've been quite busy lately but I should have some time to look more
> deeply at this in the next week or so.
>
>  - R.

Has anyone tested IPoIB-CM SRQ scalability in a typical 16-32 node
cluster? It would be worth comparing IPoIB-CM connection scalability with SRQ
against IPoIB-CM without SRQ. I wonder which one would be better. Any ideas?

Thanks
Shirley Ma

From mst at dev.mellanox.co.il  Wed Jun 20 23:57:31 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 21 Jun 2007 09:57:31 +0300
Subject: [ofa-general] Re: Re: [PATCH] for-2.6.23 ib/umad: add partition
	support
In-Reply-To: <000001c7b3c7$d2fdca20$a3cc180a@amr.corp.intel.com>
References: <20070621033854.GF8868@mellanox.co.il>
	<000001c7b3c7$d2fdca20$a3cc180a@amr.corp.intel.com>
Message-ID: <20070621065731.GJ8868@mellanox.co.il>

> Quoting Sean Hefty <sean.hefty at intel.com>:
> Subject: RE: Re: [PATCH] for-2.6.23 ib/umad: add partition support
> 
> >This assumes an open file descriptor per-pkey, so the proposed API
> >extension umad_set_pkey would have to be changed to be per-port rather
> >than per-mad. But I think this is a better API, too: most apps
> >likely work within a single partition.
> 
> I don't think this is true for apps that use the userspace MAD interface (e.g.
> opensm).

The SM (rather, the SA) can just open a file descriptor per pkey: it created the
partitions itself, and there is only a small number of them.

> Beyond that, this approach doesn't work for receiving MADs on different PKeys.

Yes, it does: we just filter out the MADs where pkey does not match.

I think that most other apps (besides SA) should really treat
each partition as a separate network. So getting MADs for a specific
pkey, rather than all pkeys, makes total sense to me.
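
As a rough sketch of the filtering described above (an illustration only, not
the ib_umad receive path): struct sketch_mad and sketch_filter_by_pkey are
hypothetical names, and matching on the pkey index rather than on the pkey
value is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a received MAD; not the kernel's struct. */
struct sketch_mad {
	uint16_t pkey_index;  /* index of the pkey the MAD arrived on */
	int      id;          /* stands in for the MAD payload */
};

/*
 * Copy to 'out' only the MADs whose pkey_index equals the one bound to the
 * file descriptor. 'out' must have room for n entries.
 * Returns the number of MADs kept.
 */
size_t sketch_filter_by_pkey(const struct sketch_mad *in, size_t n,
			     uint16_t fd_pkey_index, struct sketch_mad *out)
{
	size_t kept = 0;

	assert(in != NULL && out != NULL);
	for (size_t i = 0; i < n; i++) {
		if (in[i].pkey_index == fd_pkey_index)
			out[kept++] = in[i];
	}
	return kept;
}
```

An app that wants MADs from several partitions would, under this scheme, hold
one descriptor per pkey and read from each.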

-- 
MST


From jackm at dev.mellanox.co.il  Thu Jun 21 02:01:58 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Thu, 21 Jun 2007 12:01:58 +0300
Subject: [ofa-general] [PATCH] libmlx4: make BF available for RDMA_READ work
	requests
Message-ID: <200706211201.58440.jackm@dev.mellanox.co.il>

Make blueflame available for RDMA_READs (performance improvement).

Signed-off-by: Jack Morgenstein <jackm at dev.mellanox.co.il>

Index: a/src/qp.c
===================================================================
--- a/src/qp.c	2007-06-20 16:31:36.000000000 +0300
+++ b/src/qp.c	2007-06-21 09:17:14.000000000 +0300
@@ -204,9 +204,11 @@
 
 				break;
 
+			case IBV_WR_RDMA_READ:
+				inl = 1;
+				/* fall through */
 			case IBV_WR_RDMA_WRITE:
 			case IBV_WR_RDMA_WRITE_WITH_IMM:
-			case IBV_WR_RDMA_READ:
 				((struct mlx4_wqe_raddr_seg *) wqe)->raddr =
 					htonll(wr->wr.rdma.remote_addr);
 				((struct mlx4_wqe_raddr_seg *) wqe)->rkey =


From jackm at dev.mellanox.co.il  Thu Jun 21 02:27:47 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Thu, 21 Jun 2007 12:27:47 +0300
Subject: [ofa-general] [PATCH 1 of 2]  mlx4: implement query-qp
Message-ID: <200706211227.47794.jackm@dev.mellanox.co.il>

Add query-qp capability.

Note that this also requires a libmlx4 patch for returning
qp capabilities (for sq caps at least).

Signed-off-by: Jack Morgenstein <jackm at dev.mellanox.co.il>

Index: new_connectx_kernel/drivers/net/mlx4/qp.c
===================================================================
--- new_connectx_kernel.orig/drivers/net/mlx4/qp.c	2007-06-18 15:34:26.000000000 +0300
+++ new_connectx_kernel/drivers/net/mlx4/qp.c	2007-06-18 15:35:36.000000000 +0300
@@ -278,3 +278,24 @@
 	mlx4_CONF_SPECIAL_QP(dev, 0);
 	mlx4_bitmap_cleanup(&mlx4_priv(dev)->qp_table.bitmap);
 }
+
+int mlx4_qp_query(struct mlx4_dev *dev, struct mlx4_qp *qp,
+		  struct mlx4_qp_context *context)
+{
+	struct mlx4_cmd_mailbox *mailbox;
+	int err;
+
+	mailbox = mlx4_alloc_cmd_mailbox(dev);
+	if (IS_ERR(mailbox))
+		return PTR_ERR(mailbox);
+
+	err = mlx4_cmd_box(dev, 0, mailbox->dma, qp->qpn, 0,
+			   MLX4_CMD_QUERY_QP, MLX4_CMD_TIME_CLASS_A);
+	if (!err)
+		memcpy(context, mailbox->buf + 8, sizeof *context);
+
+	mlx4_free_cmd_mailbox(dev, mailbox);
+	return err;
+}
+EXPORT_SYMBOL_GPL(mlx4_qp_query);
+
Index: new_connectx_kernel/drivers/infiniband/hw/mlx4/qp.c
===================================================================
--- new_connectx_kernel.orig/drivers/infiniband/hw/mlx4/qp.c	2007-06-18 15:34:26.000000000 +0300
+++ new_connectx_kernel/drivers/infiniband/hw/mlx4/qp.c	2007-06-18 17:09:21.000000000 +0300
@@ -1440,3 +1440,139 @@
 
 	return err;
 }
+
+static inline enum ib_qp_state to_ib_qp_state(enum mlx4_qp_state mlx4_state)
+{
+	switch (mlx4_state) {
+	case MLX4_QP_STATE_RST:      return IB_QPS_RESET;
+	case MLX4_QP_STATE_INIT:     return IB_QPS_INIT;
+	case MLX4_QP_STATE_RTR:      return IB_QPS_RTR;
+	case MLX4_QP_STATE_RTS:      return IB_QPS_RTS;
+	case MLX4_QP_STATE_SQ_DRAINING:
+	case MLX4_QP_STATE_SQD:      return IB_QPS_SQD;
+	case MLX4_QP_STATE_SQER:     return IB_QPS_SQE;
+	case MLX4_QP_STATE_ERR:      return IB_QPS_ERR;
+	default:                     return -1;
+	}
+}
+
+static inline enum ib_mig_state to_ib_mig_state(int mlx4_mig_state)
+{
+	switch (mlx4_mig_state) {
+	case MLX4_QP_PM_ARMED:		return IB_MIG_ARMED;
+	case MLX4_QP_PM_REARM:		return IB_MIG_REARM;
+	case MLX4_QP_PM_MIGRATED:	return IB_MIG_MIGRATED;
+	default: return -1;
+	}
+}
+
+static int to_ib_qp_access_flags(int mlx4_flags)
+{
+	int ib_flags = 0;
+
+	if (mlx4_flags & MLX4_QP_BIT_RRE)
+		ib_flags |= IB_ACCESS_REMOTE_READ;
+	if (mlx4_flags & MLX4_QP_BIT_RWE)
+		ib_flags |= IB_ACCESS_REMOTE_WRITE;
+	if (mlx4_flags & MLX4_QP_BIT_RAE)
+		ib_flags |= IB_ACCESS_REMOTE_ATOMIC;
+
+	return ib_flags;
+}
+
+static void to_ib_ah_attr(struct mlx4_dev *dev, struct ib_ah_attr *ib_ah_attr,
+				struct mlx4_qp_path *path)
+{
+	memset(ib_ah_attr, 0, sizeof *ib_ah_attr);
+	ib_ah_attr->port_num 	  = path->sched_queue & 0x40 ? 2 : 1;
+
+	if (ib_ah_attr->port_num == 0 || ib_ah_attr->port_num > dev->caps.num_ports)
+		return;
+
+	ib_ah_attr->dlid     	  = be16_to_cpu(path->rlid);
+	ib_ah_attr->sl       	  = (path->sched_queue >> 2) & 0xf;
+	ib_ah_attr->src_path_bits = path->grh_mylmc & 0x7f;
+	ib_ah_attr->static_rate   = path->static_rate ? path->static_rate - 5 : 0;
+	ib_ah_attr->ah_flags      = (path->grh_mylmc & (1 << 7)) ? IB_AH_GRH : 0;
+	if (ib_ah_attr->ah_flags) {
+		ib_ah_attr->grh.sgid_index = path->mgid_index;
+		ib_ah_attr->grh.hop_limit  = path->hop_limit;
+		ib_ah_attr->grh.traffic_class =
+			(be32_to_cpu(path->tclass_flowlabel) >> 20) & 0xff;
+		ib_ah_attr->grh.flow_label =
+			be32_to_cpu(path->tclass_flowlabel) & 0xffffff;
+		memcpy(ib_ah_attr->grh.dgid.raw,
+			path->rgid, sizeof ib_ah_attr->grh.dgid.raw);
+	}
+}
+
+int mlx4_ib_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr, int qp_attr_mask,
+		     struct ib_qp_init_attr *qp_init_attr)
+{
+	struct mlx4_ib_dev *dev = to_mdev(ibqp->device);
+	struct mlx4_ib_qp *qp = to_mqp(ibqp);
+	struct mlx4_qp_context context;
+	int mlx4_state;
+	int err;
+
+	if (qp->state == IB_QPS_RESET) {
+		qp_attr->qp_state = IB_QPS_RESET;
+		goto done;
+	}
+
+	err = mlx4_qp_query(dev->dev, &qp->mqp, &context);
+	if (err)
+		return -EINVAL;
+
+	mlx4_state = be32_to_cpu(context.flags) >> 28;
+
+	qp_attr->qp_state 	     = to_ib_qp_state(mlx4_state);
+	qp_attr->path_mtu 	     = context.mtu_msgmax >> 5;
+	qp_attr->path_mig_state      =
+		to_ib_mig_state((be32_to_cpu(context.flags) >> 11) & 0x3);
+	qp_attr->qkey 		     = be32_to_cpu(context.qkey);
+	qp_attr->rq_psn 	     = be32_to_cpu(context.rnr_nextrecvpsn) & 0xffffff;
+	qp_attr->sq_psn 	     = be32_to_cpu(context.next_send_psn) & 0xffffff;
+	qp_attr->dest_qp_num 	     = be32_to_cpu(context.remote_qpn) & 0xffffff;
+	qp_attr->qp_access_flags     =
+		to_ib_qp_access_flags(be32_to_cpu(context.params2));
+
+	if (qp->ibqp.qp_type == IB_QPT_RC || qp->ibqp.qp_type == IB_QPT_UC) {
+		to_ib_ah_attr(dev->dev, &qp_attr->ah_attr, &context.pri_path);
+		to_ib_ah_attr(dev->dev, &qp_attr->alt_ah_attr, &context.alt_path);
+		qp_attr->alt_pkey_index = context.alt_path.pkey_index & 0x7f;
+		qp_attr->alt_port_num 	= qp_attr->alt_ah_attr.port_num;
+	}
+
+	qp_attr->pkey_index = context.pri_path.pkey_index & 0x7f;
+	qp_attr->port_num   = context.pri_path.sched_queue & 0x40 ? 2 : 1;
+
+	/* qp_attr->en_sqd_async_notify is only applicable in modify qp */
+	qp_attr->sq_draining = mlx4_state == MLX4_QP_STATE_SQ_DRAINING;
+
+	qp_attr->max_rd_atomic = 1 << ((be32_to_cpu(context.params1) >> 21) & 0x7);
+
+	qp_attr->max_dest_rd_atomic =
+		1 << ((be32_to_cpu(context.params2) >> 21) & 0x7);
+	qp_attr->min_rnr_timer 	    =
+		(be32_to_cpu(context.rnr_nextrecvpsn) >> 24) & 0x1f;
+	qp_attr->timeout 	    = context.pri_path.ackto >> 3;
+	qp_attr->retry_cnt 	    = (be32_to_cpu(context.params1) >> 16) & 0x7;
+	qp_attr->rnr_retry 	    = (be32_to_cpu(context.params1) >> 13) & 0x7;
+	qp_attr->alt_timeout 	    = context.alt_path.ackto >> 3;
+
+done:
+	qp_attr->cur_qp_state	     = qp_attr->qp_state;
+	if (!ibqp->uobject) {
+		qp_attr->cap.max_send_wr     = qp->sq.wqe_cnt;
+		qp_attr->cap.max_recv_wr     = qp->rq.wqe_cnt;
+		qp_attr->cap.max_send_sge    = qp->sq.max_gs;
+		qp_attr->cap.max_recv_sge    = qp->rq.max_gs;
+		qp_attr->cap.max_inline_data = (1 << qp->sq.wqe_shift) -
+			send_wqe_overhead(qp->ibqp.qp_type) -
+			sizeof (struct mlx4_wqe_inline_seg);
+		qp_init_attr->cap	     = qp_attr->cap;
+	}
+	return 0;
+}
+
Index: new_connectx_kernel/include/linux/mlx4/qp.h
===================================================================
--- new_connectx_kernel.orig/include/linux/mlx4/qp.h	2007-06-18 15:34:26.000000000 +0300
+++ new_connectx_kernel/include/linux/mlx4/qp.h	2007-06-18 15:35:36.000000000 +0300
@@ -282,6 +282,9 @@
 		   struct mlx4_qp_context *context, enum mlx4_qp_optpar optpar,
 		   int sqd_event, struct mlx4_qp *qp);
 
+int mlx4_qp_query(struct mlx4_dev *dev, struct mlx4_qp *qp,
+		  struct mlx4_qp_context *context);
+
 static inline struct mlx4_qp *__mlx4_qp_lookup(struct mlx4_dev *dev, u32 qpn)
 {
 	return radix_tree_lookup(&dev->qp_table_tree, qpn & (dev->caps.num_qps - 1));
Index: new_connectx_kernel/drivers/infiniband/hw/mlx4/main.c
===================================================================
--- new_connectx_kernel.orig/drivers/infiniband/hw/mlx4/main.c	2007-06-18 15:22:02.000000000 +0300
+++ new_connectx_kernel/drivers/infiniband/hw/mlx4/main.c	2007-06-18 16:04:07.000000000 +0300
@@ -524,6 +524,7 @@
 		(1ull << IB_USER_VERBS_CMD_DESTROY_CQ)		|
 		(1ull << IB_USER_VERBS_CMD_CREATE_QP)		|
 		(1ull << IB_USER_VERBS_CMD_MODIFY_QP)		|
+		(1ull << IB_USER_VERBS_CMD_QUERY_QP)		|
 		(1ull << IB_USER_VERBS_CMD_DESTROY_QP)		|
 		(1ull << IB_USER_VERBS_CMD_ATTACH_MCAST)	|
 		(1ull << IB_USER_VERBS_CMD_DETACH_MCAST)	|
@@ -551,6 +552,7 @@
 	ibdev->ib_dev.post_srq_recv	= mlx4_ib_post_srq_recv;
 	ibdev->ib_dev.create_qp		= mlx4_ib_create_qp;
 	ibdev->ib_dev.modify_qp		= mlx4_ib_modify_qp;
+	ibdev->ib_dev.query_qp		= mlx4_ib_query_qp;
 	ibdev->ib_dev.destroy_qp	= mlx4_ib_destroy_qp;
 	ibdev->ib_dev.post_send		= mlx4_ib_post_send;
 	ibdev->ib_dev.post_recv		= mlx4_ib_post_recv;
Index: new_connectx_kernel/drivers/infiniband/hw/mlx4/mlx4_ib.h
===================================================================
--- new_connectx_kernel.orig/drivers/infiniband/hw/mlx4/mlx4_ib.h	2007-06-18 15:22:02.000000000 +0300
+++ new_connectx_kernel/drivers/infiniband/hw/mlx4/mlx4_ib.h	2007-06-18 16:03:19.000000000 +0300
@@ -267,6 +267,8 @@
 int mlx4_ib_destroy_qp(struct ib_qp *qp);
 int mlx4_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 		      int attr_mask, struct ib_udata *udata);
+int mlx4_ib_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr, int qp_attr_mask,
+		     struct ib_qp_init_attr *qp_init_attr);
 int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 		      struct ib_send_wr **bad_wr);
 int mlx4_ib_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr,


From jackm at dev.mellanox.co.il  Thu Jun 21 02:29:08 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Thu, 21 Jun 2007 12:29:08 +0300
Subject: [ofa-general] [PATCH 2 of 2] libmlx4: implement query_qp
Message-ID: <200706211229.08703.jackm@dev.mellanox.co.il>

For query-qp, fill in qp capabilities from user-space qp object.

Signed-off-by: Jack Morgenstein <jackm at dev.mellanox.co.il>

Index: a/src/verbs.c
===================================================================
--- a/src/verbs.c	2007-06-18 09:33:04.000000000 +0300
+++ a/src/verbs.c	2007-06-18 17:10:23.000000000 +0300
@@ -445,8 +445,21 @@
 		   struct ibv_qp_init_attr *init_attr)
 {
 	struct ibv_query_qp cmd;
+	struct mlx4_qp *mqp;
+	int ret;
+
+	ret = ibv_cmd_query_qp(qp, attr, attr_mask, init_attr, &cmd, sizeof cmd);
+	if (ret)
+		return ret;
+	mqp = to_mqp(qp);
+	init_attr->cap.max_send_wr = mqp->sq.max_post;
+	init_attr->cap.max_send_sge = mqp->sq.max_gs;
+	init_attr->cap.max_recv_wr =  mqp->rq.max_post;
+	init_attr->cap.max_recv_sge =  mqp->rq.max_gs;
+	init_attr->cap.max_inline_data = mqp->max_inline_data;
+	attr->cap = init_attr->cap;
 
-	return ibv_cmd_query_qp(qp, attr, attr_mask, init_attr, &cmd, sizeof cmd);
+	return 0;
 }
 
 int mlx4_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr,


From vlad at lists.openfabrics.org  Thu Jun 21 02:46:43 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Thu, 21 Jun 2007 02:46:43 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070621-0200 daily build status
Message-ID: <20070621094643.AB6B0E6087C@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:   --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.12
Passed on x86_64 with linux-2.6.20
Passed on powerpc with linux-2.6.17
Passed on ia64 with linux-2.6.13
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.17
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.18
Passed on ia64 with linux-2.6.12
Passed on powerpc with linux-2.6.13
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.15
Passed on ia64 with linux-2.6.14
Passed on ppc64 with linux-2.6.12
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.14
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.15
Passed on ppc64 with linux-2.6.16
Passed on x86_64 with linux-2.6.14
Passed on ppc64 with linux-2.6.15
Passed on ia64 with linux-2.6.18
Passed on powerpc with linux-2.6.12
Passed on ia64 with linux-2.6.16
Passed on ppc64 with linux-2.6.19
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on ia64 with linux-2.6.19
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.14
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.17
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on ia64 with linux-2.6.21.1
Passed on ppc64 with linux-2.6.18
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.9-34.ELsmp
Passed on x86_64 with linux-2.6.18-8.el5

Failed:


From jackm at dev.mellanox.co.il  Thu Jun 21 03:03:11 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Thu, 21 Jun 2007 13:03:11 +0300
Subject: [ofa-general] [PATCH] mlx4: implement query-srq
Message-ID: <200706211303.11949.jackm@dev.mellanox.co.il>

Query SRQ support was added.

Signed-off-by: Dotan Barak <dotanb at mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm at dev.mellanox.co.il>

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 1095c82..ebc8d55 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -528,6 +528,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 		(1ull << IB_USER_VERBS_CMD_DETACH_MCAST)	|
 		(1ull << IB_USER_VERBS_CMD_CREATE_SRQ)		|
 		(1ull << IB_USER_VERBS_CMD_MODIFY_SRQ)		|
+		(1ull << IB_USER_VERBS_CMD_QUERY_SRQ)		|
 		(1ull << IB_USER_VERBS_CMD_DESTROY_SRQ);
 
 	ibdev->ib_dev.query_device	= mlx4_ib_query_device;
@@ -546,6 +547,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 	ibdev->ib_dev.destroy_ah	= mlx4_ib_destroy_ah;
 	ibdev->ib_dev.create_srq	= mlx4_ib_create_srq;
 	ibdev->ib_dev.modify_srq	= mlx4_ib_modify_srq;
+	ibdev->ib_dev.query_srq		= mlx4_ib_query_srq;
 	ibdev->ib_dev.destroy_srq	= mlx4_ib_destroy_srq;
 	ibdev->ib_dev.post_srq_recv	= mlx4_ib_post_srq_recv;
 	ibdev->ib_dev.create_qp		= mlx4_ib_create_qp;
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 24ccadd..dab0fd9 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -255,6 +255,7 @@ struct ib_srq *mlx4_ib_create_srq(struct ib_pd *pd,
 				  struct ib_udata *udata);
 int mlx4_ib_modify_srq(struct ib_srq *ibsrq, struct ib_srq_attr *attr,
 		       enum ib_srq_attr_mask attr_mask, struct ib_udata *udata);
+int mlx4_ib_query_srq(struct ib_srq *srq, struct ib_srq_attr *srq_attr);
 int mlx4_ib_destroy_srq(struct ib_srq *srq);
 void mlx4_ib_free_srq_wqe(struct mlx4_ib_srq *srq, int wqe_index);
 int mlx4_ib_post_srq_recv(struct ib_srq *ibsrq, struct ib_recv_wr *wr,
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
diff --git a/drivers/infiniband/hw/mlx4/srq.c b/drivers/infiniband/hw/mlx4/srq.c
index 12fac1c..408748f 100644
--- a/drivers/infiniband/hw/mlx4/srq.c
+++ b/drivers/infiniband/hw/mlx4/srq.c
@@ -240,6 +240,24 @@ int mlx4_ib_modify_srq(struct ib_srq *ibsrq, struct ib_srq_attr *attr,
 	return 0;
 }
 
+int mlx4_ib_query_srq(struct ib_srq *ibsrq, struct ib_srq_attr *srq_attr)
+{
+	struct mlx4_ib_dev *dev = to_mdev(ibsrq->device);
+	struct mlx4_ib_srq *srq = to_msrq(ibsrq);
+	int ret;
+	int limit_watermark;
+
+	ret = mlx4_srq_query(dev->dev, &srq->msrq, &limit_watermark);
+	if (ret)
+		return ret;
+
+	srq_attr->srq_limit = be16_to_cpu(limit_watermark);
+	srq_attr->max_wr    = srq->msrq.max - 1;
+	srq_attr->max_sge   = srq->msrq.max_gs;
+
+	return 0;
+}
+
 int mlx4_ib_destroy_srq(struct ib_srq *srq)
 {
 	struct mlx4_ib_dev *dev = to_mdev(srq->device);
diff --git a/drivers/net/mlx4/qp.c b/drivers/net/mlx4/qp.c
diff --git a/drivers/net/mlx4/srq.c b/drivers/net/mlx4/srq.c
index 2134f83..b061c86 100644
--- a/drivers/net/mlx4/srq.c
+++ b/drivers/net/mlx4/srq.c
@@ -102,6 +102,13 @@ static int mlx4_ARM_SRQ(struct mlx4_dev *dev, int srq_num, int limit_watermark)
 			MLX4_CMD_TIME_CLASS_B);
 }
 
+static int mlx4_QUERY_SRQ(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox,
+			  int srq_num)
+{
+	return mlx4_cmd_box(dev, 0, mailbox->dma, srq_num, 0, MLX4_CMD_QUERY_SRQ,
+			    MLX4_CMD_TIME_CLASS_A);
+}
+
 int mlx4_srq_alloc(struct mlx4_dev *dev, u32 pdn, struct mlx4_mtt *mtt,
 		   u64 db_rec, struct mlx4_srq *srq)
 {
@@ -205,6 +212,29 @@ int mlx4_srq_arm(struct mlx4_dev *dev, struct mlx4_srq *srq, int limit_watermark
 }
 EXPORT_SYMBOL_GPL(mlx4_srq_arm);
 
+int mlx4_srq_query(struct mlx4_dev *dev, struct mlx4_srq *srq, int *limit_watermark)
+{
+	struct mlx4_cmd_mailbox *mailbox;
+	struct mlx4_srq_context *srq_context;
+	int err;
+
+	mailbox = mlx4_alloc_cmd_mailbox(dev);
+	if (IS_ERR(mailbox))
+		return PTR_ERR(mailbox);
+
+	srq_context = mailbox->buf;
+
+	err = mlx4_QUERY_SRQ(dev, mailbox, srq->srqn);
+	if (err)
+		goto err_out;
+	*limit_watermark = srq_context->limit_watermark;
+
+err_out:
+	mlx4_free_cmd_mailbox(dev, mailbox);
+	return err;
+}
+EXPORT_SYMBOL_GPL(mlx4_srq_query);
+
 int __devinit mlx4_init_srq_table(struct mlx4_dev *dev)
 {
 	struct mlx4_srq_table *srq_table = &mlx4_priv(dev)->srq_table;
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index b372f59..6bdd5de 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -322,6 +322,7 @@ int mlx4_srq_alloc(struct mlx4_dev *dev, u32 pdn, struct mlx4_mtt *mtt,
 		   u64 db_rec, struct mlx4_srq *srq);
 void mlx4_srq_free(struct mlx4_dev *dev, struct mlx4_srq *srq);
 int mlx4_srq_arm(struct mlx4_dev *dev, struct mlx4_srq *srq, int limit_watermark);
+int mlx4_srq_query(struct mlx4_dev *dev, struct mlx4_srq *srq, int *limit_watermark);
 
 int mlx4_INIT_PORT(struct mlx4_dev *dev, int port);
 int mlx4_CLOSE_PORT(struct mlx4_dev *dev, int port);
diff --git a/include/linux/mlx4/qp.h b/include/linux/mlx4/qp.h


From ogerlitz at voltaire.com  Thu Jun 21 03:41:03 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 21 Jun 2007 13:41:03 +0300 (IDT)
Subject: [ofa-general] [PATCH] remove confusing code from udaddy
Message-ID: <Pine.LNX.4.64.0706211333310.26603@zuben>

As the man page of rdma_connect says, the qp_num and retry_count
params are relevant only to RDMA_PS_TCP calls.

Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>

--- librdmacm/examples/udaddy.c.orig	2007-06-21 13:34:59.000000000 +0300
+++ librdmacm/examples/udaddy.c	2007-06-21 13:35:58.000000000 +0300
@@ -264,8 +264,6 @@ static int route_handler(struct cmatest_
 		goto err;

 	memset(&conn_param, 0, sizeof conn_param);
-	conn_param.qp_num = node->cma_id->qp->qp_num;
-	conn_param.retry_count = 5;
 	ret = rdma_connect(node->cma_id, &conn_param);
 	if (ret) {
 		printf("udaddy: failure connecting: %d\n", ret);


From sashak at voltaire.com  Thu Jun 21 04:35:31 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Thu, 21 Jun 2007 14:35:31 +0300
Subject: [ofa-general] backups
In-Reply-To: <795c49870706201132r1f7633f8r2cf3cb2a71edc6e0@mail.gmail.com>
References: <795c49870706201044ha36255amebd94c1b673f58f6@mail.gmail.com>
	<adahcp2a4ol.fsf@cisco.com>
	<795c49870706201132r1f7633f8r2cf3cb2a71edc6e0@mail.gmail.com>
Message-ID: <1182425733.30285.51.camel@localhost>

On Wed, 2007-06-20 at 11:32 -0700, Jeff Becker wrote:
> I'm backing up /data/pub/scm. A quick "du -chL" shows it to be 4.2G.

I think you could publish the output of "du -schL /data/pub/scm/*". Then we
could ask the users consuming the most space to pack their repos (with
'git-repack -a -d').

Sasha

> Perhaps I only need to backup a subset of /data/pub/scm? Thanks.
> 
> -jeff
> 
> On 6/20/07, Roland Dreier <rdreier at cisco.com> wrote:
> >  > Hi. I've started backing up the git trees and the web content using
> >  > rsync. John Companies gave us a 10G NFS partition for this. I've done
> >  > two backups and there's only 800M left. Also, I haven't backed up the
> >  > daily builds yet. I was told we could get more space for one dollar
> >  > per GB per month. Depending on the budget, we should increase this
> >  > backup space. How should we proceed? Thanks.
> >
> > Where is all the space going?  A full kernel git tree (with more than
> > two years of history) takes less than 150 MB of storage for me.  How
> > are we using up so much space?
> >
> > Also, FWIW, amazon S3 is $0.15 / GB-month + $0.10 for each GB
> > transferred in.  Of course it's probably a lot less convenient to back
> > up to.
> >
> >  - R.
> >


From changerv at gmail.com  Thu Jun 21 05:15:19 2007
From: changerv at gmail.com (Changer Van)
Date: Thu, 21 Jun 2007 20:15:19 +0800
Subject: [ofa-general] Can't open HCA InfiniHost0 problem
Message-ID: <9fa3c2e50706210515l5ba18cb1h6eb4718f0749bb21@mail.gmail.com>

Hi all,
I got some errors when I ran the lctl network up command.
Here are some log messages:

… kernel: LustreError: 12355:0:(viblnd.c:1800:kibnal_startup()) Can't open
HCA InfiniHost0: -256
However, my IB card's hca_id is InfiniHost_III_Ex0.
How do I configure it to look for an hca_id like InfiniHost_III_Ex0?

Any help would be greatly appreciated.

-- 
Regards,
Changer

From wombat2 at us.ibm.com  Thu Jun 21 05:52:20 2007
From: wombat2 at us.ibm.com (Bernard King-Smith)
Date: Thu, 21 Jun 2007 08:52:20 -0400
Subject: [ofa-general] Re: Re: [PATCH draft,
	untested] ehca srq emulation (for IPoIB CM)
In-Reply-To: <20070621032029.GE8868@mellanox.co.il>
Message-ID: <OF6F4E2A63.E5E803F1-ON85257301.00466143-85257301.0046D4AB@us.ibm.com>

"Michael S. Tsirkin" <mst at dev.mellanox.co.il> wrote on 06/20/2007 11:20:29 
PM:

> > Since SRQ is not a required function in the IB spec we never addressed that
> > issue in the RFC along with UC.
> >
> > ...
>
> Since SRQ is almost transparent wire-protocol-wise, RFC probably does not have
> to say anything about it. But I wonder why do you say this about UC which is
> explicitly documented in the spec.
> 

Looks like that was added in the last set of revisions.

> > If you really want to start splitting up which layer has part of the decision
> > on how to connect, then you need to propose a totally different RFC.
> >
> > ...
> 
> I hear an architect speaking :)

Guilty as charged  :=}

> You seem to use the term layer in the OSI model sense, while Roland is just
> speaking about code organisation.  We haven't stopped developing ipoib, so
> duplicating the controlling logic is a problem for us: both performance- and
> maintenance-wise.  Abstracting the SRQ/nonSRQ issue out, by implementing a set
> of functions that can work on top of either SRQ or a pool of QPs is the proposed
> solution.

Still trying to understand why this is easier to maintain and performs 
better than the current patch. If this has to go in the drivers, then this 
has to be a part of the distros. Seems messy.

> 
> -- 
> MST


Bernie King-Smith 
IBM Corporation
Server Group
Cluster System Performance 
wombat2 at us.ibm.com    (845)433-8483
Tie. 293-8483 or wombat2 on NOTES 

"We are not responsible for the world we are born into, only for the world 
we leave when we die.
So we have to accept what has gone before us and work to change the only 
thing we can,
-- The Future." William Shatner

From mst at dev.mellanox.co.il  Thu Jun 21 06:07:12 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 21 Jun 2007 16:07:12 +0300
Subject: [ofa-general] Re: backups
In-Reply-To: <adahcp2a4ol.fsf@cisco.com>
References: <795c49870706201044ha36255amebd94c1b673f58f6@mail.gmail.com>
	<adahcp2a4ol.fsf@cisco.com>
Message-ID: <20070621130712.GG4857@mellanox.co.il>

> Where is all the space going?  A full kernel git tree (with more than
> two years of history) takes less than 150 MB of storage for me.

Most likely there are some unpacked trees.

-- 
MST


From halr at voltaire.com  Thu Jun 21 06:39:08 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 21 Jun 2007 09:39:08 -0400
Subject: [ofa-general] Re: [PATCH] osm: adding root_guid_file and
	cn_guid_file OpenSM options
In-Reply-To: <4675285A.6060309@dev.mellanox.co.il>
References: <4675285A.6060309@dev.mellanox.co.il>
Message-ID: <1182433144.15653.403468.camel@hal.voltaire.com>

Hi Yevgeny,

On Sun, 2007-06-17 at 08:26, Yevgeny Kliteynik wrote:
> Hi Hal,
> 
> This patch replaces updn_guid_file in the Up/Down routing with
> root_guid_file for Up/Down and Fat-Tree routing, and adds a new
> option - cn_guid_file for Fat-Tree routing.
> OpenSM command line options for these two files are:
> 
>   '-a' or '--root_guid_file' for roots
>   '-u' or '--cn_guid_file' for compute nodes
> 
> Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

This entire patch was rejected when I attempted to apply it. Can you
regenerate it ? Thanks.

-- Hal

> ---
>   opensm/include/opensm/osm_subnet.h |   12 +++++++++---
>   opensm/opensm/main.c               |   29 ++++++++++++++++++++++-------
>   opensm/opensm/osm_subnet.c         |   25 ++++++++++++++++++-------
>   opensm/opensm/osm_ucast_updn.c     |    6 +++---
>   4 files changed, 52 insertions(+), 20 deletions(-)
> 
> diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
> index c62128b..a38fc49 100644
> --- a/opensm/include/opensm/osm_subnet.h
> +++ b/opensm/include/opensm/osm_subnet.h
> @@ -278,7 +278,8 @@ typedef struct _osm_subn_opt
>     char *                   routing_engine_name;
>     char *                   lid_matrix_dump_file;
>     char *                   ucast_dump_file;
> -  char *                   updn_guid_file;
> +  char *                   root_guid_file;
> +  char *                   cn_guid_file;
>     char *                   sa_db_file;
>     boolean_t                exit_on_fatal;
>     boolean_t                honor_guid2lid_file;
> @@ -452,8 +453,13 @@ typedef struct _osm_subn_opt
>   *		Name of the unicast routing dump file from where switch
>   *		forwarding tables will be loaded
>   *
> -*	updn_guid_file
> -*		Pointer to name of the UPDN guid file given by User
> +*	root_guid_file
> +*		Name of the file that contains list of root guids that
> +*		will be used by fat-tree or up/dn routing (provided by User)
> +*
> +*	cn_guid_file
> +*		Name of the file that contains list of compute node guids that
> +*		will be used by fat-tree routing (provided by User)
>   *
>   *	sa_db_file
>   *		Name of the SA database file.
> diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
> index 6b4cb4f..d17a994 100644
> --- a/opensm/opensm/main.c
> +++ b/opensm/opensm/main.c
> @@ -189,8 +189,14 @@ show_usage(void)
>             "          This option specifies the name of the SA DB dump file\n"
>             "          from where SA database will be loaded.\n\n");
>     printf ("-a\n"
> -          "--add_guid_file <path to file>\n"
> -          "          Set the root nodes for the Up/Down routing algorithm\n"
> +          "--root_guid_file <path to file>\n"
> +          "          Set the root nodes for the Up/Down or Fat-Tree routing\n"
> +          "          algorithm to the guids provided in the given file (one\n"
> +          "          to a line)\n"
> +          "\n");
> +  printf ("-u\n"
> +          "--cn_guid_file <path to file>\n"
> +          "          Set the compute nodes for the Fat-Tree routing algorithm\n"
>             "          to the guids provided in the given file (one to a line)\n"
>             "\n");
>     printf( "-o\n"
> @@ -585,7 +591,7 @@ main(
>     char                 *ignore_guids_file_name = NULL;
>     uint32_t              val;
>     const char * const    short_option =
> -	  "i:f:ed:g:l:L:s:t:a:R:M:U:S:P:NBIQvVhorcyxp:n:q:k:C:";
> +	  "i:f:ed:g:l:L:s:t:a:u:R:M:U:S:P:NBIQvVhorcyxp:n:q:k:C:";
> 
>     /*
>       In the array below, the 2nd parameter specifies the number
> @@ -622,7 +628,8 @@ main(
>         {  "lid_matrix_file",1, NULL, 'M'},
>         {  "ucast_file",    1, NULL, 'U'},
>         {  "sadb_file",     1, NULL, 'S'},
> -      {  "add_guid_file", 1, NULL, 'a'},
> +      {  "root_guid_file",1, NULL, 'a'},
> +      {  "cn_guid_file",  1, NULL, 'u'},
>         {  "cache-options", 0, NULL, 'c'},
>         {  "stay_on_fatal", 0, NULL, 'y'},
>         {  "honor_guid2lid",0, NULL, 'x'},
> @@ -886,10 +893,18 @@ main(
> 
>       case 'a':
>         /*
> -        Specifies port guids file
> +        Specifies root guids file
> +      */
> +      opt.root_guid_file = optarg;
> +      printf (" Root Guid File: %s\n", opt.root_guid_file );
> +      break;
> +
> +    case 'u':
> +      /*
> +        Specifies compute node guids file
>         */
> -      opt.updn_guid_file = optarg;
> -      printf (" UPDN Guid File: %s\n", opt.updn_guid_file );
> +      opt.cn_guid_file = optarg;
> +      printf (" Compute Node Guid File: %s\n", opt.cn_guid_file );
>         break;
> 
>       case 'c':
> diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
> index 736f49a..4e080ba 100644
> --- a/opensm/opensm/osm_subnet.c
> +++ b/opensm/opensm/osm_subnet.c
> @@ -500,7 +500,8 @@ osm_subn_set_default_opt(
>     p_opt->routing_engine_name = NULL;
>     p_opt->lid_matrix_dump_file = NULL;
>     p_opt->ucast_dump_file = NULL;
> -  p_opt->updn_guid_file = NULL;
> +  p_opt->root_guid_file = NULL;
> +  p_opt->cn_guid_file = NULL;
>     p_opt->sa_db_file = NULL;
>     p_opt->exit_on_fatal = TRUE;
>     p_opt->enable_quirks = FALSE;
> @@ -1323,8 +1324,12 @@ osm_subn_parse_conf_file(
>           p_key, p_val, &p_opts->ucast_dump_file);
> 
>         __osm_subn_opts_unpack_charp(
> -        "updn_guid_file",
> -        p_key, p_val, &p_opts->updn_guid_file);
> +        "root_guid_file",
> +        p_key, p_val, &p_opts->root_guid_file);
> +
> +      __osm_subn_opts_unpack_charp(
> +        "cn_guid_file",
> +        p_key, p_val, &p_opts->cn_guid_file);
> 
>         __osm_subn_opts_unpack_charp(
>           "sa_db_file",
> @@ -1548,12 +1553,18 @@ osm_subn_write_conf_file(
>                "# Ucast dump file name\n"
>                "ucast_dump_file %s\n\n",
>                p_opts->ucast_dump_file);
> -  if (p_opts->updn_guid_file)
> +  if (p_opts->root_guid_file)
> +    fprintf( opts_file,
> +             "# The file holding the root node guids (for fat-tree or Up/Down)\n"
> +             "# One guid in each line\n"
> +             "root_guid_file %s\n\n",
> +             p_opts->root_guid_file);
> +  if (p_opts->cn_guid_file)
>       fprintf( opts_file,
> -             "# The file holding the Up/Down root node guids\n"
> +             "# The file holding the fat-tree compute node guids\n"
>                "# One guid in each line\n"
> -             "updn_guid_file %s\n\n",
> -             p_opts->updn_guid_file);
> +             "cn_guid_file %s\n\n",
> +             p_opts->cn_guid_file);
>     if (p_opts->sa_db_file)
>       fprintf( opts_file,
>                "# SA database file name\n"
> diff --git a/opensm/opensm/osm_ucast_updn.c b/opensm/opensm/osm_ucast_updn.c
> index 2448246..af5ee4e 100644
> --- a/opensm/opensm/osm_ucast_updn.c
> +++ b/opensm/opensm/osm_ucast_updn.c
> @@ -311,10 +311,10 @@ updn_init(
>        Check the source for root node list, if file parse it, otherwise
>        wait for a callback to activate auto detection
>     */
> -  if (p_osm->subn.opt.updn_guid_file)
> +  if (p_osm->subn.opt.root_guid_file)
>     {
>       status = osm_ucast_mgr_read_guid_file( &p_osm->sm.ucast_mgr,
> -                                           p_osm->subn.opt.updn_guid_file,
> +                                           p_osm->subn.opt.root_guid_file,
>                                              p_updn->p_root_nodes );
>       if (status != IB_SUCCESS)
>          goto Exit;
> @@ -323,7 +323,7 @@ updn_init(
>       osm_log( &p_osm->log, OSM_LOG_DEBUG,
>                "updn_init: "
>                "UPDN - Fetching root nodes from file %s\n",
> -             p_osm->subn.opt.updn_guid_file );
> +             p_osm->subn.opt.root_guid_file );
>       guid_iterator = cl_list_head(p_updn->p_root_nodes);
>       while( guid_iterator != cl_list_end(p_updn->p_root_nodes) )
>       {


From mst at dev.mellanox.co.il  Thu Jun 21 06:51:20 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 21 Jun 2007 16:51:20 +0300
Subject: [ofa-general] Re: Re: [PATCH] for-2.6.23 ib/umad: add partition
	support
In-Reply-To: <20070621065731.GJ8868@mellanox.co.il>
References: <20070621033854.GF8868@mellanox.co.il>
	<000001c7b3c7$d2fdca20$a3cc180a@amr.corp.intel.com>
	<20070621065731.GJ8868@mellanox.co.il>
Message-ID: <20070621135120.GH4857@mellanox.co.il>

> Quoting Michael S. Tsirkin <mst at dev.mellanox.co.il>:
> Subject: Re: Re: [PATCH] for-2.6.23 ib/umad: add partition support
> 
> > Quoting Sean Hefty <sean.hefty at intel.com>:
> > Subject: RE: Re: [PATCH] for-2.6.23 ib/umad: add partition support
> > 
> > >This assumes an open file descriptor per-pkey, so the proposed API
> > >extension umad_set_pkey would have to be changed to be per-port rather
> > >than per-mad. But I think this is a better API, too: most apps
> > >likely work within a single partition.
> > 
> > I don't think this is true for apps that use the userspace MAD interface (e.g.
> > opensm).
> 
> SM (rather, SA) can just open file descriptor per pkey - it created them itself,
> and there's a small number of partitions.
> 
> > Beyond that, this approach doesn't work for receiving MADs on different PKeys.
> 
> Yes, it does: we just filter out the MADs where pkey does not match.
> 
> I think that most other apps (besides SA) should really treat
> each partition as a separate network. So getting MADs for a specific
> pkey, rather than all pkeys, makes total sense to me.

Hal, could you pls comment on whether this approach will work for opensm?

-- 
MST


From tziporet at mellanox.co.il  Thu Jun 21 07:05:24 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Thu, 21 Jun 2007 17:05:24 +0300
Subject: [ofa-general] Re: [ewg] Anouncement: OFED 1.2 rc6 is avilable
In-Reply-To: <OFE2709483.B5746D0D-ONC1257300.0044F6E0-C1257300.004AE5B3@de.ibm.com>
References: <OFE2709483.B5746D0D-ONC1257300.0044F6E0-C1257300.004AE5B3@de.ibm.com>
Message-ID: <467A85A4.2080805@mellanox.co.il>

Hoang-Nam Nguyen wrote:
>
> Hello Tziporet!
> In the attached release notes I see under "1.2 Supported Platforms and 
> Operating Systems" this:
> - RedHat EL5: 2.6.9-42.ELsmp
> which should be 2.6.18-8.el5 according to my "uname -r" on a rhel5 
> system.
>
>
Thanks,

I fixed this

Tziporet


From jsquyres at cisco.com  Thu Jun 21 07:09:23 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Thu, 21 Jun 2007 10:09:23 -0400
Subject: [ofa-general] Stringify ibv_event_type
Message-ID: <2C245DF3-77A7-4A3C-BF3A-13FEC2F7E0DA@cisco.com>

Could a function to stringify the ibv_event_type enum be added to
libibverbs?  It could be similar to the event_name_str() function in  
libibverbs/examples/asyncwatch.c:

-----
static const char *event_name_str(enum ibv_event_type event_type)
{
         switch (event_type) {
         case IBV_EVENT_DEVICE_FATAL:
                 return "IBV_EVENT_DEVICE_FATAL";
...etc.
-----

Rationale: if multiple client apps (such as the OF-based MPI  
implementations) start using the asynch events and there is no  
central function for string-ifying the event enum, they'll all end up  
doing the translation themselves when printing out error messages.   
It's not a huge amount of code, but it does seem kinda odd to make  
everyone replicate essentially the same stuff.  Additionally, the  
available enum values may grow over time, forcing client apps to  
figure out which ones are available and adjust their event_name_str()  
equivalent as appropriate.  Hiding the possibility of change down in  
libibverbs seems appropriate.
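
A minimal sketch of the kind of central helper being requested, assuming
a table indexed by the enum value.  The demo_* names are hypothetical
stand-ins so that the example builds without <infiniband/verbs.h>; they
are not the libibverbs identifiers, and this is not a proposed patch.

```c
#include <assert.h>

/* Hypothetical stand-in for a few values of enum ibv_event_type. */
enum demo_event_type {
	DEMO_EVENT_CQ_ERR,
	DEMO_EVENT_QP_FATAL,
	DEMO_EVENT_DEVICE_FATAL,
	DEMO_EVENT_COUNT	/* number of known events, keep last */
};

/* One name per enum value, indexed by the value itself. */
static const char *const demo_event_names[] = {
	[DEMO_EVENT_CQ_ERR]       = "DEMO_EVENT_CQ_ERR",
	[DEMO_EVENT_QP_FATAL]     = "DEMO_EVENT_QP_FATAL",
	[DEMO_EVENT_DEVICE_FATAL] = "DEMO_EVENT_DEVICE_FATAL",
};

/* Fail the build if an enum value is added without a name. */
static_assert(sizeof(demo_event_names) / sizeof(demo_event_names[0]) ==
	      DEMO_EVENT_COUNT, "name table out of sync with enum");

/* Return a printable name, or "unknown" for values outside the table. */
static const char *demo_event_type_str(enum demo_event_type event_type)
{
	if ((unsigned int)event_type < DEMO_EVENT_COUNT)
		return demo_event_names[event_type];
	return "unknown";
}
```

Keeping the table next to the enum means a newly added event only needs
one more table entry, and callers that print error messages pick it up
without any change on their side.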

-- 
Jeff Squyres
Cisco Systems


From kliteyn at dev.mellanox.co.il  Thu Jun 21 07:49:35 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Thu, 21 Jun 2007 17:49:35 +0300
Subject: [ofa-general] [PATCHv2] osm: adding root_guid_file and cn_guid_file
	OpenSM options
In-Reply-To: <1182433144.15653.403468.camel@hal.voltaire.com>
References: <4675285A.6060309@dev.mellanox.co.il>
	<1182433144.15653.403468.camel@hal.voltaire.com>
Message-ID: <467A8FFF.2040207@dev.mellanox.co.il>

Hi Hal,

Hal Rosenstock wrote:
> Hi Yevgeny,
> 
> On Sun, 2007-06-17 at 08:26, Yevgeny Kliteynik wrote:
>> Hi Hal,
>>
>> This patch replaces updn_guid_file in the Up/Down routing with
>> root_guid_file for Up/Down and Fat-Tree routing, and adds a new
>> option - cn_guid_file for Fat-Tree routing.
>> OpenSM command line options for these two files are:
>>
>>   '-a' or '--root_guid_file' for roots
>>   '-u' or '--cn_guid_file' for compute nodes
>>
>> Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
> 
> This entire patch was rejected when I attempted to apply it. Can you
> regenerate it ? Thanks.

Indeed, there were changes in osm_subnet.{c,h} since I issued this patch.
Here's the new one:

Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
---
  opensm/include/opensm/osm_subnet.h |   12 +++++++++---
  opensm/opensm/main.c               |   29 ++++++++++++++++++++++-------
  opensm/opensm/osm_subnet.c         |   25 ++++++++++++++++++-------
  opensm/opensm/osm_ucast_updn.c     |    6 +++---
  4 files changed, 52 insertions(+), 20 deletions(-)

diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index b296caf..2ee5689 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -278,7 +278,8 @@ typedef struct _osm_subn_opt
    char *                   routing_engine_name;
    char *                   lid_matrix_dump_file;
    char *                   ucast_dump_file;
-  char *                   updn_guid_file;
+  char *                   root_guid_file;
+  char *                   cn_guid_file;
    char *                   sa_db_file;
    boolean_t                exit_on_fatal;
    boolean_t                honor_guid2lid_file;
@@ -452,8 +453,13 @@ typedef struct _osm_subn_opt
  *		Name of the unicast routing dump file from where switch
  *		forwarding tables will be loaded
  *
-*	updn_guid_file
-*		Pointer to name of the UPDN guid file given by User
+*	root_guid_file
+*		Name of the file that contains list of root guids that
+*		will be used by fat-tree or up/dn routing (provided by User)
+*
+*	cn_guid_file
+*		Name of the file that contains list of compute node guids that
+*		will be used by fat-tree routing (provided by User)
  *
  *	sa_db_file
  *		Name of the SA database file.
diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
index 6b4cb4f..d17a994 100644
--- a/opensm/opensm/main.c
+++ b/opensm/opensm/main.c
@@ -189,8 +189,14 @@ show_usage(void)
            "          This option specifies the name of the SA DB dump file\n"
            "          from where SA database will be loaded.\n\n");
    printf ("-a\n"
-          "--add_guid_file <path to file>\n"
-          "          Set the root nodes for the Up/Down routing algorithm\n"
+          "--root_guid_file <path to file>\n"
+          "          Set the root nodes for the Up/Down or Fat-Tree routing\n"
+          "          algorithm to the guids provided in the given file (one\n"
+          "          to a line)\n"
+          "\n");
+  printf ("-u\n"
+          "--cn_guid_file <path to file>\n"
+          "          Set the compute nodes for the Fat-Tree routing algorithm\n"
            "          to the guids provided in the given file (one to a line)\n"
            "\n");
    printf( "-o\n"
@@ -585,7 +591,7 @@ main(
    char                 *ignore_guids_file_name = NULL;
    uint32_t              val;
    const char * const    short_option =
-	  "i:f:ed:g:l:L:s:t:a:R:M:U:S:P:NBIQvVhorcyxp:n:q:k:C:";
+	  "i:f:ed:g:l:L:s:t:a:u:R:M:U:S:P:NBIQvVhorcyxp:n:q:k:C:";

    /*
      In the array below, the 2nd parameter specifies the number
@@ -622,7 +628,8 @@ main(
        {  "lid_matrix_file",1, NULL, 'M'},
        {  "ucast_file",    1, NULL, 'U'},
        {  "sadb_file",     1, NULL, 'S'},
-      {  "add_guid_file", 1, NULL, 'a'},
+      {  "root_guid_file",1, NULL, 'a'},
+      {  "cn_guid_file",  1, NULL, 'u'},
        {  "cache-options", 0, NULL, 'c'},
        {  "stay_on_fatal", 0, NULL, 'y'},
        {  "honor_guid2lid",0, NULL, 'x'},
@@ -886,10 +893,18 @@ main(

      case 'a':
        /*
-        Specifies port guids file
+        Specifies root guids file
+      */
+      opt.root_guid_file = optarg;
+      printf (" Root Guid File: %s\n", opt.root_guid_file );
+      break;
+
+    case 'u':
+      /*
+        Specifies compute node guids file
        */
-      opt.updn_guid_file = optarg;
-      printf (" UPDN Guid File: %s\n", opt.updn_guid_file );
+      opt.cn_guid_file = optarg;
+      printf (" Compute Node Guid File: %s\n", opt.cn_guid_file );
        break;

      case 'c':
diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index 5a79149..7a223e3 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -502,7 +502,8 @@ osm_subn_set_default_opt(
    p_opt->routing_engine_name = NULL;
    p_opt->lid_matrix_dump_file = NULL;
    p_opt->ucast_dump_file = NULL;
-  p_opt->updn_guid_file = NULL;
+  p_opt->root_guid_file = NULL;
+  p_opt->cn_guid_file = NULL;
    p_opt->sa_db_file = NULL;
    p_opt->exit_on_fatal = TRUE;
    p_opt->enable_quirks = FALSE;
@@ -1325,8 +1326,12 @@ osm_subn_parse_conf_file(
          p_key, p_val, &p_opts->ucast_dump_file);

        __osm_subn_opts_unpack_charp(
-        "updn_guid_file",
-        p_key, p_val, &p_opts->updn_guid_file);
+        "root_guid_file",
+        p_key, p_val, &p_opts->root_guid_file);
+
+      __osm_subn_opts_unpack_charp(
+        "cn_guid_file",
+        p_key, p_val, &p_opts->cn_guid_file);

        __osm_subn_opts_unpack_charp(
          "sa_db_file",
@@ -1550,12 +1555,18 @@ osm_subn_write_conf_file(
               "# Ucast dump file name\n"
               "ucast_dump_file %s\n\n",
               p_opts->ucast_dump_file);
-  if (p_opts->updn_guid_file)
+  if (p_opts->root_guid_file)
+    fprintf( opts_file,
+             "# The file holding the root node guids (for fat-tree or Up/Down)\n"
+             "# One guid in each line\n"
+             "root_guid_file %s\n\n",
+             p_opts->root_guid_file);
+  if (p_opts->cn_guid_file)
      fprintf( opts_file,
-             "# The file holding the Up/Down root node guids\n"
+             "# The file holding the fat-tree compute node guids\n"
               "# One guid in each line\n"
-             "updn_guid_file %s\n\n",
-             p_opts->updn_guid_file);
+             "cn_guid_file %s\n\n",
+             p_opts->cn_guid_file);
    if (p_opts->sa_db_file)
      fprintf( opts_file,
               "# SA database file name\n"
diff --git a/opensm/opensm/osm_ucast_updn.c b/opensm/opensm/osm_ucast_updn.c
index 2448246..af5ee4e 100644
--- a/opensm/opensm/osm_ucast_updn.c
+++ b/opensm/opensm/osm_ucast_updn.c
@@ -311,10 +311,10 @@ updn_init(
       Check the source for root node list, if file parse it, otherwise
       wait for a callback to activate auto detection
    */
-  if (p_osm->subn.opt.updn_guid_file)
+  if (p_osm->subn.opt.root_guid_file)
    {
      status = osm_ucast_mgr_read_guid_file( &p_osm->sm.ucast_mgr,
-                                           p_osm->subn.opt.updn_guid_file,
+                                           p_osm->subn.opt.root_guid_file,
                                             p_updn->p_root_nodes );
      if (status != IB_SUCCESS)
         goto Exit;
@@ -323,7 +323,7 @@ updn_init(
      osm_log( &p_osm->log, OSM_LOG_DEBUG,
               "updn_init: "
               "UPDN - Fetching root nodes from file %s\n",
-             p_osm->subn.opt.updn_guid_file );
+             p_osm->subn.opt.root_guid_file );
      guid_iterator = cl_list_head(p_updn->p_root_nodes);
      while( guid_iterator != cl_list_end(p_updn->p_root_nodes) )
      {
-- 
1.5.1.4
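
For readers following the option handling in main.c above, here is a
stand-alone sketch of the same getopt_long() pattern reduced to the two
options this patch touches.  The struct and function names are
hypothetical and exist only for illustration; the real code stores the
values in osm_subn_opt_t as shown in the diff.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical container for the two file names set by -a and -u. */
struct guid_file_opts {
	const char *root_guid_file;	/* -a / --root_guid_file */
	const char *cn_guid_file;	/* -u / --cn_guid_file */
};

/* Parse only -a and -u; returns 0 on success, -1 on an unknown option. */
static int parse_guid_file_opts(int argc, char *argv[],
				struct guid_file_opts *out)
{
	static const struct option long_opts[] = {
		{ "root_guid_file", required_argument, NULL, 'a' },
		{ "cn_guid_file",   required_argument, NULL, 'u' },
		{ NULL, 0, NULL, 0 }
	};
	int c;

	assert(out != NULL);
	out->root_guid_file = NULL;
	out->cn_guid_file = NULL;
	optind = 1;	/* restart scanning so repeated calls work */

	while ((c = getopt_long(argc, argv, "a:u:", long_opts, NULL)) != -1) {
		switch (c) {
		case 'a':
			out->root_guid_file = optarg;
			break;
		case 'u':
			out->cn_guid_file = optarg;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

Both the short and the long spelling map to the same case label, which
is why the patch only has to add 'u:' to the short option string and one
row to the long option table.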


From minich at ornl.gov  Thu Jun 21 07:52:57 2007
From: minich at ornl.gov (Makia Minich)
Date: Thu, 21 Jun 2007 10:52:57 -0400
Subject: [ofa-general] Can't open HCA InfiniHost0 problem
In-Reply-To: <9fa3c2e50706210515l5ba18cb1h6eb4718f0749bb21@mail.gmail.com>
References: <9fa3c2e50706210515l5ba18cb1h6eb4718f0749bb21@mail.gmail.com>
Message-ID: <200706211052.57585.minich@ornl.gov>

If you are using the OFED stack (as I'm expecting from the list you used) you 
need to use the o2ib lnd and not the vib lnd.

On Thursday 21 June 2007 8:15:19 am Changer Van wrote:
> Hi all,
> I got some errors when I performed lctl network up command,
> here are some log messages:
>
> … kernel: LustreError: 12355:0:(viblnd.c:1800:kibnal_startup()) Can't open
> HCA InfiniHost0: -256
> but my IB card's hca_id is InfiniHost_III_Ex0.
> How do I configure it to look for an hca_id like InfiniHost_III_Ex0?
>
> Any help would be greatly appreciated.

-- 
Makia Minich <minich at ornl.gov>
National Center for Computation Science
Oak Ridge National Laboratory
--*--
Imagine no possessions
I wonder if you can
- John Lennon


From yann.kalemkarian at bull.net  Thu Jun 21 07:53:14 2007
From: yann.kalemkarian at bull.net (Yann K.)
Date: Thu, 21 Jun 2007 16:53:14 +0200
Subject: [ofa-general] [Fwd: [Error] Asynchronous Thread]
Message-ID: <467A90DA.1000107@bull.net>


-- 
Yann Kalemkarian
HPC Software Engineer
Open Software R&D
Bull, Architect of an Open World TM
Phone: +33 4 7629 7393
www.bull.com

-------------- next part --------------
An embedded message was scrubbed...
From: "Yann K." <yann.kalemkarian at bull.net>
Subject: [Error] Asynchronous Thread
Date: Thu, 21 Jun 2007 16:50:59 +0200
Size: 1042
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070621/01d33fbc/attachment.mht>

From arthur.jones at qlogic.com  Thu Jun 21 08:23:12 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Thu, 21 Jun 2007 08:23:12 -0700
Subject: [ofa-general] Re: [PATCH 24/28] IB/ipath - ipath_poll fixups and
	enhancements
In-Reply-To: <ada1wg68hp0.fsf@cisco.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
	<20070619234252.3794.18229.stgit@bauxite.internal.keyresearch.com>
	<ada1wg68hp0.fsf@cisco.com>
Message-ID: <20070621152312.GA14817@bauxite.pathscale.com>

hi roland, ...

On Wed, Jun 20, 2007 at 02:00:27PM -0700, Roland Dreier wrote:
>  > +	tail = *(volatile u64 *)pd->port_rcvhdrtail_kvaddr;
> 
> Why is there a volatile here?  cf http://lwn.net/Articles/234017/
> ("volatile considered harmful")

from that article:

- Pointers to data structures in coherent memory which might be modified
  by I/O devices can, sometimes, legitimately be volatile.  A ring buffer
  used by a network adapter, where that adapter changes pointers to
  indicate which descriptors have been processed, is an example of this
  type of situation.

the port_rcvhdrtail_kvaddr is the kernel virtual address
allocated in coherent memory where the header queue is updated
by the chip.  we use volatile to make sure the compiler does
not use stale data...
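
A user-space sketch of the access pattern under discussion, assuming
the tail is a 64-bit counter that something outside the compiler's view
may update.  read_rcvhdr_tail() is a hypothetical helper name, not a
function from the ipath driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Read the tail through a volatile-qualified pointer so the compiler
 * must issue a fresh load on every call instead of reusing a value it
 * cached in a register.
 */
static uint64_t read_rcvhdr_tail(const void *tail_kvaddr)
{
	assert(tail_kvaddr != NULL);
	return *(const volatile uint64_t *)tail_kvaddr;
}
```

The qualifier only constrains the compiler.  It does not order the load
against other memory accesses, so any barriers the hardware needs are a
separate question.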

arthur


From pw at osc.edu  Thu Jun 21 08:25:44 2007
From: pw at osc.edu (Pete Wyckoff)
Date: Thu, 21 Jun 2007 11:25:44 -0400
Subject: [ofa-general] hang on close in umem_release
Message-ID: <20070621152544.GA32474@osc.edu>

With 2.6.22-rc5, I get a repeatable D state hang of a user space
process upon termination (ctrl-C).  x86_64 SMP, no preempt.

Here's the sysrq-T trace:

app           D ffff81003ec17220     0  2841   2780 (NOTLB)
 ffff81003cec7d78 0000000000000082 ffffffff80227aa0 ffff81003cec7d78
 ffff81003ec17220 ffffffff804d8380 000000000002a161 ffff81003ec173d0
 0000000000000001 0000000100085088 0000000000000001 ffff81003ff2bb40
Call Trace:
 [<ffffffff80227aa0>] default_wake_function+0x0/0x10
 [<ffffffff8025735d>] unlock_page+0x2d/0x40
 [<ffffffff803f1da5>] __down_write_nested+0x85/0xc0
 [<ffffffff803f1deb>] __down_write+0xb/0x10
 [<ffffffff80245039>] down_write+0x9/0x10
 [<ffffffff880919d5>] :ib_core:ib_umem_release+0x75/0x110
 [<ffffffff880f6f6e>] :ib_mthca:mthca_free_mr+0x6e/0xe0
 [<ffffffff880fdb15>] :ib_mthca:mthca_dereg_mr+0x25/0x40
 [<ffffffff8808defd>] :ib_core:ib_dereg_mr+0x2d/0x40
 [<ffffffff8810e78c>] :ib_uverbs:ib_uverbs_close+0x2ac/0x380
 [<ffffffff80282df3>] __fput+0xb3/0x1a0
 [<ffffffff80282f66>] fput+0x16/0x20
 [<ffffffff8028001b>] filp_close+0x4b/0x80
 [<ffffffff802815ec>] sys_close+0x9c/0x100
 [<ffffffff80209b4e>] system_call+0x7e/0x83

It should have an open fd for the rdmacm event channel, and an fd
for the CQ event channel, but does not have any connected QPs at
this point (although it did in the past) and no registered memory
regions, although maybe the app forgot to free one?

Apparently it is here:

        /*
         * We may be called with the mm's mmap_sem already held.  This
         * can happen when a userspace munmap() is the call that drops
         * the last reference to our file and calls our release
         * method.  If there are memory regions to destroy, we'll end
         * up here and not be able to take the mmap_sem.  In that case
         * we defer the vm_locked accounting to the system workqueue.
         */
        if (context->closing && !down_write_trylock(&mm->mmap_sem)) {
                INIT_WORK(&umem->work, ib_umem_account);
                umem->mm   = mm;
                umem->diff = diff;

                schedule_work(&umem->work);
                return;
        } else 
                down_write(&mm->mmap_sem);

stuck in the down_write on mmap_sem.  Thus context->closing must not
be true.

Is this a known problem?  Is there some more information I can
give you?

		-- Pete


From halr at voltaire.com  Thu Jun 21 08:28:11 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 21 Jun 2007 11:28:11 -0400
Subject: [ofa-general] Re: Re: [PATCH] for-2.6.23 ib/umad: add partition
	support
In-Reply-To: <20070621135120.GH4857@mellanox.co.il>
References: <20070621033854.GF8868@mellanox.co.il>
	<000001c7b3c7$d2fdca20$a3cc180a@amr.corp.intel.com>
	<20070621065731.GJ8868@mellanox.co.il>
	<20070621135120.GH4857@mellanox.co.il>
Message-ID: <1182439686.15653.410799.camel@hal.voltaire.com>

On Thu, 2007-06-21 at 09:51, Michael S. Tsirkin wrote:
> > Quoting Michael S. Tsirkin <mst at dev.mellanox.co.il>:
> > Subject: Re: Re: [PATCH] for-2.6.23 ib/umad: add partition support
> > 
> > > Quoting Sean Hefty <sean.hefty at intel.com>:
> > > Subject: RE: Re: [PATCH] for-2.6.23 ib/umad: add partition support
> > > 
> > > >This assumes an open file descriptor per-pkey, so the proposed API
> > > >extension umad_set_pkey would have to be changed to be per-port rather
> > > >than per-mad. But I think this is a better API, too: most apps
> > > >likely work within a single partition.
> > > 
> > > I don't think this is true for apps that use the userspace MAD interface (e.g.
> > > opensm).
> > 
> > SM (rather, SA) can just open file descriptor per pkey - it created them itself,
> > and there's a small number of partitions.
> > 
> > > Beyond that, this approach doesn't work for receiving MADs on different PKeys.
> > 
> > Yes, it does: we just filter out the MADs where pkey does not match.
> > 
> > I think that most other apps (besides SA) should really treat
> > each partition as a separate network. So getting MADs for a specific
> > pkey, rather than all pkeys, makes total sense to me.
> 
> Hal, could you pls comment on whether this approach will work for opensm?

I will answer at the "high" level rather than some of the details
discussed in previous postings which we may get back to later.

As far as SA is concerned, as all nodes are required to at least support
the limited default partition, the SA uses the full default partition
for communication.
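
As background for the full versus limited distinction, here is a small
sketch of the P_Key matching rule as I understand it from the IB spec:
the low 15 bits are the partition base, the top bit marks full
membership, and two keys match when the bases are equal and at least
one side is a full member.  The macro and function names are made up
for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u	/* membership bit */
#define PKEY_BASE_MASK   0x7fffu	/* partition base */

static_assert((PKEY_FULL_MEMBER & PKEY_BASE_MASK) == 0,
	      "membership bit must not overlap the base");

/* True if two P_Keys are allowed to communicate under the rule above. */
static bool pkey_match(uint16_t a, uint16_t b)
{
	if ((a & PKEY_BASE_MASK) != (b & PKEY_BASE_MASK))
		return false;	/* different partitions */
	return ((a | b) & PKEY_FULL_MEMBER) != 0;
}
```

With 0xffff as the full default partition and 0x7fff as the limited
one, pkey_match(0xffff, 0x7fff) holds while pkey_match(0x7fff, 0x7fff)
does not, so an SA on the full default partition can reach every node.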

As to other current (and potential) management applications:

PerfMgr will want PMA access on all ports on all nodes. It may also be
constrained to a similar environment as SA (running on a node which
supports the full default partition). If it is not constrained in such a
manner, it needs to be on all partitions in the subnet or it will only
be able to access a portion of the ports in the subnet. That actually
might be a model some might ultimately want.

Diagnostics may be happy with a single partition (or likely the set of
partitions the end node they are running from reside on).

Bottom line is that it can likely work with either model but there are
tradeoffs underneath this "high" level which may not have been
sufficiently explored/discussed as yet.

I'm not sure I like having a different fd per pkey: It's a different
model than currently being used and that would cause more changes to
consumers (as opposed to the other approach) which aren't a clear win to
me (and uses more fds).

-- Hal


From rdreier at cisco.com  Thu Jun 21 08:40:54 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 21 Jun 2007 08:40:54 -0700
Subject: [ofa-general] hang on close in umem_release
In-Reply-To: <20070621152544.GA32474@osc.edu> (Pete Wyckoff's message of "Thu,
	21 Jun 2007 11:25:44 -0400")
References: <20070621152544.GA32474@osc.edu>
Message-ID: <adak5tx71tl.fsf@cisco.com>

hmm, I see what seems to be an "i can't believe it ever worked" type
bug -- if the context is closing but then we do manage to get the mm's
rwsem, it seems like we immediately try to lock it again, which
obviously deadlocks.

Does this patch fix your problem and look correct?

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index b4aec51..d40652a 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -225,13 +225,15 @@ void ib_umem_release(struct ib_umem *umem)
 	 * up here and not be able to take the mmap_sem.  In that case
 	 * we defer the vm_locked accounting to the system workqueue.
 	 */
-	if (context->closing && !down_write_trylock(&mm->mmap_sem)) {
-		INIT_WORK(&umem->work, ib_umem_account);
-		umem->mm   = mm;
-		umem->diff = diff;
-
-		schedule_work(&umem->work);
-		return;
+	if (context->closing) {
+		if (!down_write_trylock(&mm->mmap_sem)) {
+			INIT_WORK(&umem->work, ib_umem_account);
+			umem->mm   = mm;
+			umem->diff = diff;
+
+			schedule_work(&umem->work);
+			return;
+		}
 	} else
 		down_write(&mm->mmap_sem);
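
The difference between the two branch structures can be reproduced in
user space with a toy lock that records a second acquisition instead of
blocking.  Everything below (toy_lock, release_buggy, release_fixed) is
a hypothetical model for illustration, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock: records a double acquisition instead of hanging. */
struct toy_lock {
	bool held;
	bool deadlocked;
};

enum outcome { DEFERRED, LOCKED, DEADLOCK };

static bool toy_trylock(struct toy_lock *l)
{
	assert(l != NULL);
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(l != NULL);
	if (l->held)
		l->deadlocked = true;	/* the real down_write() would block */
	else
		l->held = true;
}

/* Old structure: the else branch also runs after a successful trylock. */
static enum outcome release_buggy(struct toy_lock *l, bool closing)
{
	if (closing && !toy_trylock(l))
		return DEFERRED;
	else
		toy_lock_acquire(l);
	return l->deadlocked ? DEADLOCK : LOCKED;
}

/* Patched structure: the blocking lock is only taken when not closing. */
static enum outcome release_fixed(struct toy_lock *l, bool closing)
{
	if (closing) {
		if (!toy_trylock(l))
			return DEFERRED;
	} else
		toy_lock_acquire(l);
	return l->deadlocked ? DEADLOCK : LOCKED;
}
```

In this model a closing context with a free lock ends in DEADLOCK under
the old structure and in LOCKED under the patched one, while the
contended case is deferred either way.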
 

From arthur.jones at qlogic.com  Thu Jun 21 08:50:05 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Thu, 21 Jun 2007 08:50:05 -0700
Subject: [ofa-general] Re: [PATCH] IB/ipath -- changes in for-roland for
	2.6.23
In-Reply-To: <adaejk68i9w.fsf@cisco.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
	<adaejk68i9w.fsf@cisco.com>
Message-ID: <20070621155005.GB14817@bauxite.pathscale.com>

hi roland, ...

On Wed, Jun 20, 2007 at 01:47:55PM -0700, Roland Dreier wrote:
> [...]
> But I don't see a MAINTAINERS update (it still lists Bryan O'Sullivan,
> support at pathscale.com and openib.org for the ipath driver).  Also I
> don't see fixes for the smp_mb__after_clear_bit bug pointed out by
> BenH or the bug of setting both _PAGE_NO_CACHE and _PAGE_WRITETHRU on
> powerpc pointed out by paulus.

ok, thanks for the reminder, i've opened an
internal bug for the first issue (MAINTAINERS),
we should have a fix for that soon.  the second
issue (smp_mb__after_clear_bit) has an internal bug
open.  we don't have a fix yet, but we're working
on it (we may be able to remove all that code).
the final issue (powerpc) has an internal bug open,
but hasn't seen any attention for a while.  i'll see
if i can prod the right people into looking at it...

arthur


From rdreier at cisco.com  Thu Jun 21 09:50:35 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 21 Jun 2007 09:50:35 -0700
Subject: [ofa-general] backups
In-Reply-To: <795c49870706201132r1f7633f8r2cf3cb2a71edc6e0@mail.gmail.com>
	(Jeff Becker's message of "Wed, 20 Jun 2007 11:32:03 -0700")
References: <795c49870706201044ha36255amebd94c1b673f58f6@mail.gmail.com>
	<adahcp2a4ol.fsf@cisco.com>
	<795c49870706201132r1f7633f8r2cf3cb2a71edc6e0@mail.gmail.com>
Message-ID: <adafy4l6ylg.fsf@cisco.com>

 > I'm backing up /data/pub/scm. A quick "du -chL" shows it to be 4.2G.
 > Perhaps I only need to backup a subset of /data/pub/scm? Thanks.

Looks like there is plenty of excess stuff there... eg
/data/pub/scm/~mst/linux-2.6 seems to be a partially unpacked
non-bare linux kernel repository (just picking on mst because
/data/pub/scm/~mst is 880M).  We could probably save a lot of space
just keeping one packed copy of Linus's repository and having all other
kernel trees use alternates to point to the objects there.

OTOH it's not worth making people spend a lot of effort to clean up too
much, given how cheap disk space is.

 - R.


From sean.hefty at intel.com  Thu Jun 21 10:01:33 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 21 Jun 2007 10:01:33 -0700
Subject: [ofa-general] RE: [PATCH] remove confusing code from udaddy
In-Reply-To: <Pine.LNX.4.64.0706211333310.26603@zuben>
Message-ID: <000001c7b425$d2796830$ff0da8c0@amr.corp.intel.com>

thanks - applied


From pw at osc.edu  Thu Jun 21 10:34:17 2007
From: pw at osc.edu (Pete Wyckoff)
Date: Thu, 21 Jun 2007 13:34:17 -0400
Subject: [ofa-general] hang on close in umem_release
In-Reply-To: <adak5tx71tl.fsf@cisco.com>
References: <20070621152544.GA32474@osc.edu> <adak5tx71tl.fsf@cisco.com>
Message-ID: <20070621173417.GA32573@osc.edu>

rdreier at cisco.com wrote on Thu, 21 Jun 2007 08:40 -0700:
> hmm, I see what seems to be an "i can't believe it ever worked" type
> bug -- if the context is closing but then we do manage to get the mm's
> rwsem, it seems like we immediately try to lock it again, which
> obviously deadlocks.
> 
> Does this patch fix your problem and look correct?

Looks obviously correct and tests okay.  Ctrl-c in any situation
does the right thing now.  Before your refactoring of ib_umem, the
older version of ib_umem_release_on_close() did not have this
trylock optimization.  This new buggy code appears not to have shown
up in any releases yet, fortunately.

Thanks for the quick fix.

		-- Pete


From halr at voltaire.com  Thu Jun 21 10:40:02 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 21 Jun 2007 13:40:02 -0400
Subject: [ofa-general] Re: [PATCHv2] osm: adding root_guid_file and
	cn_guid_file OpenSM options
In-Reply-To: <467A8FFF.2040207@dev.mellanox.co.il>
References: <4675285A.6060309@dev.mellanox.co.il>
	<1182433144.15653.403468.camel@hal.voltaire.com>
	<467A8FFF.2040207@dev.mellanox.co.il>
Message-ID: <1182447578.15653.419478.camel@hal.voltaire.com>

Hi Yevgeny,

On Thu, 2007-06-21 at 10:49, Yevgeny Kliteynik wrote:
> Hi Hal,
> 
> Hal Rosenstock wrote:
> > Hi Yevgeny,
> > 
> > On Sun, 2007-06-17 at 08:26, Yevgeny Kliteynik wrote:
> >> Hi Hal,
> >>
> >> This patch replaces updn_guid_file in the Up/Down routing with
> >> root_guid_file for Up/Down and Fat-Tree routing, and adds a new
> >> option - cn_guid_file for Fat-Tree routing.
> >> OpenSM command line options for these two files are:
> >>
> >>   '-a' or '--root_guid_file' for roots
> >>   '-u' or '--cn_guid_file' for compute nodes
> >>
> >> Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
> > 
> > This entire patch was rejected when I attempted to apply it. Can you
> > regenerate it ? Thanks.
> 
> Indeed, there were changes in osm_subnet.{c,h} since I've issued this patch.

That wasn't the problem.

> Here's the new one:

This one was rejected too. I hand applied it so please double check it.

Also, I updated the opensm man page for these options.

Thanks.

-- Hal

> Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
> ---
>   opensm/include/opensm/osm_subnet.h |   12 +++++++++---
>   opensm/opensm/main.c               |   29 ++++++++++++++++++++++-------
>   opensm/opensm/osm_subnet.c         |   25 ++++++++++++++++++-------
>   opensm/opensm/osm_ucast_updn.c     |    6 +++---
>   4 files changed, 52 insertions(+), 20 deletions(-)
> 
> diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
> index b296caf..2ee5689 100644
> --- a/opensm/include/opensm/osm_subnet.h
> +++ b/opensm/include/opensm/osm_subnet.h
> @@ -278,7 +278,8 @@ typedef struct _osm_subn_opt
>     char *                   routing_engine_name;
>     char *                   lid_matrix_dump_file;
>     char *                   ucast_dump_file;
> -  char *                   updn_guid_file;
> +  char *                   root_guid_file;
> +  char *                   cn_guid_file;
>     char *                   sa_db_file;
>     boolean_t                exit_on_fatal;
>     boolean_t                honor_guid2lid_file;
> @@ -452,8 +453,13 @@ typedef struct _osm_subn_opt
>   *		Name of the unicast routing dump file from where switch
>   *		forwarding tables will be loaded
>   *
> -*	updn_guid_file
> -*		Pointer to name of the UPDN guid file given by User
> +*	root_guid_file
> +*		Name of the file that contains list of root guids that
> +*		will be used by fat-tree or up/dn routing (provided by User)
> +*
> +*	cn_guid_file
> +*		Name of the file that contains list of compute node guids that
> +*		will be used by fat-tree routing (provided by User)
>   *
>   *	sa_db_file
>   *		Name of the SA database file.
> diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
> index 6b4cb4f..d17a994 100644
> --- a/opensm/opensm/main.c
> +++ b/opensm/opensm/main.c
> @@ -189,8 +189,14 @@ show_usage(void)
>             "          This option specifies the name of the SA DB dump file\n"
>             "          from where SA database will be loaded.\n\n");
>     printf ("-a\n"
> -          "--add_guid_file <path to file>\n"
> -          "          Set the root nodes for the Up/Down routing algorithm\n"
> +          "--root_guid_file <path to file>\n"
> +          "          Set the root nodes for the Up/Down or Fat-Tree routing\n"
> +          "          algorithm to the guids provided in the given file (one\n"
> +          "          to a line)\n"
> +          "\n");
> +  printf ("-u\n"
> +          "--cn_guid_file <path to file>\n"
> +          "          Set the compute nodes for the Fat-Tree routing algorithm\n"
>             "          to the guids provided in the given file (one to a line)\n"
>             "\n");
>     printf( "-o\n"
> @@ -585,7 +591,7 @@ main(
>     char                 *ignore_guids_file_name = NULL;
>     uint32_t              val;
>     const char * const    short_option =
> -	  "i:f:ed:g:l:L:s:t:a:R:M:U:S:P:NBIQvVhorcyxp:n:q:k:C:";
> +	  "i:f:ed:g:l:L:s:t:a:u:R:M:U:S:P:NBIQvVhorcyxp:n:q:k:C:";
> 
>     /*
>       In the array below, the 2nd parameter specifies the number
> @@ -622,7 +628,8 @@ main(
>         {  "lid_matrix_file",1, NULL, 'M'},
>         {  "ucast_file",    1, NULL, 'U'},
>         {  "sadb_file",     1, NULL, 'S'},
> -      {  "add_guid_file", 1, NULL, 'a'},
> +      {  "root_guid_file",1, NULL, 'a'},
> +      {  "cn_guid_file",  1, NULL, 'u'},
>         {  "cache-options", 0, NULL, 'c'},
>         {  "stay_on_fatal", 0, NULL, 'y'},
>         {  "honor_guid2lid",0, NULL, 'x'},
> @@ -886,10 +893,18 @@ main(
> 
>       case 'a':
>         /*
> -        Specifies port guids file
> +        Specifies root guids file
> +      */
> +      opt.root_guid_file = optarg;
> +      printf (" Root Guid File: %s\n", opt.root_guid_file );
> +      break;
> +
> +    case 'u':
> +      /*
> +        Specifies compute node guids file
>         */
> -      opt.updn_guid_file = optarg;
> -      printf (" UPDN Guid File: %s\n", opt.updn_guid_file );
> +      opt.cn_guid_file = optarg;
> +      printf (" Compute Node Guid File: %s\n", opt.cn_guid_file );
>         break;
> 
>       case 'c':
> diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
> index 5a79149..7a223e3 100644
> --- a/opensm/opensm/osm_subnet.c
> +++ b/opensm/opensm/osm_subnet.c
> @@ -502,7 +502,8 @@ osm_subn_set_default_opt(
>     p_opt->routing_engine_name = NULL;
>     p_opt->lid_matrix_dump_file = NULL;
>     p_opt->ucast_dump_file = NULL;
> -  p_opt->updn_guid_file = NULL;
> +  p_opt->root_guid_file = NULL;
> +  p_opt->cn_guid_file = NULL;
>     p_opt->sa_db_file = NULL;
>     p_opt->exit_on_fatal = TRUE;
>     p_opt->enable_quirks = FALSE;
> @@ -1325,8 +1326,12 @@ osm_subn_parse_conf_file(
>           p_key, p_val, &p_opts->ucast_dump_file);
> 
>         __osm_subn_opts_unpack_charp(
> -        "updn_guid_file",
> -        p_key, p_val, &p_opts->updn_guid_file);
> +        "root_guid_file",
> +        p_key, p_val, &p_opts->root_guid_file);
> +
> +      __osm_subn_opts_unpack_charp(
> +        "cn_guid_file",
> +        p_key, p_val, &p_opts->cn_guid_file);
> 
>         __osm_subn_opts_unpack_charp(
>           "sa_db_file",
> @@ -1550,12 +1555,18 @@ osm_subn_write_conf_file(
>                "# Ucast dump file name\n"
>                "ucast_dump_file %s\n\n",
>                p_opts->ucast_dump_file);
> -  if (p_opts->updn_guid_file)
> +  if (p_opts->root_guid_file)
> +    fprintf( opts_file,
> +             "# The file holding the root node guids (for fat-tree or Up/Down)\n"
> +             "# One guid in each line\n"
> +             "root_guid_file %s\n\n",
> +             p_opts->root_guid_file);
> +  if (p_opts->cn_guid_file)
>       fprintf( opts_file,
> -             "# The file holding the Up/Down root node guids\n"
> +             "# The file holding the fat-tree compute node guids\n"
>                "# One guid in each line\n"
> -             "updn_guid_file %s\n\n",
> -             p_opts->updn_guid_file);
> +             "cn_guid_file %s\n\n",
> +             p_opts->cn_guid_file);
>     if (p_opts->sa_db_file)
>       fprintf( opts_file,
>                "# SA database file name\n"
> diff --git a/opensm/opensm/osm_ucast_updn.c b/opensm/opensm/osm_ucast_updn.c
> index 2448246..af5ee4e 100644
> --- a/opensm/opensm/osm_ucast_updn.c
> +++ b/opensm/opensm/osm_ucast_updn.c
> @@ -311,10 +311,10 @@ updn_init(
>        Check the source for root node list, if file parse it, otherwise
>        wait for a callback to activate auto detection
>     */
> -  if (p_osm->subn.opt.updn_guid_file)
> +  if (p_osm->subn.opt.root_guid_file)
>     {
>       status = osm_ucast_mgr_read_guid_file( &p_osm->sm.ucast_mgr,
> -                                           p_osm->subn.opt.updn_guid_file,
> +                                           p_osm->subn.opt.root_guid_file,
>                                              p_updn->p_root_nodes );
>       if (status != IB_SUCCESS)
>          goto Exit;
> @@ -323,7 +323,7 @@ updn_init(
>       osm_log( &p_osm->log, OSM_LOG_DEBUG,
>                "updn_init: "
>                "UPDN - Fetching root nodes from file %s\n",
> -             p_osm->subn.opt.updn_guid_file );
> +             p_osm->subn.opt.root_guid_file );
>       guid_iterator = cl_list_head(p_updn->p_root_nodes);
>       while( guid_iterator != cl_list_end(p_updn->p_root_nodes) )
>       {
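The option text in the patch above describes the guid files as holding
one guid to a line.  A minimal sketch of parsing one such line follows.
It is not the OpenSM reader (osm_ucast_mgr_read_guid_file is not shown
in this thread); demo_parse_guid_line is a hypothetical helper, and
treating blank lines and lines starting with '#' as skippable is an
assumption made for the example.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse one line of a guid file (hypothetical format: one hex guid per
 * line, optional 0x prefix).
 * Returns 1 and stores the guid on success, 0 for a blank or '#' line
 * (an assumption of this sketch), -1 for a malformed line.
 */
static int demo_parse_guid_line(const char *line, uint64_t *guid)
{
    char *end;
    unsigned long long v;

    while (isspace((unsigned char)*line))
        line++;
    if (*line == '\0' || *line == '#')
        return 0;
    if (*line == '-' || *line == '+')
        return -1;                      /* signs make no sense in a guid */

    errno = 0;
    v = strtoull(line, &end, 16);       /* base 16 also accepts "0x" */
    if (end == line || errno == ERANGE)
        return -1;
    while (isspace((unsigned char)*end))
        end++;
    if (*end != '\0')
        return -1;                      /* trailing garbage on the line */

    *guid = (uint64_t)v;
    return 1;
}
```

A caller would read the file with fgets() and pass each line to this
helper, collecting the guids for which it returns 1.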


From halr at voltaire.com  Thu Jun 21 10:43:32 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 21 Jun 2007 13:43:32 -0400
Subject: [ofa-general] Re: [PATCH] osm: cosmetics in ftree - added get_guid
	functions for switch and hca
In-Reply-To: <4678DA83.2050700@dev.mellanox.co.il>
References: <4678DA83.2050700@dev.mellanox.co.il>
Message-ID: <1182447627.15653.419564.camel@hal.voltaire.com>

Hi again Yevgeny,

On Wed, 2007-06-20 at 03:42, Yevgeny Kliteynik wrote:
> Hi Hal,
> 
> Cosmetic code changes in fat-tree:
> added get_guid_ho and get_guid_no functions for switches and hca's
> 
> -- Yevgeny
> 
> Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

This patch won't apply either. I'm not sure I want to hand edit these
changes in. Can you try it and see if it works for you ?

Thanks.

-- Hal

> ---
>   opensm/opensm/osm_ucast_ftree.c |   77 +++++++++++++++++++++++++++++----------
>   1 files changed, 58 insertions(+), 19 deletions(-)
> 
> diff --git a/opensm/opensm/osm_ucast_ftree.c b/opensm/opensm/osm_ucast_ftree.c
> index 1ead199..1ae8b29 100644
> --- a/opensm/opensm/osm_ucast_ftree.c
> +++ b/opensm/opensm/osm_ucast_ftree.c
> @@ -640,6 +640,26 @@ __osm_ftree_sw_destroy(
> 
>   /***************************************************/
> 
> +static uint64_t
> +__osm_ftree_sw_get_guid_no(
> +   IN  ftree_sw_t * p_sw)
> +{
> +   if (!p_sw)
> +      return 0;
> +   return osm_node_get_node_guid(p_sw->p_osm_sw->p_node);
> +}
> +
> +/***************************************************/
> +
> +static uint64_t
> +__osm_ftree_sw_get_guid_ho(
> +   IN  ftree_sw_t * p_sw)
> +{
> +   return cl_ntoh64(__osm_ftree_sw_get_guid_no(p_sw));
> +}
> +
> +/***************************************************/
> +
>   static void
>   __osm_ftree_sw_dump(
>      IN  ftree_fabric_t * p_ftree,
> @@ -657,7 +677,7 @@ __osm_ftree_sw_dump(
>              "__osm_ftree_sw_dump: "
>              "Switch index: %s, GUID: 0x%016" PRIx64 ", Ports: %u DOWN, %u UP\n",
>             __osm_ftree_tuple_to_str(p_sw->tuple),
> -          cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
> +          __osm_ftree_sw_get_guid_ho(p_sw),
>             p_sw->down_port_groups_num,
>             p_sw->up_port_groups_num);
> 
> @@ -835,6 +855,26 @@ __osm_ftree_hca_destroy(
> 
>   /***************************************************/
> 
> +static uint64_t
> +__osm_ftree_hca_get_guid_no(
> +   IN  ftree_hca_t * p_hca)
> +{
> +   if (!p_hca)
> +      return 0;
> +   return osm_node_get_node_guid(p_hca->p_osm_node);
> +}
> +
> +/***************************************************/
> +
> +static uint64_t
> +__osm_ftree_hca_get_guid_ho(
> +   IN  ftree_hca_t * p_hca)
> +{
> +   return cl_ntoh64(__osm_ftree_hca_get_guid_no(p_hca));
> +}
> +
> +/***************************************************/
> +
>   static void
>   __osm_ftree_hca_dump(
>      IN  ftree_fabric_t * p_ftree,
> @@ -851,7 +891,7 @@ __osm_ftree_hca_dump(
>      osm_log(&p_ftree->p_osm->log, OSM_LOG_DEBUG,
>              "__osm_ftree_hca_dump: "
>              "CA GUID: 0x%016" PRIx64 ", Ports: %u UP\n",
> -          cl_ntoh64(osm_node_get_node_guid(p_hca->p_osm_node)),
> +          __osm_ftree_hca_get_guid_ho(p_hca),
>             p_hca->up_port_groups_num);
> 
>      for( i = 0; i < p_hca->up_port_groups_num; i++ )
> @@ -1214,7 +1254,7 @@ __osm_ftree_fabric_dump_general_info(
>                  osm_log(&p_ftree->p_osm->log, OSM_LOG_VERBOSE,
>                          "__osm_ftree_fabric_dump_general_info: "
>                          "      GUID: 0x%016" PRIx64 ", LID: 0x%x, Index %s\n",
> -                       cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
> +                       __osm_ftree_sw_get_guid_ho(p_sw),
>                          cl_ntoh16(p_sw->base_lid),
>                          __osm_ftree_tuple_to_str(p_sw->tuple));
>         }
> @@ -1227,8 +1267,7 @@ __osm_ftree_fabric_dump_general_info(
>               osm_log(&p_ftree->p_osm->log, OSM_LOG_VERBOSE,
>                       "__osm_ftree_fabric_dump_general_info: "
>                       "      GUID: 0x%016" PRIx64 ", LID: 0x%x, Index %s\n",
> -                    cl_ntoh64(osm_node_get_node_guid(
> -                              p_ftree->leaf_switches[i]->p_osm_sw->p_node)),
> +                    __osm_ftree_sw_get_guid_ho(p_ftree->leaf_switches[i]),
>                       cl_ntoh16(p_ftree->leaf_switches[i]->base_lid),
>                       __osm_ftree_tuple_to_str(p_ftree->leaf_switches[i]->tuple));
>         }
> @@ -1442,7 +1481,7 @@ __osm_ftree_fabric_make_indexing(
>              p_sw->rank,
>              __osm_ftree_tuple_to_str(p_sw->tuple),
>              cl_ntoh16(p_sw->base_lid),
> -           cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)));
> +           __osm_ftree_sw_get_guid_ho(p_sw));
> 
>      /*
>       * Now run BFS and assign indexes to all switches
> @@ -1617,11 +1656,11 @@ __osm_ftree_fabric_validate_topology(
>                       "ERR AB09: Different number of upward port groups on switches:\n"
>                       "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u groups\n"
>                       "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u groups\n",
> -                    cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)),
> +                    __osm_ftree_sw_get_guid_ho(reference_sw_arr[p_sw->rank]),
>                       cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid),
>                       __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple),
>                       reference_sw_arr[p_sw->rank]->up_port_groups_num,
> -                    cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
> +                    __osm_ftree_sw_get_guid_ho(p_sw),
>                       cl_ntoh16(p_sw->base_lid),
>                       __osm_ftree_tuple_to_str(p_sw->tuple),
>                       p_sw->up_port_groups_num);
> @@ -1638,11 +1677,11 @@ __osm_ftree_fabric_validate_topology(
>                       "ERR AB0A: Different number of downward port groups on switches:\n"
>                       "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u port groups\n"
>                       "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u port groups\n",
> -                    cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)),
> +                    __osm_ftree_sw_get_guid_ho(reference_sw_arr[p_sw->rank]),
>                       cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid),
>                       __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple),
>                       reference_sw_arr[p_sw->rank]->down_port_groups_num,
> -                    cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
> +                    __osm_ftree_sw_get_guid_ho(p_sw),
>                       cl_ntoh16(p_sw->base_lid),
>                       __osm_ftree_tuple_to_str(p_sw->tuple),
>                       p_sw->down_port_groups_num);
> @@ -1663,11 +1702,11 @@ __osm_ftree_fabric_validate_topology(
>                              "ERR AB0B: Different number of ports in an upward port group on switches:\n"
>                              "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n"
>                              "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n",
> -                           cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)),
> +                           __osm_ftree_sw_get_guid_ho(reference_sw_arr[p_sw->rank]),
>                              cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid),
>                              __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple),
>                              cl_ptr_vector_get_size(&p_ref_group->ports),
> -                           cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
> +                           __osm_ftree_sw_get_guid_ho(p_sw),
>                              cl_ntoh16(p_sw->base_lid),
>                              __osm_ftree_tuple_to_str(p_sw->tuple),
>                              cl_ptr_vector_get_size(&p_group->ports));
> @@ -1691,11 +1730,11 @@ __osm_ftree_fabric_validate_topology(
>                              "ERR AB0C: Different number of ports in an downward port group on switches:\n"
>                              "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n"
>                              "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n",
> -                           cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)),
> +                           __osm_ftree_sw_get_guid_ho(reference_sw_arr[p_sw->rank]),
>                              cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid),
>                              __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple),
>                              cl_ptr_vector_get_size(&p_ref_group->ports),
> -                           cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
> +                           __osm_ftree_sw_get_guid_ho(p_sw),
>                              cl_ntoh16(p_sw->base_lid),
>                              __osm_ftree_tuple_to_str(p_sw->tuple),
>                              cl_ptr_vector_get_size(&p_group->ports));
> @@ -2508,7 +2547,7 @@ __osm_ftree_rank_leaf_switches(
>                       "__osm_ftree_rank_leaf_switches: ERR AB0F: "
>                       "CA conected directly to another CA: "
>                       "0x%016" PRIx64 " <---> 0x%016" PRIx64 "\n",
> -                    cl_ntoh64(osm_node_get_node_guid(p_hca->p_osm_node)),
> +                    __osm_ftree_hca_get_guid_ho(p_hca),
>                       cl_ntoh64(osm_node_get_node_guid(p_remote_osm_node)));
>               res = -1;
>               goto Exit;
> @@ -2548,8 +2587,8 @@ __osm_ftree_rank_leaf_switches(
>                 "                                            - CA guid    : 0x%016" PRIx64 "\n"
>                 "                                            - Switch guid: 0x%016" PRIx64 "\n"
>                 "                                            - Switch LID : 0x%x\n",
> -              cl_ntoh64(osm_node_get_node_guid(p_hca->p_osm_node)),
> -              cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
> +              __osm_ftree_hca_get_guid_ho(p_hca),
> +              __osm_ftree_sw_get_guid_ho(p_sw),
>                 cl_ntoh16(p_sw->base_lid));
>         cl_list_insert_tail(p_ranking_bfs_list,
>                             &__osm_ftree_sw_tbl_element_create(p_sw)->map_item);
> @@ -2740,10 +2779,10 @@ __osm_ftree_fabric_construct_sw_ports(
>                          "       GUID 0x%016" PRIx64 ", LID 0x%x, rank %u\n",
>                          p_sw->rank,
>                          p_remote_sw->rank,
> -                       cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
> +                       __osm_ftree_sw_get_guid_ho(p_sw),
>                          cl_ntoh16(p_sw->base_lid),
>                          p_sw->rank,
> -                       cl_ntoh64(osm_node_get_node_guid(p_remote_sw->p_osm_sw->p_node)),
> +                       __osm_ftree_sw_get_guid_ho(p_remote_sw),
>                          cl_ntoh16(p_remote_sw->base_lid),
>                          p_remote_sw->rank);
>                  res = -1;
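The _no/_ho pairs in the patch above return the guid in network order
and host order respectively, with cl_ntoh64 doing the conversion.  As a
rough illustration of what such a conversion does, here is a portable
sketch; demo_ntoh64 is a hypothetical name and is not the complib
implementation.

```c
#include <stdint.h>
#include <string.h>

/*
 * Convert a 64-bit value stored in network (big-endian) byte order to
 * host order.  Reading the bytes one at a time makes this independent
 * of the host's endianness.
 */
static uint64_t demo_ntoh64(uint64_t net)
{
    unsigned char b[8];
    uint64_t host = 0;
    int i;

    memcpy(b, &net, sizeof(b));
    for (i = 0; i < 8; i++)
        host = (host << 8) | b[i];      /* first byte is most significant */
    return host;
}
```

With the eight wire bytes 00 02 c9 02 00 23 a1 d8 copied into a
uint64_t, demo_ntoh64 yields 0x0002c9020023a1d8 on any host.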


From rdreier at cisco.com  Thu Jun 21 11:01:46 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 21 Jun 2007 11:01:46 -0700
Subject: [ofa-general] hang on close in umem_release
In-Reply-To: <20070621173417.GA32573@osc.edu> (Pete Wyckoff's message of "Thu,
	21 Jun 2007 13:34:17 -0400")
References: <20070621152544.GA32474@osc.edu> <adak5tx71tl.fsf@cisco.com>
	<20070621173417.GA32573@osc.edu>
Message-ID: <ada7ipx6vat.fsf@cisco.com>

 > Looks obviously correct and tests okay.  Ctrl-c in any situation
 > does the right thing now.  Before your refactoring of ib_umem, the
 > older version of ib_umem_release_on_close() did not have this
 > trylock optimization.  This new buggy code appears not to have shown
 > up in any releases yet, fortunately.

Thanks, I will add it to my queue of things to get Linus to pull soon.

It is true that this was introduced by my recent refactoring -- a
silly careless mistake.

 - R.


From rdreier at cisco.com  Thu Jun 21 11:14:23 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 21 Jun 2007 11:14:23 -0700
Subject: [ofa-general] Re: [PATCH 24/28] IB/ipath - ipath_poll fixups and
	enhancements
In-Reply-To: <20070621152312.GA14817@bauxite.pathscale.com> (Arthur Jones's
	message of "Thu, 21 Jun 2007 08:23:12 -0700")
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
	<20070619234252.3794.18229.stgit@bauxite.internal.keyresearch.com>
	<ada1wg68hp0.fsf@cisco.com>
	<20070621152312.GA14817@bauxite.pathscale.com>
Message-ID: <aday7id5g5c.fsf@cisco.com>

 > the port_rcvhdrtail_kvaddr is the kernel virtual address
 > allocated in coherent memory where the header queue is updated
 > by the chip.  we use volatile to make sure the compiler does
 > not use stale data...

OK, fair enough, although it seems you may be missing some memory
barriers to make sure you don't run into the CPU reordering accesses
to the head/tail pointers.

 - R.
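The ordering concern raised above can be sketched in user space with
C11 atomics.  This is an analogy only, not the ipath code: struct ring
and both functions are hypothetical, the producer plays the role of the
chip, and a kernel driver would use its own barrier primitives.  The
point is that volatile alone only stops the compiler from caching a
value; the release/acquire pair is what orders the slot contents
against the tail update.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SLOTS 8u

struct ring {
    uint32_t slots[RING_SLOTS];
    _Atomic uint32_t tail;      /* next slot the producer will fill */
    uint32_t head;              /* next slot the consumer will read */
};

/* Producer: write the slot first, then publish the new tail (release).
 * No full-ring check is done; the sketch assumes the consumer keeps up. */
static void ring_publish(struct ring *r, uint32_t value)
{
    uint32_t t = atomic_load_explicit(&r->tail, memory_order_relaxed);

    r->slots[t % RING_SLOTS] = value;
    atomic_store_explicit(&r->tail, t + 1, memory_order_release);
}

/* Consumer: read the tail (acquire) before touching the slot contents. */
static int ring_consume(struct ring *r, uint32_t *out)
{
    uint32_t t = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (r->head == t)
        return 0;               /* nothing new */
    *out = r->slots[r->head % RING_SLOTS];
    r->head++;
    return 1;
}
```

Without the acquire on the consumer side, the read of the slot could be
reordered ahead of the read of the tail and return stale data.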


From jeff at splitrockpr.com  Thu Jun 21 11:15:32 2007
From: jeff at splitrockpr.com (Jeffrey Scott)
Date: Thu, 21 Jun 2007 11:15:32 -0700
Subject: [ofa-general] request for OFA newsletter content
Message-ID: <97FBC79001FB45E1A85AA08BFAA282E1@Gaucho>

All-

The first installment of OFA's quarterly newsletter will be distributed in
the next 2-3 weeks.  Content is due to me by June 28.  Although we are
starting out quarterly, we may eventually distribute the newsletter more
frequently, depending on feedback.  The newsletter is designed to keep the
OFA community updated on the latest OFA news, information, events and
development progress.  Of course, you should feel free to forward the
newsletter to anyone outside the OFA community.

 
We have already approached the Working Group chairs about providing content
for the first issue.  However, the newsletter is open to everyone in the OFA
community.  If any community member would like to submit content, we
strongly encourage you to do so.  Broad involvement will help make the
newsletter more valuable.  Just send me your name, contact information,
topic and a brief 1-2 paragraph "article" about any project you're working
on, issues that you're concerned about, events that you're participating in,
or anything else on your mind.  Please do NOT give us content that promotes
companies or products.  The newsletter is all about the OFA.

 
Thanks!

Jeff

 
-----------------------------------

Jeffrey Scott

Split Rock Communications

 
408-884-4017

202-903-6057 Mobile

408-884-3900 Fax

www.SplitRockPR.com

 

From mst at dev.mellanox.co.il  Thu Jun 21 11:47:38 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 21 Jun 2007 21:47:38 +0300
Subject: [ofa-general] Re: backups
In-Reply-To: <adafy4l6ylg.fsf@cisco.com>
References: <795c49870706201044ha36255amebd94c1b673f58f6@mail.gmail.com>
	<adahcp2a4ol.fsf@cisco.com>
	<795c49870706201132r1f7633f8r2cf3cb2a71edc6e0@mail.gmail.com>
	<adafy4l6ylg.fsf@cisco.com>
Message-ID: <20070621184738.GI4857@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: backups
> 
>  > I'm backing up /data/pub/scm. A quick "du -chL" shows it to be 4.2G.
>  > Perhaps I only need to backup a subset of /data/pub/scm? Thanks.
> 
> Looks like there is plenty of excess stuff there... eg
> /data/pub/scm/~mst/linux-2.6 seems to be a partially unpacked
> non-bare linux kernel repository (just picking on mst because
> /data/pub/scm/~mst is 880M).

OK, I killed the files themselves and I've run git repack there,
this seems to have freed up some 200M.
The repo itself has some of my development bits though.

> We could probably save a lot of space
> just keeping one packed copy of Linus's repository and having all other
> kernel trees use alternates to point to the objects there.

Since we really want to save *backup* space, a better strategy would be to use
git clone instead of plain cp, and use alternates and aggressive packing there.

> OTOH it's not worth making people spend a lot of effort to clean up too
> much, given how cheap disk space is.

Right. My cell phone has 1G flash storage.

-- 
MST


From rdreier at cisco.com  Thu Jun 21 11:51:55 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 21 Jun 2007 11:51:55 -0700
Subject: [ofa-general] Stringify ibv_event_type
In-Reply-To: <2C245DF3-77A7-4A3C-BF3A-13FEC2F7E0DA@cisco.com> (Jeff Squyres's
	message of "Thu, 21 Jun 2007 10:09:23 -0400")
References: <2C245DF3-77A7-4A3C-BF3A-13FEC2F7E0DA@cisco.com>
Message-ID: <adalked5ees.fsf@cisco.com>

 > Could a function to stringify the ibv_event_type enum be added to
 > libibverbs?  It could be similar to the event_name_str() function in
 > libibverbs/examples/asyncwatch.c:

Seems reasonable.  I guess if you have that, then you probably want
strings for enum ibv_wc_status too.  Any other enums you would want to
stringify?

Also, I think this could be added to the libibverbs 1.1 stable line,
since it's a completely new API, and easy to test for with autoconf, right?

 - R.
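A table-driven version of the proposed helper might look like the
sketch below.  To keep the example self-contained it uses a
hypothetical local enum, demo_event_type, with three values named after
verbs events; real code would take enum ibv_event_type from
<infiniband/verbs.h>, and the function name here is not a libibverbs
API.

```c
#include <stddef.h>

/* Hypothetical local mirror of a few event values, for illustration. */
enum demo_event_type {
    DEMO_EVENT_DEVICE_FATAL,
    DEMO_EVENT_PORT_ACTIVE,
    DEMO_EVENT_CLIENT_REREGISTER
};

/* Map an event value to its name; unknown values get a fixed string. */
static const char *demo_event_type_str(enum demo_event_type event)
{
    static const char *const names[] = {
        [DEMO_EVENT_DEVICE_FATAL]      = "DEMO_EVENT_DEVICE_FATAL",
        [DEMO_EVENT_PORT_ACTIVE]       = "DEMO_EVENT_PORT_ACTIVE",
        [DEMO_EVENT_CLIENT_REREGISTER] = "DEMO_EVENT_CLIENT_REREGISTER"
    };
    size_t n = sizeof(names) / sizeof(names[0]);

    /* Values added to the enum later fall through to "unknown". */
    if ((unsigned)event >= n || names[event] == NULL)
        return "unknown";
    return names[event];
}
```

Keeping the table inside the library means applications pick up names
for new enum values by upgrading the library, without touching their
own code.  The same shape would work for enum ibv_wc_status.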


From mst at dev.mellanox.co.il  Thu Jun 21 11:55:33 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 21 Jun 2007 21:55:33 +0300
Subject: [ofa-general] Re: Stringify ibv_event_type
In-Reply-To: <2C245DF3-77A7-4A3C-BF3A-13FEC2F7E0DA@cisco.com>
References: <2C245DF3-77A7-4A3C-BF3A-13FEC2F7E0DA@cisco.com>
Message-ID: <20070621185533.GJ4857@mellanox.co.il>

> Quoting Jeff Squyres <jsquyres at cisco.com>:
> Subject: Stringify ibv_event_type
> 
> Could a function to stringify the ibv_event_type enum be added to  
> libibverbs?  It could be similar to the event_name_str() function in  
> libibverbs/examples/asyncwatch.c:
> 
> -----
> static const char *event_name_str(enum ibv_event_type event_type)
> {
>         switch (event_type) {
>         case IBV_EVENT_DEVICE_FATAL:
>                 return "IBV_EVENT_DEVICE_FATAL";
> ...etc.
> -----
> 
> Rationale: if multiple client apps (such as the OF-based MPI  
> implementations) start using the asynch events and there is no  
> central function for string-ifying the event enum, they'll all end up  
> doing the translation themselves when printing out error messages.   
> It's not a huge amount of code, but it does seem kinda odd to make  
> everyone replicate essentially the same stuff.  Additionally, the  
> available enum values may grow over time, forcing client apps to  
> figure out which ones are available and adjust their event_name_str()  
> equivalent as appropriate.  Hiding the possibility of change down in  
> libibverbs seems appropriate.

I have no strong opinion either way, but I do wonder why you find this useful.

Asyncwatch is just an example: it does not actually *do anything* on an event,
so it calls printf. But is it likely that an end user really needs to see
IBV_EVENT_CLIENT_REREGISTER? Printing out the numeric value seems
sufficient for debugging.

-- 
MST


From jsquyres at cisco.com  Thu Jun 21 11:59:47 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Thu, 21 Jun 2007 14:59:47 -0400
Subject: [ofa-general] Stringify ibv_event_type
In-Reply-To: <adalked5ees.fsf@cisco.com>
References: <2C245DF3-77A7-4A3C-BF3A-13FEC2F7E0DA@cisco.com>
	<adalked5ees.fsf@cisco.com>
Message-ID: <7AAF2612-2BDA-4551-9E4C-B5FA0ED490CF@cisco.com>

On Jun 21, 2007, at 2:51 PM, Roland Dreier wrote:

>> Could a function to stringify the ibv_event_type enum be added to
>> libibverbs?  It could be similar to the event_name_str() function in
>> libibverbs/examples/asyncwatch.c:
>
> Seems reasonable.  I guess if you have that, then you probably want
> strings for enum ibv_wc_status too.  Any other enums you would want to
> stringify?

I think those 2 would be great.

> Also, I think this could be added to the libibverbs 1.1 stable line,
> since it's a completely new API, and easy to test for with  
> autoconf, right?

Perfect.

-- 
Jeff Squyres
Cisco Systems


From DavidRobb at comsci.co.uk  Thu Jun 21 12:00:38 2007
From: DavidRobb at comsci.co.uk (David Robb)
Date: Thu, 21 Jun 2007 20:00:38 +0100
Subject: [ofa-general] Infiniband Problems
Message-ID: <467ACAD6.8000304@comsci.co.uk>

Hi Folks,

I have inherited responsibility for the comms subsystem on a 28 node 
high performance signal processing cluster interconnected with InfiniBand.

Being new to this technology, I have been trying to read and learn as 
much as possible but am having a few specific problems. Any help or 
pointers in the right direction would be greatly appreciated.

1. Sometimes observe RDMA data transfer stalls of ~ 1.0 second

I have written an RDMA transfer unit test that transfers 10000 packets 
from one node to another and times the performance. Mostly this happens 
with a loop iteration of the order of 30uS, but occasionally, I observe 
times of 500,000 to 1,100,000uS for one packet. I don't think it's a 
problem with our queuing layer ( If I remove the call to 
ibv_post_send(...) then no stall is observed). I don't think it is a 
problem with the CPU stalling as I created a separate worker thread that 
does something else and times the loop and this does not exhibit any 
stalls. Any suggestions where to look next?
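
The stall check described above can be sketched roughly as follows:
record one duration per loop iteration and scan for outliers afterwards,
so the measurement adds little to the loop itself. The helper name and
the threshold are assumptions for illustration, not the actual test
code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Scan per-iteration durations (in microseconds) and return how many
 * exceed threshold_us. If worst_idx is non-NULL and n > 0, it receives
 * the index of the longest iteration.
 */
static size_t demo_count_stalls(const uint64_t *dur_us, size_t n,
				uint64_t threshold_us, size_t *worst_idx)
{
	size_t i, stalls = 0, worst = 0;

	for (i = 0; i < n; i++) {
		if (dur_us[i] > threshold_us)
			stalls++;
		if (dur_us[i] > dur_us[worst])
			worst = i;
	}
	if (worst_idx && n > 0)
		*worst_idx = worst;
	return stalls;
}
```

Knowing the index of the worst iteration makes it possible to line the
stall up against other logs (for example the receiver side) after the
run.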


2. Creation of a Queue Pair is rejected when I have mapped a region of 
memory greater than about 1.35GB.

Ideally, we would like to be able to write anywhere within a 2GB (or 
larger) shared memory segment. However, when I attempt to do this, the 
call to fails with REJ. Further reading around the subject, suggests 
that this may be due to the VPTT (Virtual to Physical Translation Table) 
resources required for mapping such a large memory area. Can anyone 
confirm this hypothesis? Even if we get this to work, will we suffer 
performance problems by using such a large memory area? Are there any 
workarounds?

Many thanks,

David Robb


Device and Environment Information follows:-

OS Kernel

bash-3.00$ uname -a
Linux qinetiq01 2.6.20.1-clustervision-142_cvos #1 SMP Tue Mar 6 
00:19:24 GMT 2007 x86_64 x86_64 x86_64 GNU/Linux

OFED library version 1.1

ibv_devinfo -v output:-
hca_id: mthca0
fw_ver: 1.1.0
node_guid: 0002:c902:0023:a1d8
sys_image_guid: 0002:c902:0023:a1db
vendor_id: 0x02c9
vendor_part_id: 25204
hw_ver: 0xA0
board_id: MT_03B0140002
phys_port_cnt: 1
max_mr_size: 0xffffffffffffffff
page_size_cap: 0xfffff000
max_qp: 64512
max_qp_wr: 16384
device_cap_flags: 0x00001c76
max_sge: 30
max_sge_rd: 0
max_cq: 65408
max_cqe: 131071
max_mr: 131056
max_pd: 32764
max_qp_rd_atom: 4
max_ee_rd_atom: 0
max_res_rd_atom: 258048
max_qp_init_rd_atom: 128
max_ee_init_rd_atom: 0
atomic_cap: ATOMIC_HCA (1)
max_ee: 0
max_rdd: 0
max_mw: 0
max_raw_ipv6_qp: 0
max_raw_ethy_qp: 0
max_mcast_grp: 8192
max_mcast_qp_attach: 8
max_total_mcast_qp_attach: 65536
max_ah: 0
max_fmr: 0
max_srq: 960
max_srq_wr: 16384
max_srq_sge: 30
max_pkeys: 64
local_ca_ack_delay: 15
port: 1
state: PORT_ACTIVE (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 1
port_lid: 1
port_lmc: 0x00
max_msg_sz: 0x80000000
port_cap_flags: 0x02510a6a
max_vl_num: 3
bad_pkey_cntr: 0x0
qkey_viol_cntr: 0x0
sm_sl: 0
pkey_tbl_len: 64
gid_tbl_len: 32
subnet_timeout: 18
init_type_reply: 0
active_width: 4X (2)
active_speed: 5.0 Gbps (2)
phys_state: LINK_UP (5)
GID[ 0]: fe80:0000:0000:0000:0002:c902:0023:a1d9

Switches are "MT47396 Infiniscale-III Mellanox Technologies"


From swise at opengridcomputing.com  Thu Jun 21 12:07:17 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 21 Jun 2007 14:07:17 -0500
Subject: [ofa-general] Stringify ibv_event_type
In-Reply-To: <adalked5ees.fsf@cisco.com>
References: <2C245DF3-77A7-4A3C-BF3A-13FEC2F7E0DA@cisco.com>
	<adalked5ees.fsf@cisco.com>
Message-ID: <467ACC65.4020106@opengridcomputing.com>

Roland Dreier wrote:
>  > Could a function to stringify the ibv_event_type enum be added to
>  > libibverbs?  It could be similar to the event_name_str() function in
>  > libibverbs/examples/asyncwatch.c:
> 
> Seems reasonable.  I guess if you have that, then you probably want
> strings for enum ibv_wc_status too.  Any other enums you would want to
> stringify?
> 

the rdmacm stuff too!


From mst at dev.mellanox.co.il  Thu Jun 21 12:07:52 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 21 Jun 2007 22:07:52 +0300
Subject: [ofa-general] Re: Infiniband Problems
In-Reply-To: <467ACAD6.8000304@comsci.co.uk>
References: <467ACAD6.8000304@comsci.co.uk>
Message-ID: <20070621190752.GK4857@mellanox.co.il>

> ibv_devinfo -v output:-
> hca_id: mthca0
> fw_ver: 1.1.0

It might make sense to upgrade to 1.2.0, there's a chance some
speed issues are fixed there.

http://www.mellanox.com/support/firmware_table_IH3Lx.php

-- 
MST


From rdreier at cisco.com  Thu Jun 21 12:08:48 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 21 Jun 2007 12:08:48 -0700
Subject: [ofa-general] Stringify ibv_event_type
In-Reply-To: <467ACC65.4020106@opengridcomputing.com> (Steve Wise's message of
	"Thu, 21 Jun 2007 14:07:17 -0500")
References: <2C245DF3-77A7-4A3C-BF3A-13FEC2F7E0DA@cisco.com>
	<adalked5ees.fsf@cisco.com> <467ACC65.4020106@opengridcomputing.com>
Message-ID: <adazm2t3z27.fsf@cisco.com>

 > the rdmacm stuff too!

which stuff is that?  Is it from librdmacm?  If so that's a different
package and therefore a different change to make.


From rdreier at cisco.com  Thu Jun 21 12:12:53 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 21 Jun 2007 12:12:53 -0700
Subject: [ofa-general] Infiniband Problems
In-Reply-To: <467ACAD6.8000304@comsci.co.uk> (David Robb's message of "Thu,
	21 Jun 2007 20:00:38 +0100")
References: <467ACAD6.8000304@comsci.co.uk>
Message-ID: <adavedh3yve.fsf@cisco.com>

 > 1. Sometimes observe RDMA data transfer stalls of ~ 1.0 second

Could it be an RNR NAK?  You didn't really describe your protocol, but
if you use send operations and if you do a send without a matching
receive on the other side, then you might end up stalling the QP for a
while.

 > 2. Creation of a Queue Pair is rejected when I have mapped a region of
 > memory greater than about 1.35GB.

I don't really understand this problem.  Are you able to map more
memory, and then ibv_create_qp() fails if you do?  Later you say

 > Ideally, we would like to be able to write anywhere within a 2GB
 > (or larger) shared memory segment. However, when I attempt to do this,
 > the call to fails with REJ.

You didn't say which call fails with REJ, and I'm not even sure I
understand what it means to "fail with REJ".

On x86-64, the limit on how much memory you can register should be
much higher, closer to 32 GB by default.

 - R.


From DavidRobb at comsci.co.uk  Thu Jun 21 12:37:41 2007
From: DavidRobb at comsci.co.uk (David Robb)
Date: Thu, 21 Jun 2007 20:37:41 +0100
Subject: [ofa-general] Infiniband Problems
In-Reply-To: <adavedh3yve.fsf@cisco.com>
References: <467ACAD6.8000304@comsci.co.uk> <adavedh3yve.fsf@cisco.com>
Message-ID: <467AD385.3040500@comsci.co.uk>


Roland Dreier wrote:
>  > 1. Sometimes observe RDMA data transfer stalls of ~ 1.0 second
>
> Could it be an RNR NAK?  You didn't really describe your protocol, but
> if you use send operations and if you do a send without a matching
> receive on the other side, then you might end up stalling the QP for a
> while.
>   
Quite possibly, we are using an IBV_QPT_RC transport type. The code 
simply adds another work request with ibv_post_srq_recv(...) after each 
packet is processed. Am I correct in thinking it should start out with a 
stack of work requests in case another packet arrives before the current 
one has been processed?
>  > 2. Creation of a Queue Pair is rejected when I have mapped a region of
>  > memory greater than about 1.35GB.
>
> I don't really understand this problem.  Are you able to map more
> memory, and then ibv_create_qp() fails if you do?  Later you say
>
>  > Ideally, we would like to be able to write anywhere within a 2GB
>  > (or larger) shared memory segment. However, when I attempt to do this,
>  > the call to fails with REJ.
>
> You didn't say which call fails with REJ, and I'm not even sure I
> understand what it means to "fail with REJ".
>   
Sorry, I meant to look up in my source code which call was failing but 
forgot to paste it into the question. Yes, I can map 2GB of memory but 
the call to ibv_create_qp() fails with REJ
> On x86-64, the limit on how much memory you can register should be
> much higher, closer to 32 GB by default.
>   
That's reassuring. Are there any performance penalties for mapping a 
larger region than a smaller region?

>  - R.
>   
Many thanks for the speedy response.

David Robb


From DavidRobb at comsci.co.uk  Thu Jun 21 12:42:07 2007
From: DavidRobb at comsci.co.uk (David Robb)
Date: Thu, 21 Jun 2007 20:42:07 +0100
Subject: [ofa-general] Re: Infiniband Problems
In-Reply-To: <20070621190752.GK4857@mellanox.co.il>
References: <467ACAD6.8000304@comsci.co.uk>
	<20070621190752.GK4857@mellanox.co.il>
Message-ID: <467AD48F.3090901@comsci.co.uk>

Thanks for the pointer. Upgrading probably does make sense and does not 
look too difficult.

David Robb


Michael S. Tsirkin wrote:
>> ibv_devinfo -v output:-
>> hca_id: mthca0
>> fw_ver: 1.1.0
>>     
>
> It might make sense to upgrade to 1.2.0, there's a chance some
> speed issues are fixed there.
>
> http://www.mellanox.com/support/firmware_table_IH3Lx.php
>
>   

From rdreier at cisco.com  Thu Jun 21 12:53:28 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 21 Jun 2007 12:53:28 -0700
Subject: [ofa-general] Infiniband Problems
In-Reply-To: <467AD385.3040500@comsci.co.uk> (David Robb's message of "Thu,
	21 Jun 2007 20:37:41 +0100")
References: <467ACAD6.8000304@comsci.co.uk> <adavedh3yve.fsf@cisco.com>
	<467AD385.3040500@comsci.co.uk>
Message-ID: <adamyyt3wzr.fsf@cisco.com>

 > Quite possibly, we are using an IBV_QPT_RC transport type. The code
 > simply adds another work request with ibv_post_srq_recv(...) after
 > each packet is processed. Am I correct in thinking it should start out
 > with a stack of work requests in case another packet arrives before
 > the current one has been processed?

That seems a lot more sensible to me.
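
To illustrate why a deeper receive queue helps, here is a toy model. It
is not verbs code and all names are hypothetical: it only counts how
often a message would arrive with no receive work request posted, which
is the situation where an RC sender would see an RNR NAK.

```c
#include <assert.h>

/*
 * Toy model of a receive queue.
 *   'A' = a message arrives and consumes one posted receive
 *   'P' = the receiver posts one more receive
 * Returns the number of arrivals that found the queue empty.
 */
static unsigned demo_count_rnr(const char *events, unsigned preposted)
{
	unsigned posted = preposted, rnr = 0;
	const char *e;

	for (e = events; *e; e++) {
		if (*e == 'A') {
			if (posted > 0)
				posted--;
			else
				rnr++;	/* nothing posted: sender would be NAKed */
		} else if (*e == 'P') {
			posted++;
		}
	}
	return rnr;
}
```

With a burst such as "AAAPPPA", a single pre-posted receive leaves two
arrivals without a buffer, while three pre-posted receives absorb the
burst. The model ignores retries and timing, so it only shows the
direction of the effect.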

 > Sorry, I meant to look up in my source code which call was failing but
 > forgot to paste it into the question. Yes, I can map 2GB of memory but
 > the call to ibv_create_qp() fails with REJ

Not sure what you mean ... ibv_create_qp() just returns a pointer or
NULL.  What does it mean to "fail with REJ?"

 > That's reassuring. Are there any performance penalties for mapping a
 > larger region than a smaller region?

Not really beyond the general cost of using more memory rather than less.

 - R.


From DavidRobb at comsci.co.uk  Thu Jun 21 13:05:15 2007
From: DavidRobb at comsci.co.uk (David Robb)
Date: Thu, 21 Jun 2007 21:05:15 +0100
Subject: [ofa-general] Infiniband Problems
In-Reply-To: <adamyyt3wzr.fsf@cisco.com>
References: <467ACAD6.8000304@comsci.co.uk>
	<adavedh3yve.fsf@cisco.com>	<467AD385.3040500@comsci.co.uk>
	<adamyyt3wzr.fsf@cisco.com>
Message-ID: <467AD9FB.1030508@comsci.co.uk>


Roland Dreier wrote:
>  > Quite possibly, we are using an IBV_QPT_RC transport type. The code
>  > simply adds another work request with ibv_post_srq_recv(...) after
>  > each packet is processed. Am I correct in thinking it should start out
>  > with a stack of work requests in case another packet arrives before
>  > the current one has been processed?
>
> That seems a lot more sensible to me.
>
>  > Sorry, I meant to look up in my source code which call was failing but
>  > forgot to paste it into the question. Yes, I can map 2GB of memory but
>  > the call to ibv_create_qp() fails with REJ
>
> Not sure what you mean ... ibv_create_qp() just returns a pointer or
> NULL.  What does it mean to "fail with REJ?"
>   
OK. I need to rerun this test tomorrow to determine exactly where and 
how this test is failing. The end result is that the QP creation fails 
with a REJ. From what I remember, I get a CM event  IB_CM_REJ_RECEIVED 
and the remote node is not even aware that anything has tried to connect.
Thanks for staying with me on this one.
>  > That's reassuring. Are there any performance penalties for mapping a
>  > larger region than a smaller region?
>
> Not really beyond the general cost of using more memory rather than less.
>
>  - R.


From rdreier at cisco.com  Thu Jun 21 13:17:15 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 21 Jun 2007 13:17:15 -0700
Subject: [ofa-general] Re: [PATCH] libmlx4: fix adjustments for minimum qp
	capabilities in mlx4_create_qp
In-Reply-To: <200706191647.41336.jackm@dev.mellanox.co.il> (Jack Morgenstein's
	message of "Tue, 19 Jun 2007 16:47:41 +0300")
References: <200706191647.41336.jackm@dev.mellanox.co.il>
Message-ID: <adafy4l3vw4.fsf@cisco.com>

 > Need to adjust minimum qp capability values prior to size and max resource
 > calculations.

Is this actually fixing a problem?  I don't see how it could make a difference:

 > +	attr->cap.max_recv_wr = attr->cap.max_recv_wr ? attr->cap.max_recv_wr : 1;

align_queue_size() always returns at least 1 so I don't see why this matters.
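
A stand-alone sketch of that behaviour, assuming align_queue_size()
simply rounds its argument up to the next power of two. The name
demo_align_queue_size is hypothetical and this is not copied from the
driver.

```c
#include <assert.h>

/*
 * Round req up to the next power of two, with a minimum of 1.
 * Callers must keep req <= (1 << 30) so the shift cannot overflow.
 */
static int demo_align_queue_size(int req)
{
	int nent;

	for (nent = 1; nent < req; nent <<= 1)
		; /* keep doubling until nent >= req */
	return nent;
}
```

Under that assumption a request of 0 already comes back as 1, so
clamping max_recv_wr to 1 beforehand would not change the result.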

 > +	attr->cap.max_recv_sge = attr->cap.max_recv_sge ? attr->cap.max_recv_sge : 1;

I don't see anything that uses max_recv_sge before it gets set in the
current code.

 > +	attr->cap.max_send_wr = attr->cap.max_send_wr ? attr->cap.max_send_wr : 1;

If max_send_wr is 0 then the call to align_queue_size will always add
at least one more WQE because sq_spare_wqes will never be a power of 2.

 > +	attr->cap.max_send_sge = attr->cap.max_send_sge ? attr->cap.max_send_sge : 1;

mlx4_calc_sq_wqe_size() will always end up with at least a 64-byte WQE
size so does this matter?  Oh, I guess a UD QP could end up with 0
send gather entries, but I'm not sure that's a big deal -- after all,
the user gets what he asked for, and the HW shouldn't be bothered,
should it?

 - R.


From rdreier at cisco.com  Thu Jun 21 13:19:55 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 21 Jun 2007 13:19:55 -0700
Subject: [ofa-general] [PATCH] for-2.6.23 ib/umad: add partition support
In-Reply-To: <467996C4.1060201@ichips.intel.com> (Sean Hefty's message of "Wed,
	20 Jun 2007 14:06:12 -0700")
References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com>
	<adair9i8ihq.fsf@cisco.com> <467996C4.1060201@ichips.intel.com>
Message-ID: <adabqf93vro.fsf@cisco.com>

 > Did you have something in mind?  (new ioctl?  re-using existing fields?)
 > 
 > Not all fields are used for both reads and writes.  E.g. status is
 > unused on a write, and retries is unused on a read.  Storing the
 > pkey_index on a read seems doable.  I think if we do anything on a
 > write, we need to make an assumption that the data is currently set to
 > 0 by the app.

I hadn't really thought about it.

One other thing is that the top 8 bits of flow_label aren't used.  I
guess we could steal that, although it's a little ugly.  I doubt it
would break existing userspace.

There is the problem of old kernels silently ignoring the pkey index
though.  I'm not sure I see a good way around that.

I'm beginning to think that just updating the ABI might be the right
answer.  But let's try to make this be the last ABI break.  Are we
pretty sure there's *nothing* else we might ever want to add to the
structure?  I can't think of anything right now...

 - R.


From rdreier at cisco.com  Thu Jun 21 13:23:00 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 21 Jun 2007 13:23:00 -0700
Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition support
In-Reply-To: <20070621033854.GF8868@mellanox.co.il> (Michael S. Tsirkin's
	message of "Thu, 21 Jun 2007 06:38:54 +0300")
References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com>
	<adair9i8ihq.fsf@cisco.com> <467996C4.1060201@ichips.intel.com>
	<20070621033854.GF8868@mellanox.co.il>
Message-ID: <ada7ipx3vmj.fsf@cisco.com>

 > We made a mistake of not validating the offset field otherwise we could
 > have used it, too: as it is I think apps just use "write" so
 > there's a useless byte counter in that field.

which offset field?  I don't see the string "offset" anywhere in ib_user_mad.h

 > But if we do one of these things, the app does not get any indication that pkey's
 > ignored, isn't that right?

Yes, that's a good point.

 > This assumes an open file descriptor per-pkey, so the proposed API
 > extension umad_set_pkey would have to be changed to be per-port rather
 > than per-mad. But I think this is a better API, too: most apps
 > likely work within a single partition.

Not sure I agree.  If I'm implementing an SA, then I want to be able
to receive MADs for all partitions, and send them too.  Of course I
can open a bunch of file descriptors, but then I probably end up in a
mess keeping up with what's in my pkey table.

 - R.


From rdreier at cisco.com  Thu Jun 21 13:26:57 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 21 Jun 2007 13:26:57 -0700
Subject: [ofa-general] Re: [PATCH] IB-mlx4: query_device needs to return one
	less srq wqe for max_srq_wr
In-Reply-To: <200706191820.46443.jackm@dev.mellanox.co.il> (Jack Morgenstein's
	message of "Tue, 19 Jun 2007 18:20:46 +0300")
References: <200706191820.46443.jackm@dev.mellanox.co.il>
Message-ID: <ada3b0l3vfy.fsf@cisco.com>

Thanks, applied.


From swise at opengridcomputing.com  Thu Jun 21 13:28:33 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 21 Jun 2007 15:28:33 -0500
Subject: [ofa-general] Stringify ibv_event_type
In-Reply-To: <adazm2t3z27.fsf@cisco.com>
References: <2C245DF3-77A7-4A3C-BF3A-13FEC2F7E0DA@cisco.com>	<adalked5ees.fsf@cisco.com>
	<467ACC65.4020106@opengridcomputing.com>
	<adazm2t3z27.fsf@cisco.com>
Message-ID: <467ADF71.4090002@opengridcomputing.com>

Roland Dreier wrote:
>  > the rdmacm stuff too!
> 
> which stuff is that?  Is it from librdmacm?  If so that's a different
> package and therefore a different change to make.

it would be nice for librdmacm to have a stringify method for the event 
enum...

But yes, it's a different package...

/me nudges sean...

:)


From rdreier at cisco.com  Thu Jun 21 13:40:00 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 21 Jun 2007 13:40:00 -0700
Subject: [ofa-general] Re: [PATCH for-2.6.22] ipoib/cm: fix interoperability
	when mtu don't match
In-Reply-To: <20070620162215.GF6006@mellanox.co.il> (Michael S. Tsirkin's
	message of "Wed, 20 Jun 2007 19:22:15 +0300")
References: <20070620162215.GF6006@mellanox.co.il>
Message-ID: <aday7id2g9r.fsf@cisco.com>

OK, I applied this for 2.6.22 since it looks quite safe (I even took
the risk of replacing the "4" in the warning string with a "%d" and
printing IPOIB_ENCAP_LEN, because it seemed funny to test against a
named constant and then print a raw number).  But I'm really going to
be disappointed if this breaks something...

BTW, any objection to merging the patch below for 2.6.22 too?  It's
compile-tested only but it looks *REALLY* safe.

commit f667e4b9c4d7b2772105d2872becffbe9e65ecea
Author: Roland Dreier <rolandd at cisco.com>
Date:   Thu Jun 21 13:37:05 2007 -0700

    IPoIB/cm: Remove dead definition of struct ipoib_cm_id
    
    It's completely unused.
    
    Signed-off-by: Roland Dreier <rolandd at cisco.com>

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 1fe7f66..b3d0a31 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -56,13 +56,6 @@ MODULE_PARM_DESC(cm_data_debug_level,
 #define IPOIB_CM_RX_DELAY       (3 * 256 * HZ)
 #define IPOIB_CM_RX_UPDATE_MASK (0x3)
 
-struct ipoib_cm_id {
-	struct ib_cm_id *id;
-	int flags;
-	u32 remote_qpn;
-	u32 remote_mtu;
-};
-
 static struct ib_qp_attr ipoib_cm_err_attr = {
 	.qp_state = IB_QPS_ERR
 };


From rdreier at cisco.com  Thu Jun 21 13:52:22 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 21 Jun 2007 13:52:22 -0700
Subject: [ofa-general] Re: [PATCH draft,
	untested] ehca srq emulation (for IPoIB CM)
In-Reply-To: <OFE2D9DB0E.0AD8F1B0-ON85257300.007854EA-85257300.0079CE19@us.ibm.com>
	(Bernard King-Smith's message of "Wed,
	20 Jun 2007 18:09:08 -0400")
References: <OFE2D9DB0E.0AD8F1B0-ON85257300.007854EA-85257300.0079CE19@us.ibm.com>
Message-ID: <adamyyt2fp5.fsf@cisco.com>

 > It is not clear if anything is better yet, but instead you have to go back 
 > to the IPoIB-CM  RFC 4755 that we wrote. In the spec you will see that the 
 > approach for this driver is to have the IPoIB driver select the most 
 > appropriate method of connecting. If RC was not available then UD was 
 > used. You can extend that to UC mode as Michael proposed, as long as you 
 > allow selecting the most appropriate method of connection. By pushing the 
 > issue of SRQ or not SRQ to the driver you have broken the IPoIB-CM 
 > original design. Since SRQ was not a required function in the IB spec we 
 > never addressed that issue in the RFC along with UC. I think we can agree 
 > that adding UC is a good thing and follows the approach in the original 
 > spec. Including SRQ as one of the tests for the best possible connection 
 > method follows this same approach.

 > ....

I can't really follow this.  We're talking about the internal
implementation inside the Linux kernel, which I really hope that an
IETF RFC does not address at all.  We surely intend to follow the RFC,
and if we run into problems because the RFC was written without any
implementation experience, then we'll work to correct those problems
through a new IETF document.

It makes perfect sense for ehca systems to be able to use IPoIB CM.  I
understand that current ehca HW doesn't natively support SRQs.  The
only question is how to implement IPoIB CM for ehca systems, and we
have to weigh tradeoffs like avoiding code duplication vs the
additional cost of branches on the data path.

 - R.


From sashak at voltaire.com  Thu Jun 21 14:29:20 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Fri, 22 Jun 2007 00:29:20 +0300
Subject: [ofa-general] [PATCH] opensm/updn: --connect_roots option
Message-ID: <20070621212919.GL25653@sashak.voltaire.com>


With this option up/down preserves route paths (based on min hops
knowledge) between root switches. This makes up/down IBA compliant
(all-to-all connectivity is required); OTOH it violates the up/down
deadlock-free algorithm. By default this option is 'off'.

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---
 opensm/include/opensm/osm_subnet.h |    6 ++++++
 opensm/man/opensm.8                |    8 +++++++-
 opensm/opensm/main.c               |   15 ++++++++++++++-
 opensm/opensm/osm_subnet.c         |   10 ++++++++++
 opensm/opensm/osm_ucast_updn.c     |   27 ++++++++++++++++++++++++++-
 5 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index 2ee5689..43b1589 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -276,6 +276,7 @@ typedef struct _osm_subn_opt
   boolean_t                sweep_on_trap;
   osm_testability_modes_t  testability_mode;
   char *                   routing_engine_name;
+  boolean_t                connect_roots;
   char *                   lid_matrix_dump_file;
   char *                   ucast_dump_file;
   char *                   root_guid_file;
@@ -445,6 +446,11 @@ typedef struct _osm_subn_opt
 *		Name of used routing engine
 *		(other than default Min Hop Algorithm)
 *
+*	connect_roots
+*		The option which will enforce root to root connectivity with
+*		up/down routing engine (even if this violates "pure" deadlock
+*		free up/down algorithm)
+*
 *	lid_matrix_dump_file
 *		Name of the lid matrix dump file from where switch
 *		lid matrices (min hops tables) will be loaded
diff --git a/opensm/man/opensm.8 b/opensm/man/opensm.8
index 4d35689..40e0235 100644
--- a/opensm/man/opensm.8
+++ b/opensm/man/opensm.8
@@ -5,7 +5,7 @@ opensm \- InfiniBand subnet manager and administration (SM/SA)
 
 .SH SYNOPSIS
 .B opensm
-[\-c(ache-options)] [\-g(uid)[=]<GUID in hex>] [\-l(mc) <LMC>] [\-p(riority) <PRIORITY>] [\-smkey <SM_Key>] [\-r(eassign_lids)] [\-R <engine name> | \-\-routing_engine <engine name>] [\-M <file name> | \-\-lid_matrix_file <file name>] [\-U <file name> | \-ucast_file <file name>] [\-S | \-\-sadb_file <file name>] [\-a | \-\-root_guid_file <path to file>] [\-u | \-\-cn_guid_file <path to file>] [\-o(nce)] [\-s(weep) <interval>] [\-t(imeout) <milliseconds>] [\-maxsmps <number>] [\-console [off | local | socket]] [\-console-port <port>] [\-i(gnore-guids) <equalize-ignore-guids-file>] [\-f | \-\-log_file] [\-L | \-\-log_limit <size in MB>] [\-e(rase_log_file)] [\-P(config)] [\-Q | \-qos] [\-N | \-no_part_enforce] [\-y | \-stay_on_fatal] [\-B | \-daemon] [\-I | \-inactive] [\-perfmgr] [\-perfmgr_sweep_time_s <seconds>] [\-v(erbose)] [\-V] [\-D <flags>] [\-d(ebug) <number>] [\-h(elp)] [\-?]
+[\-c(ache-options)] [\-g(uid)[=]<GUID in hex>] [\-l(mc) <LMC>] [\-p(riority) <PRIORITY>] [\-smkey <SM_Key>] [\-r(eassign_lids)] [\-R <engine name> | \-\-routing_engine <engine name>] [\-z | \-\-connect_roots] [\-M <file name> | \-\-lid_matrix_file <file name>] [\-U <file name> | \-ucast_file <file name>] [\-S | \-\-sadb_file <file name>] [\-a | \-\-root_guid_file <path to file>] [\-u | \-\-cn_guid_file <path to file>] [\-o(nce)] [\-s(weep) <interval>] [\-t(imeout) <milliseconds>] [\-maxsmps <number>] [\-console [off | local | socket]] [\-console-port <port>] [\-i(gnore-guids) <equalize-ignore-guids-file>] [\-f | \-\-log_file] [\-L | \-\-log_limit <size in MB>] [\-e(rase_log_file)] [\-P(config)] [\-Q | \-qos] [\-N | \-no_part_enforce] [\-y | \-stay_on_fatal] [\-B | \-daemon] [\-I | \-inactive] [\-perfmgr] [\-perfmgr_sweep_time_s <seconds>] [\-v(erbose)] [\-V] [\-D <flags>] [\-d(ebug) <number>] [\-h(elp)] [\-?]
 
 .SH DESCRIPTION
 .PP
@@ -94,6 +94,12 @@ This option chooses routing engine instead of Min Hop
 algorithm (default).
 Supported engines: updn, file, ftree, lash
 .TP
+\fB\-z\fR, \fB\-\-connect_roots\fR
+This option enforces a routing engine (currently up/down
+only) to make connectivity between root switches and in
+this way to be fully IBA compliant. In many cases this can
+violate "pure" deadlock free algorithm, so use it carefully.
+.TP
 \fB\-M\fR, \fB\-\-lid_matrix_file\fR
 This option specifies the name of the lid matrix dump file
 from where switch lid matrices (min hops tables will be
diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
index 0d5e0eb..e182276 100644
--- a/opensm/opensm/main.c
+++ b/opensm/opensm/main.c
@@ -175,6 +175,13 @@ show_usage(void)
           "          This option chooses routing engine instead of Min Hop\n"
           "          algorithm (default).\n"
           "          Supported engines: updn, file, ftree\n\n");
+  printf( "-z\n"
+          "--connect_roots\n"
+          "          This option enforces a routing engine (currently\n"
+          "          up/down only) to make connectivity between root switches\n"
+          "          and in this way to be fully IBA compliant. In many cases\n"
+          "          this can violate \"pure\" deadlock free algorithm, so\n"
+          "          use it carefully.\n\n");
   printf( "-M\n"
           "--lid_matrix_file <file name>\n"
           "          This option specifies the name of the lid matrix dump file\n"
@@ -591,7 +598,7 @@ main(
   char                 *ignore_guids_file_name = NULL;
   uint32_t              val;
   const char * const    short_option =
-	  "i:f:ed:g:l:L:s:t:a:u:R:M:U:S:P:NBIQvVhorcyxp:n:q:k:C:";
+	  "i:f:ed:g:l:L:s:t:a:u:R:zM:U:S:P:NBIQvVhorcyxp:n:q:k:C:";
 
   /*
     In the array below, the 2nd parameter specifies the number
@@ -625,6 +632,7 @@ main(
       {  "priority",      1, NULL, 'p'},
       {  "smkey",         1, NULL, 'k'},
       {  "routing_engine",1, NULL, 'R'},
+      {  "connect_roots", 0, NULL, 'z'},
       {  "lid_matrix_file",1, NULL, 'M'},
       {  "ucast_file",    1, NULL, 'U'},
       {  "sadb_file",     1, NULL, 'S'},
@@ -876,6 +884,11 @@ main(
       printf(" Activate \'%s\' routing engine\n", optarg);
       break;
 
+    case 'z':
+      opt.connect_roots = TRUE;
+      printf(" Connect roots option is on\n");
+      break;
+
     case 'M':
       opt.lid_matrix_dump_file = optarg;
       printf(" Lid matrix dump file is \'%s\'\n", optarg);
diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index 82d66f9..8f429ae 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -500,6 +500,7 @@ osm_subn_set_default_opt(
   p_opt->sweep_on_trap = TRUE;
   p_opt->testability_mode = OSM_TEST_MODE_NONE;
   p_opt->routing_engine_name = NULL;
+  p_opt->connect_roots = FALSE;
   p_opt->lid_matrix_dump_file = NULL;
   p_opt->ucast_dump_file = NULL;
   p_opt->root_guid_file = NULL;
@@ -1290,6 +1291,10 @@ osm_subn_parse_conf_file(
         "routing_engine",
         p_key, p_val, &p_opts->routing_engine_name);
 
+      __osm_subn_opts_unpack_boolean(
+        "connect_roots",
+        p_key, p_val, &p_opts->connect_roots);
+
       __osm_subn_opts_unpack_charp(
         "log_file", p_key, p_val, &p_opts->log_file);
 
@@ -1545,6 +1550,11 @@ osm_subn_write_conf_file(
              "# Routing engine\n"
              "routing_engine %s\n\n",
              p_opts->routing_engine_name);
+  if (p_opts->connect_roots)
+    fprintf( opts_file,
+             "# Connect roots (use FALSE if unsure)\n"
+             "connect_roots %s\n\n",
+             p_opts->connect_roots ? "TRUE" : "FALSE");
   if (p_opts->lid_matrix_dump_file)
     fprintf( opts_file,
              "# Lid matrix dump file name\n"
diff --git a/opensm/opensm/osm_ucast_updn.c b/opensm/opensm/osm_ucast_updn.c
index af5ee4e..db8e60a 100644
--- a/opensm/opensm/osm_ucast_updn.c
+++ b/opensm/opensm/osm_ucast_updn.c
@@ -449,6 +449,24 @@ updn_subn_rank(
 
 /**********************************************************************
  **********************************************************************/
+/* hack: preserve min hops entries to any other root switches */
+static void
+updn_clear_root_hops(updn_t *p_updn, osm_switch_t *p_sw)
+{
+  osm_port_t *p_port;
+  unsigned i;
+
+  for ( i = 0 ; i < p_sw->num_hops ; i++ )
+    if (p_sw->hops[i]) {
+      p_port = cl_ptr_vector_get(&p_updn->p_osm->subn.port_lid_tbl, i);
+      if (!p_port || !p_port->p_node->sw ||
+          ((struct updn_node *)p_port->p_node->sw->priv)->rank != 0)
+        memset(p_sw->hops[i], 0xff, p_sw->num_ports);
+    }
+}
+
+/**********************************************************************
+ **********************************************************************/
 static int
 __osm_subn_set_up_down_min_hop_table(
   IN updn_t* p_updn )
@@ -471,7 +489,10 @@ __osm_subn_set_up_down_min_hop_table(
     p_sw = p_next_sw;
     p_next_sw = (osm_switch_t*)cl_qmap_next( &p_sw->map_item );
     /* Clear Min Hop Table */
-    osm_switch_clear_hops(p_sw);
+    if (p_subn->opt.connect_roots && !((struct updn_node *)p_sw->priv)->rank)
+      updn_clear_root_hops(p_updn, p_sw);
+    else
+      osm_switch_clear_hops(p_sw);
   }
 
   osm_log( p_log, OSM_LOG_VERBOSE,
@@ -607,6 +628,10 @@ __osm_updn_call(
     osm_ucast_mgr_build_lid_matrices( &p_updn->p_osm->sm.ucast_mgr );
     __osm_updn_find_root_nodes_by_min_hop( p_updn );
   }
+  else if (p_updn->p_osm->subn.opt.connect_roots &&
+           p_updn->updn_ucast_reg_inputs.num_guids > 1)
+    osm_ucast_mgr_build_lid_matrices( &p_updn->p_osm->sm.ucast_mgr );
+
   /* printf ("-V- after osm_updn_find_root_nodes_by_min_hop\n"); */
   /* Only if there are assigned root nodes do the algorithm, otherwise perform do nothing */
   if ( p_updn->updn_ucast_reg_inputs.num_guids > 0)
-- 
1.5.2.2.277.g07b8
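
For readers who do not have the opensm data structures in their head,
the idea behind updn_clear_root_hops() can be shown on a toy table.
This is only a sketch under simplified assumptions: a fixed-size hops
array indexed by LID and a flag array saying which LIDs belong to root
switches. All names here are hypothetical and none of this is opensm
code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NUM_LIDS  4
#define DEMO_NUM_PORTS 3
#define DEMO_NO_PATH   0xff	/* marks "no route known" in this toy table */

/*
 * Reset the hop counts for every LID that is not a root switch and
 * leave the rows for root-switch LIDs untouched, so that min hop
 * routes between roots survive the reset.
 */
static void demo_clear_non_root_hops(uint8_t hops[DEMO_NUM_LIDS][DEMO_NUM_PORTS],
				     const int is_root_lid[DEMO_NUM_LIDS])
{
	unsigned lid;

	for (lid = 0; lid < DEMO_NUM_LIDS; lid++)
		if (!is_root_lid[lid])
			memset(hops[lid], DEMO_NO_PATH, DEMO_NUM_PORTS);
}
```

Without connect_roots every row would be reset, which matches the plain
osm_switch_clear_hops() path in the patch.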


From mshefty at ichips.intel.com  Thu Jun 21 14:30:45 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 21 Jun 2007 14:30:45 -0700
Subject: [ofa-general] [PATCH] for-2.6.23 ib/umad: add partition support
In-Reply-To: <adabqf93vro.fsf@cisco.com>
References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com>	<adair9i8ihq.fsf@cisco.com>
	<467996C4.1060201@ichips.intel.com> <adabqf93vro.fsf@cisco.com>
Message-ID: <467AEE05.9050809@ichips.intel.com>

> I'm beginning to think that just updating the ABI might be the right
> answer.  But let's try to make this be the last ABI break.  Are we
> pretty sure there's *nothing* else we might ever want to add to the
> structure?  I can't think of anything right now...

I can't think of anything, but Hal is in a better position to answer 
this.  He's the one who pointed out the problem to me.

- Sean


From kliteyn at dev.mellanox.co.il  Thu Jun 21 14:42:00 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Fri, 22 Jun 2007 00:42:00 +0300
Subject: [ofa-general] Re: [PATCH] osm: cosmetics in ftree - added get_guid
 functions for switch and hca
In-Reply-To: <1182447627.15653.419564.camel@hal.voltaire.com>
References: <4678DA83.2050700@dev.mellanox.co.il>
	<1182447627.15653.419564.camel@hal.voltaire.com>
Message-ID: <467AF0A8.4070706@dev.mellanox.co.il>

Hal Rosenstock wrote:
> Hi again Yevgeny,
> 
> On Wed, 2007-06-20 at 03:42, Yevgeny Kliteynik wrote:
>> Hi Hal,
>>
>> Cosmetic code changes in fat-tree:
>> added get_guid_ho and get_guid_no functions for switches and hca's
>>
>> -- Yevgeny
>>
>> Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
> 
> This patch won't apply either. I'm not sure I want to hand edit these
> changes in. Can you try it and see if it works for you ?

Thanks, I'll check what the problem is.

-- Yevgeny

> Thanks.
> 
> -- Hal
> 
>> ---
>>   opensm/opensm/osm_ucast_ftree.c |   77 +++++++++++++++++++++++++++++----------
>>   1 files changed, 58 insertions(+), 19 deletions(-)
>>
>> diff --git a/opensm/opensm/osm_ucast_ftree.c b/opensm/opensm/osm_ucast_ftree.c
>> index 1ead199..1ae8b29 100644
>> --- a/opensm/opensm/osm_ucast_ftree.c
>> +++ b/opensm/opensm/osm_ucast_ftree.c
>> @@ -640,6 +640,26 @@ __osm_ftree_sw_destroy(
>>
>>   /***************************************************/
>>
>> +static uint64_t
>> +__osm_ftree_sw_get_guid_no(
>> +   IN  ftree_sw_t * p_sw)
>> +{
>> +   if (!p_sw)
>> +      return 0;
>> +   return osm_node_get_node_guid(p_sw->p_osm_sw->p_node);
>> +}
>> +
>> +/***************************************************/
>> +
>> +static uint64_t
>> +__osm_ftree_sw_get_guid_ho(
>> +   IN  ftree_sw_t * p_sw)
>> +{
>> +   return cl_ntoh64(__osm_ftree_sw_get_guid_no(p_sw));
>> +}
>> +
>> +/***************************************************/
>> +
>>   static void
>>   __osm_ftree_sw_dump(
>>      IN  ftree_fabric_t * p_ftree,
>> @@ -657,7 +677,7 @@ __osm_ftree_sw_dump(
>>              "__osm_ftree_sw_dump: "
>>              "Switch index: %s, GUID: 0x%016" PRIx64 ", Ports: %u DOWN, %u UP\n",
>>             __osm_ftree_tuple_to_str(p_sw->tuple),
>> -          cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
>> +          __osm_ftree_sw_get_guid_ho(p_sw),
>>             p_sw->down_port_groups_num,
>>             p_sw->up_port_groups_num);
>>
>> @@ -835,6 +855,26 @@ __osm_ftree_hca_destroy(
>>
>>   /***************************************************/
>>
>> +static uint64_t
>> +__osm_ftree_hca_get_guid_no(
>> +   IN  ftree_hca_t * p_hca)
>> +{
>> +   if (!p_hca)
>> +      return 0;
>> +   return osm_node_get_node_guid(p_hca->p_osm_node);
>> +}
>> +
>> +/***************************************************/
>> +
>> +static uint64_t
>> +__osm_ftree_hca_get_guid_ho(
>> +   IN  ftree_hca_t * p_hca)
>> +{
>> +   return cl_ntoh64(__osm_ftree_hca_get_guid_no(p_hca));
>> +}
>> +
>> +/***************************************************/
>> +
>>   static void
>>   __osm_ftree_hca_dump(
>>      IN  ftree_fabric_t * p_ftree,
>> @@ -851,7 +891,7 @@ __osm_ftree_hca_dump(
>>      osm_log(&p_ftree->p_osm->log, OSM_LOG_DEBUG,
>>              "__osm_ftree_hca_dump: "
>>              "CA GUID: 0x%016" PRIx64 ", Ports: %u UP\n",
>> -          cl_ntoh64(osm_node_get_node_guid(p_hca->p_osm_node)),
>> +          __osm_ftree_hca_get_guid_ho(p_hca),
>>             p_hca->up_port_groups_num);
>>
>>      for( i = 0; i < p_hca->up_port_groups_num; i++ )
>> @@ -1214,7 +1254,7 @@ __osm_ftree_fabric_dump_general_info(
>>                  osm_log(&p_ftree->p_osm->log, OSM_LOG_VERBOSE,
>>                          "__osm_ftree_fabric_dump_general_info: "
>>                          "      GUID: 0x%016" PRIx64 ", LID: 0x%x, Index %s\n",
>> -                       cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
>> +                       __osm_ftree_sw_get_guid_ho(p_sw),
>>                          cl_ntoh16(p_sw->base_lid),
>>                          __osm_ftree_tuple_to_str(p_sw->tuple));
>>         }
>> @@ -1227,8 +1267,7 @@ __osm_ftree_fabric_dump_general_info(
>>               osm_log(&p_ftree->p_osm->log, OSM_LOG_VERBOSE,
>>                       "__osm_ftree_fabric_dump_general_info: "
>>                       "      GUID: 0x%016" PRIx64 ", LID: 0x%x, Index %s\n",
>> -                    cl_ntoh64(osm_node_get_node_guid(
>> -                              p_ftree->leaf_switches[i]->p_osm_sw->p_node)),
>> +                    __osm_ftree_sw_get_guid_ho(p_ftree->leaf_switches[i]),
>>                       cl_ntoh16(p_ftree->leaf_switches[i]->base_lid),
>>                       __osm_ftree_tuple_to_str(p_ftree->leaf_switches[i]->tuple));
>>         }
>> @@ -1442,7 +1481,7 @@ __osm_ftree_fabric_make_indexing(
>>              p_sw->rank,
>>              __osm_ftree_tuple_to_str(p_sw->tuple),
>>              cl_ntoh16(p_sw->base_lid),
>> -           cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)));
>> +           __osm_ftree_sw_get_guid_ho(p_sw));
>>
>>      /*
>>       * Now run BFS and assign indexes to all switches
>> @@ -1617,11 +1656,11 @@ __osm_ftree_fabric_validate_topology(
>>                       "ERR AB09: Different number of upward port groups on switches:\n"
>>                       "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u groups\n"
>>                       "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u groups\n",
>> -                    cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)),
>> +                    __osm_ftree_sw_get_guid_ho(reference_sw_arr[p_sw->rank]),
>>                       cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid),
>>                       __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple),
>>                       reference_sw_arr[p_sw->rank]->up_port_groups_num,
>> -                    cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
>> +                    __osm_ftree_sw_get_guid_ho(p_sw),
>>                       cl_ntoh16(p_sw->base_lid),
>>                       __osm_ftree_tuple_to_str(p_sw->tuple),
>>                       p_sw->up_port_groups_num);
>> @@ -1638,11 +1677,11 @@ __osm_ftree_fabric_validate_topology(
>>                       "ERR AB0A: Different number of downward port groups on switches:\n"
>>                       "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u port groups\n"
>>                       "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u port groups\n",
>> -                    cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)),
>> +                    __osm_ftree_sw_get_guid_ho(reference_sw_arr[p_sw->rank]),
>>                       cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid),
>>                       __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple),
>>                       reference_sw_arr[p_sw->rank]->down_port_groups_num,
>> -                    cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
>> +                    __osm_ftree_sw_get_guid_ho(p_sw),
>>                       cl_ntoh16(p_sw->base_lid),
>>                       __osm_ftree_tuple_to_str(p_sw->tuple),
>>                       p_sw->down_port_groups_num);
>> @@ -1663,11 +1702,11 @@ __osm_ftree_fabric_validate_topology(
>>                              "ERR AB0B: Different number of ports in an upward port group on switches:\n"
>>                              "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n"
>>                              "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n",
>> -                           cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)),
>> +                           __osm_ftree_sw_get_guid_ho(reference_sw_arr[p_sw->rank]),
>>                              cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid),
>>                              __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple),
>>                              cl_ptr_vector_get_size(&p_ref_group->ports),
>> -                           cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
>> +                           __osm_ftree_sw_get_guid_ho(p_sw),
>>                              cl_ntoh16(p_sw->base_lid),
>>                              __osm_ftree_tuple_to_str(p_sw->tuple),
>>                              cl_ptr_vector_get_size(&p_group->ports));
>> @@ -1691,11 +1730,11 @@ __osm_ftree_fabric_validate_topology(
>>                              "ERR AB0C: Different number of ports in an downward port group on switches:\n"
>>                              "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n"
>>                              "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n",
>> -                           cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)),
>> +                           __osm_ftree_sw_get_guid_ho(reference_sw_arr[p_sw->rank]),
>>                              cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid),
>>                              __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple),
>>                              cl_ptr_vector_get_size(&p_ref_group->ports),
>> -                           cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
>> +                           __osm_ftree_sw_get_guid_ho(p_sw),
>>                              cl_ntoh16(p_sw->base_lid),
>>                              __osm_ftree_tuple_to_str(p_sw->tuple),
>>                              cl_ptr_vector_get_size(&p_group->ports));
>> @@ -2508,7 +2547,7 @@ __osm_ftree_rank_leaf_switches(
>>                       "__osm_ftree_rank_leaf_switches: ERR AB0F: "
>>                       "CA conected directly to another CA: "
>>                       "0x%016" PRIx64 " <---> 0x%016" PRIx64 "\n",
>> -                    cl_ntoh64(osm_node_get_node_guid(p_hca->p_osm_node)),
>> +                    __osm_ftree_hca_get_guid_ho(p_hca),
>>                       cl_ntoh64(osm_node_get_node_guid(p_remote_osm_node)));
>>               res = -1;
>>               goto Exit;
>> @@ -2548,8 +2587,8 @@ __osm_ftree_rank_leaf_switches(
>>                 "                                            - CA guid    : 0x%016" PRIx64 "\n"
>>                 "                                            - Switch guid: 0x%016" PRIx64 "\n"
>>                 "                                            - Switch LID : 0x%x\n",
>> -              cl_ntoh64(osm_node_get_node_guid(p_hca->p_osm_node)),
>> -              cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
>> +              __osm_ftree_hca_get_guid_ho(p_hca),
>> +              __osm_ftree_sw_get_guid_ho(p_sw),
>>                 cl_ntoh16(p_sw->base_lid));
>>         cl_list_insert_tail(p_ranking_bfs_list,
>>                             &__osm_ftree_sw_tbl_element_create(p_sw)->map_item);
>> @@ -2740,10 +2779,10 @@ __osm_ftree_fabric_construct_sw_ports(
>>                          "       GUID 0x%016" PRIx64 ", LID 0x%x, rank %u\n",
>>                          p_sw->rank,
>>                          p_remote_sw->rank,
>> -                       cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
>> +                       __osm_ftree_sw_get_guid_ho(p_sw),
>>                          cl_ntoh16(p_sw->base_lid),
>>                          p_sw->rank,
>> -                       cl_ntoh64(osm_node_get_node_guid(p_remote_sw->p_osm_sw->p_node)),
>> +                       __osm_ftree_sw_get_guid_ho(p_remote_sw),
>>                          cl_ntoh16(p_remote_sw->base_lid),
>>                          p_remote_sw->rank);
>>                  res = -1;
> 
> 


From jsquyres at cisco.com  Thu Jun 21 14:58:35 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Thu, 21 Jun 2007 17:58:35 -0400
Subject: [ofa-general] Re: Stringify ibv_event_type
In-Reply-To: <20070621185533.GJ4857@mellanox.co.il>
References: <2C245DF3-77A7-4A3C-BF3A-13FEC2F7E0DA@cisco.com>
	<20070621185533.GJ4857@mellanox.co.il>
Message-ID: <B0064B8E-A766-45D6-A296-22FB52FB1D9B@cisco.com>

On Jun 21, 2007, at 2:55 PM, Michael S. Tsirkin wrote:

> I have no strong opinion either way, but I do wonder why do you  
> find this useful?

The more verbose an error message, the more chance a user has to  
understand it.

> Asyncwatch is just an example: it does not actually *do anything*  
> on an event,
> so it calls printf. But, is it likely that enduser really needs to see
> IBV_EVENT_CLIENT_REREGISTER? Printing out the numeric value seems
> sufficient for debug.

Why force a secondary lookup (that may involve multiple  
steps)?  Printing a string is easy.

Plus, what if the enum values change over time?  Then we'll have to  
have the user send us the error message and their verbs.h to find out  
what the problem really is.  If you print the enum value as a string,  
it's pretty clear (to a developer at least) what the problem is/could  
be regardless of what the actual numerical value is (indeed, who  
cares what the numerical value is?).  Heck, some of the enum names  
are fairly obvious such that even a reasonably-skilled user could  
figure out at least the context of the error.

Just my $0.02.
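
To illustrate the point about printing names rather than numbers, here is
a minimal sketch in C.  The enum, its values, and the demo_* function
names are hypothetical and exist only for this example; they are not the
real ibv_event_type values or any libibverbs API.

```c
#include <stdio.h>

/* Hypothetical event codes, for illustration only (not the verbs enum). */
enum demo_event_type {
	DEMO_EVENT_PORT_ACTIVE,
	DEMO_EVENT_PORT_ERR,
	DEMO_EVENT_CLIENT_REREGISTER
};

/* Map an event code to its symbolic name; unknown codes get a fallback. */
static const char *demo_event_str(enum demo_event_type event)
{
	switch (event) {
	case DEMO_EVENT_PORT_ACTIVE:
		return "DEMO_EVENT_PORT_ACTIVE";
	case DEMO_EVENT_PORT_ERR:
		return "DEMO_EVENT_PORT_ERR";
	case DEMO_EVENT_CLIENT_REREGISTER:
		return "DEMO_EVENT_CLIENT_REREGISTER";
	default:
		return "UNKNOWN EVENT";
	}
}

/* Format a log line carrying both the name and the numeric value. */
static int demo_format_event(char *buf, size_t len,
			     enum demo_event_type event)
{
	return snprintf(buf, len, "async event: %s (%d)",
			demo_event_str(event), (int)event);
}
```

A message built this way stays readable even if the numeric values are
renumbered in a later header, since the name travels with the message.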

-- 
Jeff Squyres
Cisco Systems


From sean.hefty at intel.com  Thu Jun 21 15:21:40 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 21 Jun 2007 15:21:40 -0700
Subject: [ofa-general] Stringify ibv_event_type
In-Reply-To: <467ADF71.4090002@opengridcomputing.com>
Message-ID: <000201c7b452$8a63c220$ff0da8c0@amr.corp.intel.com>

>/me nudges sean...

How's this?  Does anything else need to be done with the build (beyond a new
release at some point)?

Signed-off-by: Sean Hefty <sean.hefty at intel.com>

diff --git a/include/rdma/rdma_cma.h b/include/rdma/rdma_cma.h
index f920ae0..43c71d5 100644
--- a/include/rdma/rdma_cma.h
+++ b/include/rdma/rdma_cma.h
@@ -463,7 +463,7 @@ int rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr);
  *   Destruction of an rdma_cm_id will block until related events have been
  *   acknowledged.
  * See also:
- *   rdma_ack_cm_event, rdma_create_event_channel
+ *   rdma_ack_cm_event, rdma_create_event_channel, rdma_event_str
  */
 int rdma_get_cm_event(struct rdma_event_channel *channel,
 		      struct rdma_cm_event **event);
@@ -519,6 +519,16 @@ struct ibv_context **rdma_get_devices(int *num_devices);
  */
 void rdma_free_devices(struct ibv_context **list);
 
+/**
+ * rdma_event_str - Returns a string representation of an rdma cm event.
+ * @event: Asynchronous event.
+ * Description:
+ *   Returns a string representation of an asynchronous event.
+ * See also:
+ *   rdma_get_cm_event
+ */
+const char *rdma_event_str(enum rdma_cm_event_type event);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/man/rdma_event_str.3 b/man/rdma_event_str.3
new file mode 100644
index 0000000..a6ee3e5
--- /dev/null
+++ b/man/rdma_event_str.3
@@ -0,0 +1,15 @@
+.TH "RDMA_EVENT_STR" 3 "2007-05-15" "librdmacm" "Librdmacm Programmer's Manual" librdmacm
+.SH NAME
+rdma_event_str \- Returns a string representation of an rdma cm event.
+.SH SYNOPSIS
+.B "#include <rdma/rdma_cma.h>"
+.P
+.B "const char *" rdma_event_str
+.BI "(enum rdma_cm_event_type " event ");"
+.SH ARGUMENTS
+.IP "event" 12
+Asynchronous event.
+.SH "DESCRIPTION"
+Returns a string representation of an asynchronous event.
+.SH "SEE ALSO"
+rdma_get_cm_event(3)
diff --git a/man/rdma_get_cm_event.3 b/man/rdma_get_cm_event.3
index a260092..252a7ab 100644
--- a/man/rdma_get_cm_event.3
+++ b/man/rdma_get_cm_event.3
@@ -62,4 +62,4 @@ no longer accessible and should be rejoined, if desired.
 .SH "SEE ALSO"
 rdma_ack_cm_event(3), rdma_create_event_channel(3), rdma_resolve_addr(3),
 rdma_resolve_route(3), rdma_connect(3), rdma_listen(3), rdma_join_multicast(3),
-rdma_destroy_id(3)
+rdma_destroy_id(3), rdma_event_str(3)
diff --git a/src/cma.c b/src/cma.c
index fdadb69..3579530 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -1359,3 +1359,39 @@ retry:
 	*event = &evt->event;
 	return 0;
 }
+
+const char *rdma_event_str(enum rdma_cm_event_type event)
+{
+	switch (event) {
+	case RDMA_CM_EVENT_ADDR_RESOLVED:
+		return "RDMA_CM_EVENT_ADDR_RESOLVED";
+	case RDMA_CM_EVENT_ADDR_ERROR:
+		return "RDMA_CM_EVENT_ADDR_ERROR";
+	case RDMA_CM_EVENT_ROUTE_RESOLVED:
+		return "RDMA_CM_EVENT_ROUTE_RESOLVED";
+	case RDMA_CM_EVENT_ROUTE_ERROR:
+		return "RDMA_CM_EVENT_ROUTE_ERROR";
+	case RDMA_CM_EVENT_CONNECT_REQUEST:
+		return "RDMA_CM_EVENT_CONNECT_REQUEST";
+	case RDMA_CM_EVENT_CONNECT_RESPONSE:
+		return "RDMA_CM_EVENT_CONNECT_RESPONSE";
+	case RDMA_CM_EVENT_CONNECT_ERROR:
+		return "RDMA_CM_EVENT_CONNECT_ERROR";
+	case RDMA_CM_EVENT_UNREACHABLE:
+		return "RDMA_CM_EVENT_UNREACHABLE";
+	case RDMA_CM_EVENT_REJECTED:
+		return "RDMA_CM_EVENT_REJECTED";
+	case RDMA_CM_EVENT_ESTABLISHED:
+		return "RDMA_CM_EVENT_ESTABLISHED";
+	case RDMA_CM_EVENT_DISCONNECTED:
+		return "RDMA_CM_EVENT_DISCONNECTED";
+	case RDMA_CM_EVENT_DEVICE_REMOVAL:
+		return "RDMA_CM_EVENT_DEVICE_REMOVAL";
+	case RDMA_CM_EVENT_MULTICAST_JOIN:
+		return "RDMA_CM_EVENT_MULTICAST_JOIN";
+	case RDMA_CM_EVENT_MULTICAST_ERROR:
+		return "RDMA_CM_EVENT_MULTICAST_ERROR";
+	default:
+		return "UNKNOWN EVENT";
+	}
+}
diff --git a/src/librdmacm.map b/src/librdmacm.map
index 06e9765..eafeae4 100644
--- a/src/librdmacm.map
+++ b/src/librdmacm.map
@@ -23,5 +23,6 @@ RDMACM_1.0 {
 		rdma_leave_multicast;
 		rdma_get_devices;
 		rdma_free_devices;
+		rdma_event_str;
 	local: *;
 };

===

diff --git a/examples/cmatose.c b/examples/cmatose.c
index 4479fd4..0daaab0 100644
--- a/examples/cmatose.c
+++ b/examples/cmatose.c
@@ -320,8 +320,8 @@ static int cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *event)
 	case RDMA_CM_EVENT_CONNECT_ERROR:
 	case RDMA_CM_EVENT_UNREACHABLE:
 	case RDMA_CM_EVENT_REJECTED:
-		printf("cmatose: event: %d, error: %d\n", event->event,
-			event->status);
+		printf("cmatose: event: %s, error: %d\n",
+		       rdma_event_str(event->event), event->status);
 		connect_error();
 		break;
 	case RDMA_CM_EVENT_DISCONNECTED:
diff --git a/examples/mckey.c b/examples/mckey.c
index 24514a4..15371b6 100644
--- a/examples/mckey.c
+++ b/examples/mckey.c
@@ -305,8 +305,8 @@ static int cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *event)
 	case RDMA_CM_EVENT_ADDR_ERROR:
 	case RDMA_CM_EVENT_ROUTE_ERROR:
 	case RDMA_CM_EVENT_MULTICAST_ERROR:
-		printf("mckey: event: %d, error: %d\n", event->event,
-			event->status);
+		printf("mckey: event: %s, error: %d\n",
+		       rdma_event_str(event->event), event->status);
 		connect_error();
 		ret = event->status;
 		break;
diff --git a/examples/rping.c b/examples/rping.c
index 2dd1cef..c03d3b5 100644
--- a/examples/rping.c
+++ b/examples/rping.c
@@ -164,7 +164,8 @@ static int rping_cma_event_handler(struct rdma_cm_id *cma_id,
 	int ret = 0;
 	struct rping_cb *cb = cma_id->context;
 
-	DEBUG_LOG("cma_event type %d cma_id %p (%s)\n", event->event, cma_id,
+	DEBUG_LOG("cma_event type %s cma_id %p (%s)\n",
+		  rdma_event_str(event->event), cma_id,
 		  (cma_id == cb->cm_id) ? "parent" : "child");
 
 	switch (event->event) {
@@ -207,14 +208,15 @@ static int rping_cma_event_handler(struct rdma_cm_id *cma_id,
 	case RDMA_CM_EVENT_CONNECT_ERROR:
 	case RDMA_CM_EVENT_UNREACHABLE:
 	case RDMA_CM_EVENT_REJECTED:
-		fprintf(stderr, "cma event %d, error %d\n", event->event,
-		       event->status);
+		fprintf(stderr, "cma event %s, error %d\n",
+			rdma_event_str(event->event), event->status);
 		sem_post(&cb->sem);
 		ret = -1;
 		break;
 
 	case RDMA_CM_EVENT_DISCONNECTED:
-		fprintf(stderr, "%s DISCONNECT EVENT...\n", cb->server ? "server" : "client");
+		fprintf(stderr, "%s DISCONNECT EVENT...\n",
+			cb->server ? "server" : "client");
 		sem_post(&cb->sem);
 		break;
 
diff --git a/examples/udaddy.c b/examples/udaddy.c
index 12e6297..1b6a732 100644
--- a/examples/udaddy.c
+++ b/examples/udaddy.c
@@ -363,8 +363,8 @@ static int cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *event)
 	case RDMA_CM_EVENT_CONNECT_ERROR:
 	case RDMA_CM_EVENT_UNREACHABLE:
 	case RDMA_CM_EVENT_REJECTED:
-		printf("udaddy: event: %d, error: %d\n", event->event,
-			event->status);
+		printf("udaddy: event: %s, error: %d\n",
+		       rdma_event_str(event->event), event->status);
 		connect_error();
 		ret = event->status;
 		break;


From sashak at voltaire.com  Thu Jun 21 16:22:33 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Fri, 22 Jun 2007 02:22:33 +0300
Subject: [ofa-general] [PATCH] management: drop *.spec files rebuild
Message-ID: <20070621232233.GN25653@sashak.voltaire.com>


The *.spec files are now generated from *.spec.in templates by running
the make.dist script, and the results are committed to git. This patch
drops *.spec regeneration by ./configure (which gets invalid values for
@RELEASE@ and other macros anyway).

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---
 infiniband-diags/configure.in |    1 -
 libibcommon/configure.in      |    2 +-
 libibmad/configure.in         |    2 +-
 libibumad/configure.in        |    2 +-
 opensm/configure.in           |    1 -
 5 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/infiniband-diags/configure.in b/infiniband-diags/configure.in
index 0d7f82c..b06cb37 100644
--- a/infiniband-diags/configure.in
+++ b/infiniband-diags/configure.in
@@ -158,7 +158,6 @@ AC_SUBST(IBSCRIPTPATH)
 
 AC_CONFIG_FILES([\
         Makefile \
-        infiniband-diags.spec \
         scripts/ibcheckerrors \
         scripts/ibcheckerrs \
         scripts/ibchecknet \
diff --git a/libibcommon/configure.in b/libibcommon/configure.in
index cbf9f07..8a9e5be 100644
--- a/libibcommon/configure.in
+++ b/libibcommon/configure.in
@@ -46,5 +46,5 @@ AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script,
 
 AM_CONDITIONAL(HAVE_LD_VERSION_SCRIPT, test "$ac_cv_version_script" = "yes")
 
-AC_CONFIG_FILES([Makefile libibcommon.spec])
+AC_CONFIG_FILES([Makefile])
 AC_OUTPUT
diff --git a/libibmad/configure.in b/libibmad/configure.in
index fbb7758..d534916 100644
--- a/libibmad/configure.in
+++ b/libibmad/configure.in
@@ -63,5 +63,5 @@ AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script,
 
 AM_CONDITIONAL(HAVE_LD_VERSION_SCRIPT, test "$ac_cv_version_script" = "yes")
 
-AC_CONFIG_FILES([Makefile libibmad.spec])
+AC_CONFIG_FILES([Makefile])
 AC_OUTPUT
diff --git a/libibumad/configure.in b/libibumad/configure.in
index 74f3255..538c118 100644
--- a/libibumad/configure.in
+++ b/libibumad/configure.in
@@ -64,5 +64,5 @@ AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script,
 
 AM_CONDITIONAL(HAVE_LD_VERSION_SCRIPT, test "$ac_cv_version_script" = "yes")
 
-AC_CONFIG_FILES([Makefile libibumad.spec])
+AC_CONFIG_FILES([Makefile])
 AC_OUTPUT
diff --git a/opensm/configure.in b/opensm/configure.in
index 2d88464..2ab6a44 100644
--- a/opensm/configure.in
+++ b/opensm/configure.in
@@ -44,4 +44,3 @@ AC_CONFIG_SUBDIRS(complib libvendor opensm osmtest include osmeventplugin)
 
 dnl Create the following Makefiles
 AC_OUTPUT(Makefile)
-AC_OUTPUT(opensm.spec)
-- 
1.5.2.2.277.g07b8


From halr at voltaire.com  Thu Jun 21 17:07:09 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 21 Jun 2007 20:07:09 -0400
Subject: [ofa-general] [PATCH] for-2.6.23 ib/umad: add partition support
In-Reply-To: <467AEE05.9050809@ichips.intel.com>
References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com>
	<adair9i8ihq.fsf@cisco.com> <467996C4.1060201@ichips.intel.com>
	<adabqf93vro.fsf@cisco.com>  <467AEE05.9050809@ichips.intel.com>
Message-ID: <1182470820.15653.445994.camel@hal.voltaire.com>

On Thu, 2007-06-21 at 17:30, Sean Hefty wrote:
> > I'm beginning to think that just updating the ABI might be the right
> > answer.  But let's try to make this be the last ABI break.  Are we
> > pretty sure there's *nothing* else we might ever want to add to the
> > structure?  I can't think of anything right now...
> 
> I can't think of anything, but Hal is in a better position to answer 
> this.  He's the one who pointed out the problem to me.

AFAIK this was the only thing missing but there are no guarantees. We
somehow missed this before.

-- Hal

> - Sean


From swise at opengridcomputing.com  Thu Jun 21 17:10:01 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 21 Jun 2007 19:10:01 -0500
Subject: [ofa-general] Stringify ibv_event_type
In-Reply-To: <000201c7b452$8a63c220$ff0da8c0@amr.corp.intel.com>
References: <000201c7b452$8a63c220$ff0da8c0@amr.corp.intel.com>
Message-ID: <467B1359.9060308@opengridcomputing.com>

Looks good to me!


Sean Hefty wrote:
>> /me nudges sean...
>>     
>
> How's this?  Does anything else need to be done with the build (beyond a new
> release at some point)?
>
> Signed-off-by: Sean Hefty <sean.hefty at intel.com>
>
> diff --git a/include/rdma/rdma_cma.h b/include/rdma/rdma_cma.h
> index f920ae0..43c71d5 100644
> --- a/include/rdma/rdma_cma.h
> +++ b/include/rdma/rdma_cma.h
> @@ -463,7 +463,7 @@ int rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr);
>   *   Destruction of an rdma_cm_id will block until related events have been
>   *   acknowledged.
>   * See also:
> - *   rdma_ack_cm_event, rdma_create_event_channel
> + *   rdma_ack_cm_event, rdma_create_event_channel, rdma_event_str
>   */
>  int rdma_get_cm_event(struct rdma_event_channel *channel,
>  		      struct rdma_cm_event **event);
> @@ -519,6 +519,16 @@ struct ibv_context **rdma_get_devices(int *num_devices);
>   */
>  void rdma_free_devices(struct ibv_context **list);
>  
> +/**
> + * rdma_event_str - Returns a string representation of an rdma cm event.
> + * @event: Asynchronous event.
> + * Description:
> + *   Returns a string representation of an asynchronous event.
> + * See also:
> + *   rdma_get_cm_event
> + */
> +const char *rdma_event_str(enum rdma_cm_event_type event);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/man/rdma_event_str.3 b/man/rdma_event_str.3
> new file mode 100644
> index 0000000..a6ee3e5
> --- /dev/null
> +++ b/man/rdma_event_str.3
> @@ -0,0 +1,15 @@
> +.TH "RDMA_EVENT_STR" 3 "2007-05-15" "librdmacm" "Librdmacm Programmer's Manual" librdmacm
> +.SH NAME
> +rdma_event_str \- Returns a string representation of an rdma cm event.
> +.SH SYNOPSIS
> +.B "#include <rdma/rdma_cma.h>"
> +.P
> +.B "const char *" rdma_event_str
> +.BI "(enum rdma_cm_event_type " event ");"
> +.SH ARGUMENTS
> +.IP "event" 12
> +Asynchronous event.
> +.SH "DESCRIPTION"
> +Returns a string representation of an asynchronous event.
> +.SH "SEE ALSO"
> +rdma_get_cm_event(3)
> diff --git a/man/rdma_get_cm_event.3 b/man/rdma_get_cm_event.3
> index a260092..252a7ab 100644
> --- a/man/rdma_get_cm_event.3
> +++ b/man/rdma_get_cm_event.3
> @@ -62,4 +62,4 @@ no longer accessible and should be rejoined, if desired.
>  .SH "SEE ALSO"
>  rdma_ack_cm_event(3), rdma_create_event_channel(3), rdma_resolve_addr(3),
>  rdma_resolve_route(3), rdma_connect(3), rdma_listen(3), rdma_join_multicast(3),
> -rdma_destroy_id(3)
> +rdma_destroy_id(3), rdma_event_str(3)
> diff --git a/src/cma.c b/src/cma.c
> index fdadb69..3579530 100644
> --- a/src/cma.c
> +++ b/src/cma.c
> @@ -1359,3 +1359,39 @@ retry:
>  	*event = &evt->event;
>  	return 0;
>  }
> +
> +const char *rdma_event_str(enum rdma_cm_event_type event)
> +{
> +	switch (event) {
> +	case RDMA_CM_EVENT_ADDR_RESOLVED:
> +		return "RDMA_CM_EVENT_ADDR_RESOLVED";
> +	case RDMA_CM_EVENT_ADDR_ERROR:
> +		return "RDMA_CM_EVENT_ADDR_ERROR";
> +	case RDMA_CM_EVENT_ROUTE_RESOLVED:
> +		return "RDMA_CM_EVENT_ROUTE_RESOLVED";
> +	case RDMA_CM_EVENT_ROUTE_ERROR:
> +		return "RDMA_CM_EVENT_ROUTE_ERROR";
> +	case RDMA_CM_EVENT_CONNECT_REQUEST:
> +		return "RDMA_CM_EVENT_CONNECT_REQUEST";
> +	case RDMA_CM_EVENT_CONNECT_RESPONSE:
> +		return "RDMA_CM_EVENT_CONNECT_RESPONSE";
> +	case RDMA_CM_EVENT_CONNECT_ERROR:
> +		return "RDMA_CM_EVENT_CONNECT_ERROR";
> +	case RDMA_CM_EVENT_UNREACHABLE:
> +		return "RDMA_CM_EVENT_UNREACHABLE";
> +	case RDMA_CM_EVENT_REJECTED:
> +		return "RDMA_CM_EVENT_REJECTED";
> +	case RDMA_CM_EVENT_ESTABLISHED:
> +		return "RDMA_CM_EVENT_ESTABLISHED";
> +	case RDMA_CM_EVENT_DISCONNECTED:
> +		return "RDMA_CM_EVENT_DISCONNECTED";
> +	case RDMA_CM_EVENT_DEVICE_REMOVAL:
> +		return "RDMA_CM_EVENT_DEVICE_REMOVAL";
> +	case RDMA_CM_EVENT_MULTICAST_JOIN:
> +		return "RDMA_CM_EVENT_MULTICAST_JOIN";
> +	case RDMA_CM_EVENT_MULTICAST_ERROR:
> +		return "RDMA_CM_EVENT_MULTICAST_ERROR";
> +	default:
> +		return "UNKNOWN EVENT";
> +	}
> +}
> diff --git a/src/librdmacm.map b/src/librdmacm.map
> index 06e9765..eafeae4 100644
> --- a/src/librdmacm.map
> +++ b/src/librdmacm.map
> @@ -23,5 +23,6 @@ RDMACM_1.0 {
>  		rdma_leave_multicast;
>  		rdma_get_devices;
>  		rdma_free_devices;
> +		rdma_event_str;
>  	local: *;
>  };
>
> ===
>
> diff --git a/examples/cmatose.c b/examples/cmatose.c
> index 4479fd4..0daaab0 100644
> --- a/examples/cmatose.c
> +++ b/examples/cmatose.c
> @@ -320,8 +320,8 @@ static int cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *event)
>  	case RDMA_CM_EVENT_CONNECT_ERROR:
>  	case RDMA_CM_EVENT_UNREACHABLE:
>  	case RDMA_CM_EVENT_REJECTED:
> -		printf("cmatose: event: %d, error: %d\n", event->event,
> -			event->status);
> +		printf("cmatose: event: %s, error: %d\n",
> +		       rdma_event_str(event->event), event->status);
>  		connect_error();
>  		break;
>  	case RDMA_CM_EVENT_DISCONNECTED:
> diff --git a/examples/mckey.c b/examples/mckey.c
> index 24514a4..15371b6 100644
> --- a/examples/mckey.c
> +++ b/examples/mckey.c
> @@ -305,8 +305,8 @@ static int cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *event)
>  	case RDMA_CM_EVENT_ADDR_ERROR:
>  	case RDMA_CM_EVENT_ROUTE_ERROR:
>  	case RDMA_CM_EVENT_MULTICAST_ERROR:
> -		printf("mckey: event: %d, error: %d\n", event->event,
> -			event->status);
> +		printf("mckey: event: %s, error: %d\n",
> +		       rdma_event_str(event->event), event->status);
>  		connect_error();
>  		ret = event->status;
>  		break;
> diff --git a/examples/rping.c b/examples/rping.c
> index 2dd1cef..c03d3b5 100644
> --- a/examples/rping.c
> +++ b/examples/rping.c
> @@ -164,7 +164,8 @@ static int rping_cma_event_handler(struct rdma_cm_id *cma_id,
>  	int ret = 0;
>  	struct rping_cb *cb = cma_id->context;
>  
> -	DEBUG_LOG("cma_event type %d cma_id %p (%s)\n", event->event, cma_id,
> +	DEBUG_LOG("cma_event type %s cma_id %p (%s)\n",
> +		  rdma_event_str(event->event), cma_id,
>  		  (cma_id == cb->cm_id) ? "parent" : "child");
>  
>  	switch (event->event) {
> @@ -207,14 +208,15 @@ static int rping_cma_event_handler(struct rdma_cm_id *cma_id,
>  	case RDMA_CM_EVENT_CONNECT_ERROR:
>  	case RDMA_CM_EVENT_UNREACHABLE:
>  	case RDMA_CM_EVENT_REJECTED:
> -		fprintf(stderr, "cma event %d, error %d\n", event->event,
> -		       event->status);
> +		fprintf(stderr, "cma event %s, error %d\n",
> +			rdma_event_str(event->event), event->status);
>  		sem_post(&cb->sem);
>  		ret = -1;
>  		break;
>  
>  	case RDMA_CM_EVENT_DISCONNECTED:
> -		fprintf(stderr, "%s DISCONNECT EVENT...\n", cb->server ? "server" : "client");
> +		fprintf(stderr, "%s DISCONNECT EVENT...\n",
> +			cb->server ? "server" : "client");
>  		sem_post(&cb->sem);
>  		break;
>  
> diff --git a/examples/udaddy.c b/examples/udaddy.c
> index 12e6297..1b6a732 100644
> --- a/examples/udaddy.c
> +++ b/examples/udaddy.c
> @@ -363,8 +363,8 @@ static int cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *event)
>  	case RDMA_CM_EVENT_CONNECT_ERROR:
>  	case RDMA_CM_EVENT_UNREACHABLE:
>  	case RDMA_CM_EVENT_REJECTED:
> -		printf("udaddy: event: %d, error: %d\n", event->event,
> -			event->status);
> +		printf("udaddy: event: %s, error: %d\n",
> +		       rdma_event_str(event->event), event->status);
>  		connect_error();
>  		ret = event->status;
>  		break;
>
>   


From mst at dev.mellanox.co.il  Thu Jun 21 22:00:47 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 22 Jun 2007 08:00:47 +0300
Subject: [ofa-general] Re: [PATCH for-2.6.22] ipoib/cm: fix interoperability
	when mtu don't match
In-Reply-To: <aday7id2g9r.fsf@cisco.com>
References: <20070620162215.GF6006@mellanox.co.il> <aday7id2g9r.fsf@cisco.com>
Message-ID: <20070622050047.GL4857@mellanox.co.il>

> BTW, any objection to merging the patch below for 2.6.22 too?  It's
> compile-tested only but it looks *REALLY* safe.

No objection.

-- 
MST


From mst at dev.mellanox.co.il  Thu Jun 21 22:12:01 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 22 Jun 2007 08:12:01 +0300
Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition support
In-Reply-To: <ada7ipx3vmj.fsf@cisco.com>
References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com>
	<adair9i8ihq.fsf@cisco.com> <467996C4.1060201@ichips.intel.com>
	<20070621033854.GF8868@mellanox.co.il> <ada7ipx3vmj.fsf@cisco.com>
Message-ID: <20070622051201.GM4857@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH] for-2.6.23 ib/umad: add partition support
> 
>  > We made a mistake of not validating the offset field otherwise we could
>  > have used it, too: as it is I think apps just use "write" so
>  > there's a useless byte counter in that field.
> 
> which offset field?  I don't see the string "offset" anywhere in ib_user_mad.h

static ssize_t ib_umad_write(struct file *filp, const char __user *buf,
                             size_t count, loff_t *pos)


We could have asked all users to use pwrite with offset 0, and then I think
the pos field would be useful for other things like versioning.  As it is,
people use write to pass in MADs, so I'm not sure what pos points to.
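
To make the write/pwrite distinction concrete, here is a minimal sketch. It
uses an ordinary temporary file instead of the umad device (an assumption, so
that it runs anywhere), and the helper name demo_offsets is hypothetical.
With write() the offset is implicit and moves with the file position; with
pwrite() the caller picks the offset, which is the value a driver's write
handler would see through its pos argument.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write `len` bytes twice: once with write(), once with pwrite() at
 * offset 0, and record the file position seen after each call.
 * Returns 0 on success, -1 on error.
 */
static int demo_offsets(const char *buf, size_t len,
			off_t *after_write, off_t *after_pwrite)
{
	char path[] = "/tmp/pos_demo_XXXXXX";
	int fd;

	assert(buf && after_write && after_pwrite);

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);	/* the file goes away once fd is closed */

	/* write(): implicit offset, the file position advances by len */
	if (write(fd, buf, len) != (ssize_t) len)
		goto err;
	*after_write = lseek(fd, 0, SEEK_CUR);

	/* pwrite(): explicit offset 0, the file position is left alone */
	if (pwrite(fd, buf, len, 0) != (ssize_t) len)
		goto err;
	*after_pwrite = lseek(fd, 0, SEEK_CUR);

	close(fd);
	return 0;
err:
	close(fd);
	return -1;
}
```

Calling demo_offsets("abcd", 4, &a, &b) leaves both a and b at 4: the plain
write moved the position, the pwrite at offset 0 did not.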

-- 
MST


From mst at dev.mellanox.co.il  Thu Jun 21 22:27:00 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 22 Jun 2007 08:27:00 +0300
Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition support
In-Reply-To: <adabqf93vro.fsf@cisco.com>
References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com>
	<adair9i8ihq.fsf@cisco.com> <467996C4.1060201@ichips.intel.com>
	<adabqf93vro.fsf@cisco.com>
Message-ID: <20070622052700.GP4857@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH] for-2.6.23 ib/umad: add partition support
> 
>  > Did you have something in mind?  (new ioctl?  re-using existing fields?)
>  > 
>  > Not all fields are used for both reads and writes.  E.g. status is
>  > unused on a write, and retries is unused on a read.  Storing the
>  > pkey_index on a read seems doable.  I think if we do anything on a
>  > write, we need to make an assumption that the data is currently set to
>  > 0 by the app.
> 
> I hadn't really thought about it.
> 
> One other thing is that the top 8 bits of flow_label aren't used.  I
> guess we could steal that, although it's a little ugly.  I doubt it
> would break existing userspace.
> 
> There is the problem of old kernels silently ignoring the pkey index
> though.  I'm not sure I see a good way around that.
> 
> I'm beginning to think that just updating the ABI might be the right
> answer.

Ugh. OFED 1.2 (with the old ABI) just went out.
I wonder - is it time to start making the kernel backwards-compatible?
It would be trivial to have userspace supply its own ABI
version and have kernel support both new and old ABI if we want to.
What do you think?

> But let's try to make this be the last ABI break.  Are we
> pretty sure there's *nothing* else we might ever want to add to the
> structure?  I can't think of anything right now...

It'd be easy to add some extra padding just in case ...
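
A rough sketch of both ideas, a caller-supplied ABI version and spare
padding. Everything here is hypothetical: the struct layouts, the version
numbers and demo_parse_hdr are made up for illustration and are not the real
ib_user_mad definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layouts; NOT the real ib_user_mad structures. */
struct demo_hdr_old {
	uint32_t id;
	uint32_t status;
};

struct demo_hdr_new {
	uint32_t id;
	uint32_t status;
	uint16_t pkey_index;	/* field the old layout lacks */
	uint8_t  reserved[6];	/* spare room for later additions */
};

enum { DEMO_ABI_OLD = 5, DEMO_ABI_NEW = 6 };	/* illustrative numbers */

/*
 * Parse a user buffer according to the ABI version the caller declared.
 * Old-layout callers end up with pkey_index 0.
 * Returns the number of header bytes consumed, or -1 on error.
 */
static int demo_parse_hdr(int abi, const void *buf, size_t len,
			  struct demo_hdr_new *out)
{
	assert(buf && out);
	memset(out, 0, sizeof(*out));

	if (abi == DEMO_ABI_OLD) {
		struct demo_hdr_old old;

		if (len < sizeof(old))
			return -1;
		memcpy(&old, buf, sizeof(old));
		out->id = old.id;
		out->status = old.status;
		return (int) sizeof(old);
	}
	if (abi == DEMO_ABI_NEW) {
		if (len < sizeof(*out))
			return -1;
		memcpy(out, buf, sizeof(*out));
		return (int) sizeof(*out);
	}
	return -1;	/* unknown version */
}
```

The point of the design is that the rest of the code only ever sees the new
layout; the old one is converted at the boundary, and the reserved bytes
leave room to grow without another version bump.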

-- 
MST


From k_mahesh85 at yahoo.co.in  Thu Jun 21 23:10:06 2007
From: k_mahesh85 at yahoo.co.in (Keshetti Mahesh)
Date: Fri, 22 Jun 2007 07:10:06 +0100 (BST)
Subject: [ofa-general] SMP attribute component errors : Link speed enabled?
Message-ID: <957484.9571.qm@web8321.mail.in.yahoo.com>

Hi list,

What is the attribute component error condition for "Link speed enabled"?
The spec gives "0x2 < LSE < 0xE", but I think that is not applicable to all
port speeds (2.5x, 10x etc.).
I didn't find anything about it in the errata either.

-Mahesh

 			

From jrio at caton.es  Fri Jun 22 01:34:09 2007
From: jrio at caton.es (Julio del =?ISO-8859-1?Q?R=EDo?=)
Date: Fri, 22 Jun 2007 10:34:09 +0200
Subject: [ofa-general] problem with ofed 1.1.
Message-ID: <1182501249.5695.16.camel@linux.site>


Good morning,

I hope you can help me with this.

I have this configuration:

    - Fedora Core 2
    - Linux localhost.localdomain 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:56:28 EST 2006 x86_64 x86_64 x86_64 GNU/Linux
    - HCA Mellanox MHGS18-XTC
    - Flextronic Switch F-X430047
    - OFED 1.1

When I try to install, this is the error log I get:

---------------------------------------------------------
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ cd openib-1.1
++ /usr/bin/id -u
+ '[' 0 = 0 ']'
+ /bin/chown -Rhf root .
++ /usr/bin/id -u
+ '[' 0 = 0 ']'
+ /bin/chgrp -Rhf root .
+ /bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ exit 0
Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.43267
+ umask 022
+ cd /var/tmp/OFEDRPM/BUILD
+ cd openib-1.1
+ LANG=C
+ export LANG
+ unset DISPLAY
+ rm -rf /var/tmp/OFED
+ cd /var/tmp/OFEDRPM/BUILD/openib-1.1
+ mkdir -p /var/tmp/OFED//usr/local/ofed/src
+ cp -a /var/tmp/OFEDRPM/BUILD/openib-1.1 /var/tmp/OFED//usr/local/ofed/src
+ ./configure --prefix=/usr/local/ofed --libdir=/usr/local/ofed/lib64 --kernel-version 2.6.9-34.ELsmp --kernel-sources /lib/modules/2.6.9-34.ELsmp/build --with-libibcm --with-libibverbs --with-libipathverbs --with-libmthca --with-librdmacm --with-mstflint --with-perftest --with-ipath_inf-mod --with-ipoib-mod --with-mthca-mod --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod
Quilt  does not exist... Going to use patch.
Created configure.mk:
prefix=/usr/local/ofed
PREFIX="--prefix /usr/local/ofed"
libdir=/usr/local/ofed/lib64

# Current working directory
CWD=/var/tmp/OFEDRPM/BUILD/openib-1.1

# Kernel level
KVERSION=2.6.9-34.ELsmp
EXTRAVERSION=-34.ELsmp
MODULES_DIR=/lib/modules/2.6.9-34.ELsmp
KSRC=/lib/modules/2.6.9-34.ELsmp/build

AUTOCONF_H=/var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h
WITH_MEMTRACK=no

WITH_MAKE_PARAMS=

CONFIG_INFINIBAND=m
CONFIG_INFINIBAND_IPOIB=m
CONFIG_INFINIBAND_SDP=
CONFIG_INFINIBAND_SRP=

CONFIG_INFINIBAND_USER_MAD=m
CONFIG_INFINIBAND_USER_ACCESS=m
CONFIG_INFINIBAND_ADDR_TRANS=y
CONFIG_INFINIBAND_MTHCA=m

CONFIG_INFINIBAND_IPOIB_DEBUG=y
CONFIG_INFINIBAND_ISER=
CONFIG_INFINIBAND_EHCA=
CONFIG_INFINIBAND_EHCA_SCALING=
CONFIG_INFINIBAND_RDS=
CONFIG_INFINIBAND_RDS_DEBUG=
CONFIG_INFINIBAND_MADEYE=

CONFIG_INFINIBAND_IPOIB_DEBUG_DATA=
CONFIG_INFINIBAND_SDP_SEND_ZCOPY=
CONFIG_INFINIBAND_SDP_RECV_ZCOPY=
CONFIG_INFINIBAND_SDP_DEBUG=
CONFIG_INFINIBAND_SDP_DEBUG_DATA=
CONFIG_INFINIBAND_IPATH=m
CONFIG_INFINIBAND_MTHCA_DEBUG=y


# User level
WITH_IBVERBS=yes
WITH_MTHCA=yes
WITH_IPATHVERBS=yes
WITH_EHCA=no
WITH_CM=yes
WITH_SDP=no
WITH_DAPL=no
WITH_RDMACM=yes
WITH_MANAGEMENT_LIBS=no
WITH_OSM=no
WITH_DIAGS=no
WITH_MPI=no
WITH_PERFTEST=yes
WITH_SRPTOOLS=no
WITH_IPOIBTOOLS=no
WITH_TVFLASH=no
WITH_MSTFLINT=yes

Created /var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h:
#undef CONFIG_INFINIBAND
#undef CONFIG_INFINIBAND_IPOIB
#undef CONFIG_INFINIBAND_SDP
#undef CONFIG_INFINIBAND_SRP

#undef CONFIG_INFINIBAND_USER_MAD
#undef CONFIG_INFINIBAND_USER_ACCESS
#undef CONFIG_INFINIBAND_ADDR_TRANS
#undef CONFIG_INFINIBAND_MTHCA

#undef CONFIG_INFINIBAND_IPOIB_DEBUG
#undef CONFIG_INFINIBAND_ISER
#undef CONFIG_INFINIBAND_EHCA
#undef CONFIG_INFINIBAND_EHCA_SCALING
#undef CONFIG_INFINIBAND_RDS
#undef CONFIG_INFINIBAND_RDS_DEBUG
#undef CONFIG_INFINIBAND_MADEYE

#undef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA
#undef CONFIG_INFINIBAND_SDP_SEND_ZCOPY
#undef CONFIG_INFINIBAND_SDP_RECV_ZCOPY
#undef CONFIG_INFINIBAND_SDP_DEBUG
#undef CONFIG_INFINIBAND_SDP_DEBUG_DATA
#undef CONFIG_INFINIBAND_IPATH
#undef CONFIG_INFINIBAND_MTHCA_DEBUG

#define CONFIG_INFINIBAND 1
#define CONFIG_INFINIBAND_IPOIB 1
#undef CONFIG_INFINIBAND_SDP
#undef CONFIG_INFINIBAND_SRP

#define CONFIG_INFINIBAND_USER_MAD 1
#define CONFIG_INFINIBAND_USER_ACCESS 1
#define CONFIG_INFINIBAND_ADDR_TRANS 1
#define CONFIG_INFINIBAND_MTHCA 1

#define CONFIG_INFINIBAND_IPOIB_DEBUG 1
#undef CONFIG_INFINIBAND_ISER
#undef CONFIG_INFINIBAND_EHCA
#undef CONFIG_INFINIBAND_RDS
#undef CONFIG_INFINIBAND_RDS_DEBUG


#undef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA
#undef CONFIG_INFINIBAND_SDP_SEND_ZCOPY
#undef CONFIG_INFINIBAND_SDP_RECV_ZCOPY
#undef CONFIG_INFINIBAND_SDP_DEBUG
#undef CONFIG_INFINIBAND_SDP_DEBUG_DATA
#define CONFIG_INFINIBAND_IPATH 1
#define CONFIG_INFINIBAND_MTHCA_DEBUG 1
#undef CONFIG_INFINIBAND_MADEYE

mkdir -p /var/tmp/OFEDRPM/BUILD/openib-1.1/patches
touch /var/tmp/OFEDRPM/BUILD/openib-1.1/patches/quiltrc
        /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/dapl_qp_attr.patch
patching file src/userspace/dapl/dapl/openib_cma/dapl_ib_util.c
patching file src/userspace/dapl/dapl/openib_scm/dapl_ib_util.c
        /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/libmthca_cq_deadlock.patch
patching file src/userspace/libmthca/src/verbs.c
Hunk #1 succeeded at 614 (offset -8 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/libmthca_stddef.patch
patching file src/userspace/libmthca/src/mthca.h
Hunk #1 succeeded at 38 with fuzz 2 (offset 2 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/librdmacm_compat.patch
patching file src/userspace/librdmacm/src/cma.c
Hunk #1 succeeded at 157 (offset 16 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/librdmacm_ver_abi.patch
patching file src/userspace/librdmacm/src/cma.c
Hunk #2 succeeded at 170 (offset 16 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/mstflint.patch
patching file src/userspace/mstflint/mtcr.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cm_add_mra_timeout_limit.patch
patching file drivers/infiniband/core/cm.c
Hunk #1 succeeded at 53 (offset -1 lines).
Hunk #2 succeeded at 2268 (offset -36 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cm_cleanup_timewait.patch
patching file drivers/infiniband/core/cm.c
Hunk #1 succeeded at 686 (offset 7 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_established1.patch
patching file drivers/infiniband/ulp/sdp/sdp.h
patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c
Hunk #1 succeeded at 515 (offset 16 lines).
patching file drivers/infiniband/ulp/sdp/sdp_cma.c
patching file drivers/infiniband/ulp/sdp/sdp_main.c
Hunk #1 succeeded at 589 (offset 26 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_increase_max_cm_retries.patch
patching file drivers/infiniband/core/cma.c
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_list_init.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 328 (offset -11 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_mem_leak.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 1713 (offset -241 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_race_fix.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 910 with fuzz 1 (offset -113 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_tavor_quirk.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 48 with fuzz 2.
Hunk #2 succeeded at 1154 (offset 27 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ib_sa_names.patch
patching file include/rdma/ib_sa.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-fixes.patch
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/Makefile
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/Kconfig
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/Makefile
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_common.h
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_cq.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_debug.h
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_diag.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_driver.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_fs.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_ht400.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_iba6110.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_iba6120.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_intr.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_keys.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_layer.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_layer.h
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_mad.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_mr.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_pe800.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_qp.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_rc.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_registers.h
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_ruc.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_srq.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_stats.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_sysfs.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_uc.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_ud.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_verbs.h
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_verbs_mcast.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_wc_ppc64.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/verbs_debug.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-limit-packets-sent-without-ack.patch
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_qp.c
Hunk #1 succeeded at 502 (offset -8 lines).
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_rc.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_verbs.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-memcpy_cachebypass.patch
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/Makefile
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/memcpy_cachebypass_x86_64.S
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-x86_64.patch
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/Kconfig
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_issue3.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_mcast_join_mask.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_multicast.c
Hunk #1 succeeded at 471 (offset -1 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_mcast_restart.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_selector_updated.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
Hunk #2 succeeded at 458 (offset 4 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_attributes.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
Hunk #1 succeeded at 1461 (offset -6 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_remove_reconnect.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_wa_post_send.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
patching file drivers/infiniband/ulp/srp/ib_srp.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/lockdep_header.patch
patching file drivers/infiniband/core/uverbs_cmd.c
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_av_statrate.patch
patching file drivers/infiniband/hw/mthca/mthca_av.c
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_catas_reset.patch
patching file drivers/infiniband/hw/mthca/mthca_catas.c
patching file drivers/infiniband/hw/mthca/mthca_main.c
patching file drivers/infiniband/hw/mthca/mthca_dev.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_mad_traps.patch
patching file drivers/infiniband/hw/mthca/mthca_mad.c
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_port.patch
patching file drivers/infiniband/hw/mthca/mthca_provider.c
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_qp_portnum.patch
patching file drivers/infiniband/hw/mthca/mthca_qp.c
Hunk #1 succeeded at 478 (offset 4 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_qp_statrate_bits.patch
patching file drivers/infiniband/hw/mthca/mthca_qp.c
Hunk #1 succeeded at 414 (offset 4 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_use_uar2.patch
patching file drivers/infiniband/hw/mthca/mthca_uar.c
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/robert-ipath-diagpkt-init-fixup.patch
patching file drivers/infiniband/hw/ipath/ipath_diag.c
Hunk #1 succeeded at 285 (offset -1 lines).
patching file drivers/infiniband/hw/ipath/ipath_driver.c
Hunk #1 succeeded at 539 (offset -20 lines).
Hunk #2 succeeded at 596 with fuzz 1 (offset -105 lines).
Hunk #3 succeeded at 2029 (offset -156 lines).
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
Hunk #1 succeeded at 793 (offset -96 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sdp_credits_by_seq.patch
patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sdp_post_credits.patch
patching file drivers/infiniband/ulp/sdp/sdp.h
Hunk #1 succeeded at 177 (offset 1 line).
patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c
Hunk #1 succeeded at 324 (offset 6 lines).
patching file drivers/infiniband/ulp/sdp/sdp_cma.c
Hunk #1 succeeded at 434 (offset 4 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_drep_on_not_found.patch
patching file drivers/infiniband/core/cm.c
Hunk #1 succeeded at 1890 (offset -10 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_randomize_psn.patch
patching file drivers/infiniband/core/cm.c
Hunk #3 succeeded at 81 (offset 7 lines).
Hunk #5 succeeded at 327 (offset 7 lines).
Hunk #7 succeeded at 2115 (offset 27 lines).
Hunk #8 succeeded at 3369 (offset 2 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_unload_crash.patch
patching file drivers/infiniband/core/cm.c
Hunk #1 succeeded at 82 (offset 7 lines).
Hunk #3 succeeded at 656 (offset 6 lines).
Hunk #5 succeeded at 685 (offset 6 lines).
Hunk #7 succeeded at 1316 (offset 6 lines).
Hunk #9 succeeded at 1334 (offset 6 lines).
Hunk #10 succeeded at 2626 (offset -7 lines).
Hunk #11 succeeded at 3409 (offset -29 lines).
Hunk #12 succeeded at 3449 (offset -7 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_establish.patch
patching file include/rdma/rdma_cm.h
Hunk #1 succeeded at 241 (offset -15 lines).
patching file drivers/infiniband/core/cm.c
Hunk #1 succeeded at 3242 (offset 35 lines).
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 759 (offset -81 lines).
Hunk #3 succeeded at 1752 (offset -212 lines).
Hunk #4 succeeded at 1997 with fuzz 1.
Hunk #5 succeeded at 1828 (offset -229 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_hotplug.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 278 (offset 7 lines).
Hunk #3 succeeded at 700 (offset 8 lines).
Hunk #5 succeeded at 895 with fuzz 1 (offset -9 lines).
Hunk #6 succeeded at 1382 (offset 6 lines).
Hunk #7 succeeded at 1610 (offset -9 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_typo_fix.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 276 with fuzz 2 (offset 7 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_1_recreate_at_reconnect.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_2_use_multiple_initiator_ports.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
patching file drivers/infiniband/ulp/srp/ib_srp.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_topspin.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
Hunk #1 succeeded at 358 (offset -1 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/svnehca_0015_1.patch
patching file drivers/infiniband/hw/ehca/ehca_main.c
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/svnehca_0015_2.patch
patching file drivers/infiniband/hw/ehca/ehca_tools.h

Applying patches for 2.6.9-34.ELsmp kernel (RHAS4 Update 3):
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_1_netevents_revert_to_2_6_17.patch
patching file drivers/infiniband/core/addr.c
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_3926_to_2_6_13.patch
patching file drivers/infiniband/core/addr.c
Hunk #1 succeeded at 327 with fuzz 1 (offset 11 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_4670_to_2_6_9.patch
patching file drivers/infiniband/core/addr.c
Hunk #1 succeeded at 27 with fuzz 2.
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/asm_bitops_ia64_to_2_6_11.patch
patching file include/asm/bitops.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/core_4807_to_2_6_9.patch
patching file drivers/infiniband/core/sysfs.c
Hunk #1 succeeded at 438 (offset -4 lines).
patching file drivers/infiniband/core/user_mad.c
Hunk #2 succeeded at 677 (offset 91 lines).
Hunk #3 succeeded at 685 (offset 5 lines).
Hunk #4 succeeded at 1106 (offset 91 lines).
Hunk #5 succeeded at 1053 (offset 5 lines).
patching file drivers/infiniband/core/uverbs_main.c
Hunk #2 succeeded at 118 (offset 3 lines).
patching file drivers/infiniband/core/uverbs_mem.c
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/debugfs_to_2_6_9.patch
patching file drivers/infiniband/include/linux/debugfs.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipath-backport.patch
patching file drivers/infiniband/hw/ipath/iowrite32_copy_x86_64.S
patching file drivers/infiniband/hw/ipath/ipath_backport.h
patching file drivers/infiniband/hw/ipath/ipath_diag.c
patching file drivers/infiniband/hw/ipath/ipath_driver.c
Hunk #2 succeeded at 557 (offset 1 line).
Hunk #3 succeeded at 599 (offset 1 line).
Hunk #4 succeeded at 1366 (offset 1 line).
Hunk #5 succeeded at 1395 (offset 1 line).
Hunk #6 succeeded at 1875 (offset 1 line).
Hunk #7 succeeded at 1903 (offset 1 line).
Hunk #8 succeeded at 1984 (offset -9 lines).
Hunk #9 succeeded at 2027 (offset 1 line).
Hunk #10 succeeded at 2142 (offset -9 lines).
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_fs.c
patching file drivers/infiniband/hw/ipath/ipath_iba6110.c
patching file drivers/infiniband/hw/ipath/ipath_iba6120.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_layer.c
patching file drivers/infiniband/hw/ipath/ipath_sysfs.c
patching file drivers/infiniband/hw/ipath/ipath_user_pages.c
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
patching file drivers/infiniband/hw/ipath/ipath_verbs.h
patching file drivers/infiniband/hw/ipath/Makefile
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipoib_5010_to_2_6_9.patch
patching file drivers/infiniband/include/linux/if_infiniband.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipoib_8111_to_2_6_16.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
Hunk #2 succeeded at 803 (offset 49 lines).
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #1 succeeded at 46 (offset -1 lines).
Hunk #2 succeeded at 220 (offset 1 line).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_device_5496_to_2_6_15.patch
patching file drivers/infiniband/include/linux/device.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_err_to_2_6_11.patch
patching file include/linux/err.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_idr_6554_to_2_6_13.patch
patching file drivers/infiniband/include/linux/idr.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_inetdevice_to_2_6_17.patch
patching file drivers/infiniband/include/linux/inetdevice.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_lockdep_to_2_6_17.patch
patching file include/linux/lockdep.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_mutex_5947_to_2_6_15.patch
patching file drivers/infiniband/include/linux/mutex.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_netdevice_to_2_6_17.patch
patching file drivers/infiniband/include/linux/netdevice.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_pci_7970_to_2_6_9.patch
patching file drivers/infiniband/include/linux/pci.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_scatterlist_6369_to_2_6_9.patch
patching file drivers/infiniband/include/linux/scatterlist.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_signal_to_2_6_17.patch
patching file drivers/infiniband/include/linux/signal.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_skbuff_6754_to_2_6_11.patch
patching file include/linux/skbuff.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_spinlock_5883_to_2_6_9.patch
patching file drivers/infiniband/include/linux/spinlock.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/makefile_to_2_6_9.patch
patching file drivers/infiniband/ulp/srp/Makefile
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/mthca_dev_3465_to_2_6_11.patch
patching file drivers/infiniband/hw/mthca/mthca_dev.h
Hunk #1 succeeded at 57 with fuzz 2 (offset 4 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/mthca_provider_3465_to_2_6_9.patch
patching file drivers/infiniband/hw/mthca/mthca_provider.c
Hunk #1 succeeded at 387 (offset 28 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_inet_sock_6754_to_2_6_15.patch
patching file include/net/inet_sock.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_sock_1_6754_to_2_6_13.patch
patching file include/net/sock.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_sock_2_6754_to_2_6_11.patch
patching file include/net/sock.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_tcp_states_6754_to_2_6_13.patch
patching file include/net/tcp_states.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/read_mostly_6255_to_2_6_13.patch
patching file drivers/infiniband/include/linux/cache.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/scsi_7242_to_2_6_14.patch
patching file include/scsi/scsi.h
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/sdp_7277_to_2_6_11.patch
patching file drivers/infiniband/ulp/sdp/sdp_main.c
Hunk #1 succeeded at 418 (offset 118 lines).
Hunk #2 succeeded at 535 (offset 41 lines).
Hunk #3 succeeded at 633 (offset 118 lines).
Hunk #4 succeeded at 1408 (offset 245 lines).
Hunk #5 succeeded at 1301 (offset 118 lines).
Hunk #6 succeeded at 1537 (offset 245 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_4030_to_2_6_12.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
Hunk #1 succeeded at 1594 (offset 271 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_7312_to_2_6_11.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
Hunk #1 succeeded at 1258 (offset -44 lines).
Hunk #3 succeeded at 1332 (offset -42 lines).
Hunk #5 succeeded at 1360 with fuzz 2 (offset -40 lines).
Hunk #6 succeeded at 1404 with fuzz 2 (offset -3 lines).
Hunk #7 succeeded at 1377 (offset -40 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_scsi_scan_target_7242_to_2_6_11.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
Hunk #1 succeeded at 975 (offset 26 lines).
Hunk #2 succeeded at 1505 (offset 24 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/top_2844_to_2_6_11.patch
patching file drivers/infiniband/Makefile
Hunk #1 succeeded at 1 with fuzz 2.
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ucm_5245_to_2_6_9.patch
patching file drivers/infiniband/core/ucm.c
Hunk #1 succeeded at 1270 (offset -8 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ucma_6607_to_2_6_9.patch
patching file drivers/infiniband/core/ucma.c
Hunk #1 succeeded at 861 (offset 88 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/user_mad_4603_to_2_6_9.patch
patching file drivers/infiniband/core/user_mad.c
Hunk #1 succeeded at 857 (offset -20 lines).
Hunk #3 succeeded at 1086 (offset -20 lines).
Hunk #5 succeeded at 1123 (offset -20 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/uverbs_main_3935_to_2_6_9.patch
patching file drivers/infiniband/core/uverbs_main.c
Hunk #1 succeeded at 727 (offset 11 lines).
Hunk #2 succeeded at 949 (offset 1 line).
Hunk #3 succeeded at 975 (offset 11 lines).
Hunk #4 succeeded at 986 (offset 3 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/uverbs_to_2_6_17.patch
patching file drivers/infiniband/core/uverbs_main.c
Hunk #1 succeeded at 1011 with fuzz 1 (offset 196 lines).
        /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/hpage_patches/hpages.patch
patching file drivers/infiniband/core/uverbs_mem.c
/bin/rm -f /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache
cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/examples
cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/libibverbs
Running: ./configure
--cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache
--disable-libcheck --prefix /usr/local/ofed
--libdir /usr/local/ofed/lib64 CPPFLAGS="-I../libibverbs/include"
configure: creating cache /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking build system type... x86_64-redhat-linux-gnu
checking host system type... x86_64-redhat-linux-gnu
checking for style of include used by make... GNU
checking for gcc... gcc
checking for C compiler default output file name... configure: error: C compiler cannot create executables
See `config.log' for more details.
Failed to execute: ./configure
--cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache
--disable-libcheck --prefix /usr/local/ofed
--libdir /usr/local/ofed/lib64 CPPFLAGS="-I../libibverbs/include"
error: Bad exit status from /var/tmp/rpm-tmp.43267 (%install)


RPM build errors:
    user vlad does not exist - using root
    group mtl does not exist - using root
    user vlad does not exist - using root
    group mtl does not exist - using root
    Bad exit status from /var/tmp/rpm-tmp.43267 (%install)
ERROR: Failed executing "rpmbuild --rebuild
--define '_topdir /var/tmp/OFEDRPM'
--define '_prefix /usr/local/ofed'
--define 'build_root /var/tmp/OFED'
--define 'configure_options --with-libibcm --with-libibverbs
--with-libipathverbs --with-libmthca --with-librdmacm --with-mstflint
--with-perftest --with-ipath_inf-mod --with-ipoib-mod --with-mthca-mod
--with-core-mod --with-user_mad-mod --with-user_access-mod
--with-addr_trans-mod'
--define 'configure_options32 %{nil}'
--define 'KVERSION 2.6.9-34.ELsmp'
--define 'KSRC /lib/modules/2.6.9-34.ELsmp/build'
--define 'build_kernel_ib 1'
--define 'build_kernel_ib_devel 1'
--define 'NETWORK_CONF_DIR /etc/sysconfig/network-scripts'
--define 'modprobe_update 1'
--define 'include_ipoib_conf 1'
--define 'build_32bit 0'
/home/caton/OFED-1.1/SRPMS/openib-1.1-0.src.rpm"

---------------------------------------------------------

Thanks a lot and best regards


Julio.

From vlad at lists.openfabrics.org  Fri Jun 22 02:43:02 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Fri, 22 Jun 2007 02:43:02 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070622-0200 daily build status
Message-ID: <20070622094303.091A7E608A1@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:   --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.14
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on ia64 with linux-2.6.12
Passed on powerpc with linux-2.6.19
Passed on ia64 with linux-2.6.14
Passed on x86_64 with linux-2.6.16
Passed on ia64 with linux-2.6.18
Passed on ppc64 with linux-2.6.12
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.13
Passed on ia64 with linux-2.6.13
Passed on powerpc with linux-2.6.17
Passed on ppc64 with linux-2.6.19
Passed on x86_64 with linux-2.6.14
Passed on ia64 with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on ia64 with linux-2.6.15
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on x86_64 with linux-2.6.18
Passed on ppc64 with linux-2.6.16
Passed on x86_64 with linux-2.6.17
Passed on powerpc with linux-2.6.12
Passed on x86_64 with linux-2.6.15
Passed on ppc64 with linux-2.6.15
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.17
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.18
Passed on powerpc with linux-2.6.15
Passed on powerpc with linux-2.6.14
Passed on ia64 with linux-2.6.16
Passed on ppc64 with linux-2.6.14
Passed on ppc64 with linux-2.6.13
Passed on ia64 with linux-2.6.17
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on ia64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on ppc64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:


From halr at voltaire.com  Fri Jun 22 03:45:16 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 22 Jun 2007 06:45:16 -0400
Subject: [ofa-general] Re: SMP attribute component errors : Link speed
	enabled?
In-Reply-To: <957484.9571.qm@web8321.mail.in.yahoo.com>
References: <957484.9571.qm@web8321.mail.in.yahoo.com>
Message-ID: <1182509115.10379.29241.camel@hal.voltaire.com>

On Fri, 2007-06-22 at 02:10, Keshetti Mahesh wrote:
> Hi list,
> 
> What is the attribute component error condition for "Link speed
> enabled"? In the spec it is given as "0x2 < LSE < 0xE", but I think
> that is not applicable for all port speeds (2.5x, 10x etc.).
> I didn't find it in the errata either.

Yes, that looks wrong to me too. Where do you see this ?

I see 0x2 <= LSE <= 0xE which looks right.
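
For illustration only, the two readings discussed in this thread can be
compared in a few lines of C. The function names are hypothetical, and the
sketch encodes nothing beyond the two ranges quoted above; it does not say
which LinkSpeedEnabled values a given port speed should accept.

```c
#include <stdbool.h>
#include <stdint.h>

/* Range as quoted from the spec in this thread: 0x2 <= LSE <= 0xE.
 * A value inside this range is treated as an attribute component error. */
bool lse_in_error_range(uint8_t lse)
{
    return lse >= 0x2 && lse <= 0xE;
}

/* Strict reading from the earlier mail (0x2 < LSE < 0xE), kept only to
 * show that it leaves out the two boundary values. */
bool lse_in_error_range_strict(uint8_t lse)
{
    return lse > 0x2 && lse < 0xE;
}
```

The two functions disagree only at 0x2 and 0xE, which is exactly the
difference between the two quotations.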

-- Hal

> -Mahesh
> 
> 
> 


From HNGUYEN at de.ibm.com  Thu Jun 14 05:24:32 2007
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Thu, 14 Jun 2007 14:24:32 +0200
Subject: [ofa-general] Re: [ewg] OFED 1.2 rc5 release
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C90156362A@mtlexch01.mtl.com>
Message-ID: <OFFA1E1616.FE6ED35D-ONC12572FA.004406A0-C12572FA.00442A0F@de.ibm.com>

Hi,
I'm having trouble reaching www.openfabrics.org or downloading
ofed-1.2-rc5. Is there something else I need to consider?
Thanks!

Kind Regards
Hoang-Nam Nguyen
Tel. +49-7031-16-3570, email: hnguyen at de.ibm.com


IBM Deutschland Entwicklung GmbH
Chairman of the Supervisory Board: Martin Jetter
Management: Herbert Kircher
Registered office: Boeblingen
Court of registration: Amtsgericht Stuttgart, HRB 243294


    From:    "Tziporet Koren" <tziporet at mellanox.co.il>
    Sent by: ewg-bounces at lists.openfabrics.org
    Date:    13.06.2007 16:25
    To:      <ewg at lists.openfabrics.org>
    cc:      general at lists.openfabrics.org
    Subject: [ewg] OFED 1.2 rc5 release

Hi,

OFED 1.2-RC5 is available on
http://www.openfabrics.org/builds/ofed-1.2/
File: OFED-1.2-rc5.tgz
To get BUILD_ID run ofed_info

Please report any issues in bugzilla https://bugs.openfabrics.org/

The GA release is expected next Wed (June 20) based on RC5 tests

Tziporet & Vlad

========================================================================

Release information:

OS support:
Novell:
    - SLES 9.0 SP3
    - SLES10
    - SLES10 SP1 RC5
Redhat:
    - Redhat EL4 up3, up4 and up5
    - Redhat EL5
kernel.org:
    - 2.6.20
    - 2.6.19

Note: Fedora C6 and SuSE Pro 10 are not part of the official list.
We keep the backport patches for these OSes and make sure OFED compiles
and loads properly, but will not do a full QA cycle.

Systems:
    * x86_64
    * x86
    * ia64
    * ppc64

Main changes from OFED-1.1-rc4:
===============================
1. Fixed 8 bugs (see attached for fixed issues)
2. Added support for SLES10 SP1 RC5 (tvflash is disabled for now)
3. Added support for iSER on RHEL 4
4. Updated documents - all owners, please review to make sure the docs of
your component are updated.

See bugzilla for all open issues.

Tasks that should be completed for the GA release:
1. Complete all documentation (release notes, README, etc.)
2. Run all QA tests on all platforms
(See attached file: rc5_fixed_bugs.csv)
_______________________________________________
ewg mailing list
ewg at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rc5_fixed_bugs.csv
Type: application/octet-stream
Size: 719 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070614/3b511685/attachment.obj>

From halr at voltaire.com  Fri Jun 22 03:51:55 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 22 Jun 2007 06:51:55 -0400
Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition
	support
In-Reply-To: <20070622052700.GP4857@mellanox.co.il>
References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com>
	<adair9i8ihq.fsf@cisco.com> <467996C4.1060201@ichips.intel.com>
	<adabqf93vro.fsf@cisco.com>  <20070622052700.GP4857@mellanox.co.il>
Message-ID: <1182509515.10379.29673.camel@hal.voltaire.com>

On Fri, 2007-06-22 at 01:27, Michael S. Tsirkin wrote:
> > But let's try to make this be the last ABI break.  Are we
> > pretty sure there's *nothing* else we might ever want to add to the
> > structure?  I can't think of anything right now...
> 
> It'd be easy to add some extra padding just in case ...

There are 6 bytes of reserved being added as part of the ABI change:

diff --git a/include/rdma/ib_user_mad.h b/include/rdma/ib_user_mad.h
index d66b15e..e7bf6fa 100644
--- a/include/rdma/ib_user_mad.h
+++ b/include/rdma/ib_user_mad.h
@@ -43,7 +43,7 @@
  * Increment this value if any changes that break userspace ABI
  * compatibility are made.
  */
-#define IB_USER_MAD_ABI_VERSION	5
+#define IB_USER_MAD_ABI_VERSION	6
 
 /*
  * Make sure that all structs defined in this file remain laid out so
@@ -88,6 +88,8 @@ struct ib_user_mad_hdr {
 	__u8	traffic_class;
 	__u8	gid[16];
 	__be32	flow_label;
+	__u16   pkey_index;
+	__u8    reserved[6];
 };
 
 /**
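
As a rough check of the arithmetic, the two fields added by this diff can be
modelled in user space. This is only a sketch: the struct name is
hypothetical and does not exist in the kernel, and fixed-width C types stand
in for __u16 and __u8. It covers just the added tail, not the whole
ib_user_mad_hdr.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the tail added in ABI version 6; the field
 * names follow the diff, the struct name is made up for this sketch. */
struct umad_hdr_added_tail {
    uint16_t pkey_index;   /* __u16 pkey_index in the kernel header */
    uint8_t  reserved[6];  /* __u8 reserved[6] in the kernel header */
};

/* Number of bytes the two new fields occupy together. */
size_t added_tail_size(void)
{
    return sizeof(struct umad_hdr_added_tail);
}

/* Offset of the reserved bytes within the added tail. */
size_t reserved_offset(void)
{
    return offsetof(struct umad_hdr_added_tail, reserved);
}
```

The 2-byte index plus 6 reserved bytes come to 8 bytes in total, so the
reserved bytes are the room left for a later field without another change
in size.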

-- Hal


From halr at voltaire.com  Fri Jun 22 04:05:36 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 22 Jun 2007 07:05:36 -0400
Subject: [ofa-general] Re: [PATCH] opensm/updn: --connect_roots option
In-Reply-To: <20070621212919.GL25653@sashak.voltaire.com>
References: <20070621212919.GL25653@sashak.voltaire.com>
Message-ID: <1182510334.10379.30604.camel@hal.voltaire.com>

On Thu, 2007-06-21 at 17:29, Sasha Khapyorsky wrote:
> With this option up/down preserves route paths (based on min hops
> knowledge) between root switches. This makes up/down IBA compliant
> (where all-to-all connectivity is required); OTOH this violates the
> up/down deadlock-free algorithm. By default this option is 'off'.
> 
> Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>

Thanks! Applied.

-- Hal


From bs at q-leap.de  Fri Jun 22 05:24:42 2007
From: bs at q-leap.de (Bernd Schubert)
Date: Fri, 22 Jun 2007 14:24:42 +0200
Subject: [ofa-general] librdmacm_to_2_6_20.patch
Message-ID: <200706221424.43142.bs@q-leap.de>

Hi,

There are patches to make rdma of ofed-1.1 compatible with 2.6.20
(https://svn.openfabrics.org/svn/openib/gen2/trunk/ofed/patches/user_fixes/librdmacm_to_2_6_20.patch
and perftest_to_2_6_20.patch).


Unfortunately, the patches don't work well. There are hunks that don't
apply (that's easy to fix), and now something seems to be missing:

dapl/openib_cma/dapl_ib_cm.c: In function `dapli_route_resolve':
dapl/openib_cma/dapl_ib_cm.c:156: warning: implicit declaration of function `rdma_get_option'
dapl/openib_cma/dapl_ib_cm.c:156: error: `RDMA_PROTO_IB' undeclared (first use in this function)
dapl/openib_cma/dapl_ib_cm.c:156: error: (Each undeclared identifier is reported only once
dapl/openib_cma/dapl_ib_cm.c:156: error: for each function it appears in.)
dapl/openib_cma/dapl_ib_cm.c:177: warning: implicit declaration of function `rdma_set_option'
dapl/openib_cma/dapl_ib_cm.c: In function `dapli_req_recv':
dapl/openib_cma/dapl_ib_cm.c:262: error: structure has no member named `private_data_len'
dapl/openib_cma/dapl_ib_cm.c:264: error: structure has no member named `private_data'
dapl/openib_cma/dapl_ib_cm.c:265: error: structure has no member named `private_data_len'
dapl/openib_cma/dapl_ib_cm.c:268: error: structure has no member named `private_data_len'
dapl/openib_cma/dapl_ib_cm.c: In function `dapli_cm_active_cb':
dapl/openib_cma/dapl_ib_cm.c:380: error: structure has no member named `private_data'
dapl/openib_cma/dapl_ib_cm.c: In function `dapli_cm_passive_cb':
dapl/openib_cma/dapl_ib_cm.c:429: error: structure has no member named `private_data'
make[3]: *** [dapl_udapl_libdaplcma_la-dapl_ib_cm.lo] Error 1


The entire rdma_set_option() function and its declaration are removed
by librdmacm_to_2_6_20. So what should be done with the call in
dapl_ib_cm.c:177?

        /* Get default connect request timeout values, and adjust */
        ret = rdma_get_option(conn->cm_id, RDMA_PROTO_IB, IB_CM_REQ_OPTIONS,
                              (void*)&req_opt, &optlen);

RDMA_PROTO_IB was also removed by the patch.

error: structure has no member named `private_data_len': This is easy to fix.


Is there a more recent working version of the patch available, or can you
give me at least some hints on what to do with the rdma_get_option() call?

Thanks in advance,
Bernd


-- 
Bernd Schubert
Q-Leap Networks GmbH


From k_mahesh85 at yahoo.co.in  Fri Jun 22 06:51:40 2007
From: k_mahesh85 at yahoo.co.in (Keshetti Mahesh)
Date: Fri, 22 Jun 2007 14:51:40 +0100 (BST)
Subject: [ofa-general] Re: SMP attribute component errors : Link speed
	enabled?
In-Reply-To: <1182509115.10379.29241.camel@hal.voltaire.com>
Message-ID: <293069.68821.qm@web8322.mail.in.yahoo.com>

> I see 0x2 <= LSE <= 0xE which looks right.

I did find the same in the spec. (I am sorry for the typo in the previous mail.)
But is it correct for a port with 10x link speed?

-Mahesh

       

From halr at voltaire.com  Fri Jun 22 06:55:49 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 22 Jun 2007 09:55:49 -0400
Subject: [ofa-general] Re: SMP attribute component errors : Link speed
	enabled?
In-Reply-To: <293069.68821.qm@web8322.mail.in.yahoo.com>
References: <293069.68821.qm@web8322.mail.in.yahoo.com>
Message-ID: <1182520548.10379.42462.camel@hal.voltaire.com>

On Fri, 2007-06-22 at 09:51, Keshetti Mahesh wrote:
> > I see 0x2 <= LSE <= 0xE which looks right.
> 
> I did find the same in the spec. (I am sorry for the typo in the
> previous mail.)
> But is it correct for a port with 10x link speed?

What's 10x speed ? Are you mixing speed and width ? There's 10.0 Gbps
speed (aka QDR) and 1x/4x/8x/12x width right now.
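
To make the speed/width distinction concrete, here is a small sketch that
multiplies a per-lane speed by a link width. The function and constant
names are hypothetical. The SDR (2.5 Gbps) and DDR (5.0 Gbps) lane speeds
are the usual InfiniBand names and are not taken from this thread; only
QDR and the four widths are mentioned above. Rates are signalling rates in
units of 0.1 Gbps so that the arithmetic stays in integers.

```c
/* Per-lane signalling speeds in units of 0.1 Gbps. */
enum { SPEED_SDR = 25, SPEED_DDR = 50, SPEED_QDR = 100 };

/* Returns 1 for the widths named in this thread: 1x, 4x, 8x and 12x. */
int width_is_valid(unsigned width)
{
    return width == 1 || width == 4 || width == 8 || width == 12;
}

/* Aggregate signalling rate in 0.1 Gbps units, or 0 for an unknown width. */
unsigned link_rate_tenths_gbps(unsigned lane_speed_tenths, unsigned width)
{
    if (!width_is_valid(width))
        return 0;
    return lane_speed_tenths * width;
}
```

Under these assumptions a 4x SDR link and a 1x QDR link both come out at
10.0 Gbps, which may be one way a "10x" figure gets mixed up between speed
and width.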

-- Hal

> -Mahesh
> 
> 
> 


From tziporet at mellanox.co.il  Fri Jun 22 07:38:09 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Fri, 22 Jun 2007 17:38:09 +0300
Subject: [ofa-general] OFED 1.2 - GA release
Message-ID: <6C2C79E72C305246B504CBA17B5500C901563710@mtlexch01.mtl.com>


I am happy to announce the OFED 1.2 GA release.

The release can be found under:
http://www.openfabrics.org/builds/ofed-1.2/
 
And later it will be on the OpenFabrics download page:  
http://www.openfabrics.org/downloads.htm
 
This release was done in a joint effort of all companies in the EWG group.
I wish to thank all who contributed to the success of this release.
 
Tziporet

===============================================================================
 
Release summary:
================
The OpenFabrics Enterprise Distribution (OFED) version 1.2 is a software
package supporting InfiniBand and iWARP fabrics. It is composed of several
software modules intended for use on a computer cluster constructed as an
InfiniBand subnet or an iWARP network.

OFED package contains the following components:
===============================================
The OFED Distribution package generates RPMs for installing the following:

  o   OpenFabrics core and ULPs
        - HCA drivers (mthca, ipath, ehca)
        - iWARP driver (cxgb3)
        - core
        - Upper Layer Protocols: IPoIB, SDP, SRP Initiator, iSER Initiator
          RDS, VNIC and uDAPL
  o   OpenFabrics utilities
        - OpenSM: InfiniBand Subnet Manager
        - Diagnostic tools
        - Performance tests
  o   MPI
        - OSU MVAPICH stack supporting the InfiniBand and iWARP interface
        - Open MPI stack supporting the InfiniBand and iWARP interface
        - OSU MVAPICH2 stack supporting the InfiniBand and iWARP interface
        - MPI benchmark tests (OSU BW/LAT, Intel MPI Benchmark, Presta)
  o   Extra packages
        - open-iscsi: open-iscsi initiator with iSER support
        - ib-bonding: Bonding driver for IPoIB interface
  o   Sources of all software modules (under conditions mentioned in the
      modules' LICENSE files)
  o   Documentation

Notes:
1. All OFED components are of production quality, except for:
   - The cxgb3 driver is in technology preview state.
   - The Virtual NIC (VNIC) driver is presented as a technology preview.
2. See release notes for each package in OFED docs.

Third Party Packages
--------------------
The following third party packages have been tested with OFED 1.2:
1. Intel MPI, Version 3.0 - Package ID: l_mpi_p_3.0.043
2. HP MPI, Version 2.2.5

Supported Platforms and Operating Systems
=========================================
  o   CPU architectures:
        - x86_64
        - x86
        - ia64
        - ppc64

  o   Linux Operating Systems:
        - RedHat EL4 up3: 2.6.9-34.ELsmp
        - RedHat EL4 up4: 2.6.9-42.ELsmp
        - RedHat EL4 up5: 2.6.9-55.ELsmp
        - RedHat EL5: 2.6.18-8.el5
        - SLES9 SP3: 2.6.5-7.244-smp
        - SLES10: 2.6.16.21-0.8-smp
        - kernel.org: 2.6.19.x and 2.6.20.x
 
HCAs and RNICs Supported
------------------------
This release supports IB HCAs by Mellanox Technologies, Qlogic and IBM as
well as iWARP RNICs by Chelsio Communications.

  o   Mellanox Technologies HCAs:
        - InfiniHost (fw-23108 Rev 3.5.000)
        - InfiniHost III Ex (MemFree: fw-25218 Rev 5.2.000
                             with memory: fw-25208 Rev 4.8.200)
        - InfiniHost III Lx (fw-25204 Rev 1.2.000)
        The SDR and DDR modes of the InfiniHost III family are supported.

        For official firmware versions please see:
        http://www.mellanox.com/support/firmware_table.php

  o   Qlogic HCAs:
        - QHT6040 (PathScale InfiniPath HT-460)
        - QHT6140 (PathScale InfiniPath HT-465)
        - QLE6140 (PathScale InfiniPath PE-880)

  o   IBM HCAs:
        - GX Dual-port 4x IB HCA
        - GX Dual-port 12x IB HCA

  o   Chelsio RNICs:
        - S310/S320 10GbE Storage Accelerators
        - R310E 10GbE iWARP Adapters

Switches Supported
------------------
This release was tested with switches and gateways provided by the following
companies:
        - Cisco
        - Voltaire
        - Qlogic
        - Flextronics


Main changes from OFED 1.1:
===========================
Note: For details regarding the various changes,  please see the release notes
for each package in the docs directory.

    General changes
        o Kernel code based on 2.6.20
        o New kernel modules: SA Cache, RDS, VNIC, bonding
        o High availability of SRP and IPoIB in GA level
        o Added iWARP support (with Chelsio driver)
        o MAN pages for libraries (libibverbs and librdmacm)

    IPoIB
        o IPoIB Connected Mode
        o High availability support using the bonding module.

    SDP
        o netstat is now available
        o Improved message BW and Scalability

    SRP
        o High availability is now supported for all systems.

    iSER
        o Testing more platforms (e.g., ppc64 and ia64)
        o Updated packages for ISCSI kernel & user components bundled with OFED.

    uDAPL
        o Scalability features needed for Intel MPI

    Libraries
        a. libibverbs 1.1
            o Fork support (requires apps change)
            o Better low-level driver handling, including multiple drivers
              linked in statically
            o Documentation: man pages
        b. librdmacm (uCMA) 1.0
            o Multicast joining from user space
            o UD support
            o Documentation: man pages

    OSM
        o Routing improvements
        o Performance improvement to min hop and up/down of over an order of magnitude
        o New fat-tree and LASH algorithms
        o SA optional record support "virtually" complete
        o IB router enablement
        o SA database dump/restore

    Management
        o Many diagnostic improvements since OFED 1.1 (see detailed RN)
        o ibdiagui: A GUI for ibdiagnet

    MPI:
        a. OSU MVAPICH
            o Version was updated to 0.9.9
        b. Open MPI
            o Version was updated to 1.2.1
            o See http://www.open-mpi.org/svn/new.php for details
        c. OSU MVAPICH2
            o MVAPICH2 version 0.98 was added to the OFED package.
        d. Common MPI setup sourcing
           Simple menu-driven interface to choose which MPI implementation to set as
           the default on a per-user and/or system-wide basis

    iWARP Support
        o Chelsio NIC supported
        o Verbs and CMA APIs are the same as InfiniBand
        o ULPs supported
          - MPI (mvapich2 tested)
          - uDAPL

    Install
        o Default prefix directory is now /usr
See the attached release notes for more details
  <<OFED_release_notes.txt>> 

 
Tziporet Koren
Software Director
Mellanox Technologies
mailto: tziporet at mellanox.co.il
Tel +972-4-9097200, ext 380
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: OFED_release_notes.txt
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070622/2e9b9282/attachment.txt>

From mhanafi at csc.com  Fri Jun 22 08:04:35 2007
From: mhanafi at csc.com (Mahmoud Hanafi)
Date: Fri, 22 Jun 2007 11:04:35 -0400
Subject: [ofa-general] problem with ofed 1.1.
In-Reply-To: <1182501249.5695.16.camel@linux.site>
Message-ID: <OF3D9779EA.91A1C8B2-ON85257302.0052BBB1-85257302.0052D1A0@csc.com>

Do you have gcc and glibc-devel.x86_64 installed?
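
One way to check this by hand is to compile and link a trivial C file, which
is roughly what the failing configure test amounts to. The sketch below is
only an assumption about a reasonable probe: the function name and the file
name probe.c are hypothetical, and it says nothing about what config.log
contains on this system.

```c
#include <stdio.h>
#include <string.h>

/* Exercises two libc calls. The result matters less than whether this
 * file compiles and links at all: without a working gcc and the glibc
 * development files, the build stops before the function can ever run. */
int toolchain_probe(void)
{
    char buf[8];
    int n = snprintf(buf, sizeof buf, "%s", "ok");

    return n == 2 && strcmp(buf, "ok") == 0;
}
```

Saved as probe.c with a main() that returns toolchain_probe() ? 0 : 1, it
could be built with "gcc probe.c -o probe". An error at that step points at
the compiler or the libc headers rather than at OFED itself.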

-Mahmoud 



Julio del Río <jrio at caton.es> 
Sent by: general-bounces at lists.openfabrics.org
06/22/2007 04:34 AM

To
general at lists.openfabrics.org
cc

Subject
[ofa-general] problem with ofed 1.1.


Good morning,

I hope you could help me with this:

I have this config:

    - Fedora Core 2
    - Linux localhost.localdomain 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:56:28 EST 2006 x86_64 x86_64 x86_64 GNU/Linux
    - HCA Mellanox MHGS18-XTC
    - Flextronic Switch F-X430047
    - Ofed 1.1

and trying to install, this is the error log file I get:

---------------------------------------------------------
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ cd openib-1.1
++ /usr/bin/id -u
+ '[' 0 = 0 ']'
+ /bin/chown -Rhf root .
++ /usr/bin/id -u
+ '[' 0 = 0 ']'
+ /bin/chgrp -Rhf root .
+ /bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ exit 0
Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.43267
+ umask 022
+ cd /var/tmp/OFEDRPM/BUILD
+ cd openib-1.1
+ LANG=C
+ export LANG
+ unset DISPLAY
+ rm -rf /var/tmp/OFED
+ cd /var/tmp/OFEDRPM/BUILD/openib-1.1
+ mkdir -p /var/tmp/OFED//usr/local/ofed/src
+ cp -a /var/tmp/OFEDRPM/BUILD/openib-1.1 /var/tmp/OFED//usr/local/ofed/src
+ ./configure --prefix=/usr/local/ofed --libdir=/usr/local/ofed/lib64
--kernel-version 2.6.9-34.ELsmp
--kernel-sources /lib/modules/2.6.9-34.ELsmp/build
--with-libibcm --with-libibverbs --with-libipathverbs --with-libmthca
--with-librdmacm --with-mstflint --with-perftest --with-ipath_inf-mod
--with-ipoib-mod --with-mthca-mod --with-core-mod --with-user_mad-mod
--with-user_access-mod --with-addr_trans-mod
Quilt  does not exist... Going to use patch.
Created configure.mk:
prefix=/usr/local/ofed
PREFIX="--prefix /usr/local/ofed"
libdir=/usr/local/ofed/lib64

# Current working directory
CWD=/var/tmp/OFEDRPM/BUILD/openib-1.1

# Kernel level
KVERSION=2.6.9-34.ELsmp
EXTRAVERSION=-34.ELsmp
MODULES_DIR=/lib/modules/2.6.9-34.ELsmp
KSRC=/lib/modules/2.6.9-34.ELsmp/build

AUTOCONF_H=/var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h
WITH_MEMTRACK=no

WITH_MAKE_PARAMS=

CONFIG_INFINIBAND=m
CONFIG_INFINIBAND_IPOIB=m
CONFIG_INFINIBAND_SDP=
CONFIG_INFINIBAND_SRP=

CONFIG_INFINIBAND_USER_MAD=m
CONFIG_INFINIBAND_USER_ACCESS=m
CONFIG_INFINIBAND_ADDR_TRANS=y
CONFIG_INFINIBAND_MTHCA=m

CONFIG_INFINIBAND_IPOIB_DEBUG=y
CONFIG_INFINIBAND_ISER=
CONFIG_INFINIBAND_EHCA=
CONFIG_INFINIBAND_EHCA_SCALING=
CONFIG_INFINIBAND_RDS=
CONFIG_INFINIBAND_RDS_DEBUG=
CONFIG_INFINIBAND_MADEYE=

CONFIG_INFINIBAND_IPOIB_DEBUG_DATA=
CONFIG_INFINIBAND_SDP_SEND_ZCOPY=
CONFIG_INFINIBAND_SDP_RECV_ZCOPY=
CONFIG_INFINIBAND_SDP_DEBUG=
CONFIG_INFINIBAND_SDP_DEBUG_DATA=
CONFIG_INFINIBAND_IPATH=m
CONFIG_INFINIBAND_MTHCA_DEBUG=y


# User level
WITH_IBVERBS=yes
WITH_MTHCA=yes
WITH_IPATHVERBS=yes
WITH_EHCA=no
WITH_CM=yes
WITH_SDP=no
WITH_DAPL=no
WITH_RDMACM=yes
WITH_MANAGEMENT_LIBS=no
WITH_OSM=no
WITH_DIAGS=no
WITH_MPI=no
WITH_PERFTEST=yes
WITH_SRPTOOLS=no
WITH_IPOIBTOOLS=no
WITH_TVFLASH=no
WITH_MSTFLINT=yes

Created /var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h:
#undef CONFIG_INFINIBAND
#undef CONFIG_INFINIBAND_IPOIB
#undef CONFIG_INFINIBAND_SDP
#undef CONFIG_INFINIBAND_SRP

#undef CONFIG_INFINIBAND_USER_MAD
#undef CONFIG_INFINIBAND_USER_ACCESS
#undef CONFIG_INFINIBAND_ADDR_TRANS
#undef CONFIG_INFINIBAND_MTHCA

#undef CONFIG_INFINIBAND_IPOIB_DEBUG
#undef CONFIG_INFINIBAND_ISER
#undef CONFIG_INFINIBAND_EHCA
#undef CONFIG_INFINIBAND_EHCA_SCALING
#undef CONFIG_INFINIBAND_RDS
#undef CONFIG_INFINIBAND_RDS_DEBUG
#undef CONFIG_INFINIBAND_MADEYE

#undef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA
#undef CONFIG_INFINIBAND_SDP_SEND_ZCOPY
#undef CONFIG_INFINIBAND_SDP_RECV_ZCOPY
#undef CONFIG_INFINIBAND_SDP_DEBUG
#undef CONFIG_INFINIBAND_SDP_DEBUG_DATA
#undef CONFIG_INFINIBAND_IPATH
#undef CONFIG_INFINIBAND_MTHCA_DEBUG

#define CONFIG_INFINIBAND 1
#define CONFIG_INFINIBAND_IPOIB 1
#undef CONFIG_INFINIBAND_SDP
#undef CONFIG_INFINIBAND_SRP

#define CONFIG_INFINIBAND_USER_MAD 1
#define CONFIG_INFINIBAND_USER_ACCESS 1
#define CONFIG_INFINIBAND_ADDR_TRANS 1
#define CONFIG_INFINIBAND_MTHCA 1

#define CONFIG_INFINIBAND_IPOIB_DEBUG 1
#undef CONFIG_INFINIBAND_ISER
#undef CONFIG_INFINIBAND_EHCA
#undef CONFIG_INFINIBAND_RDS
#undef CONFIG_INFINIBAND_RDS_DEBUG


#undef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA
#undef CONFIG_INFINIBAND_SDP_SEND_ZCOPY
#undef CONFIG_INFINIBAND_SDP_RECV_ZCOPY
#undef CONFIG_INFINIBAND_SDP_DEBUG
#undef CONFIG_INFINIBAND_SDP_DEBUG_DATA
#define CONFIG_INFINIBAND_IPATH 1
#define CONFIG_INFINIBAND_MTHCA_DEBUG 1
#undef CONFIG_INFINIBAND_MADEYE

mkdir -p /var/tmp/OFEDRPM/BUILD/openib-1.1/patches
touch /var/tmp/OFEDRPM/BUILD/openib-1.1/patches/quiltrc
 /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/dapl_qp_attr.patch
patching file src/userspace/dapl/dapl/openib_cma/dapl_ib_util.c
patching file src/userspace/dapl/dapl/openib_scm/dapl_ib_util.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/libmthca_cq_deadlock.patch
patching file src/userspace/libmthca/src/verbs.c
Hunk #1 succeeded at 614 (offset -8 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/libmthca_stddef.patch
patching file src/userspace/libmthca/src/mthca.h
Hunk #1 succeeded at 38 with fuzz 2 (offset 2 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/librdmacm_compat.patch
patching file src/userspace/librdmacm/src/cma.c
Hunk #1 succeeded at 157 (offset 16 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/librdmacm_ver_abi.patch
patching file src/userspace/librdmacm/src/cma.c
Hunk #2 succeeded at 170 (offset 16 lines).
 /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/mstflint.patch
patching file src/userspace/mstflint/mtcr.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cm_add_mra_timeout_limit.patch
patching file drivers/infiniband/core/cm.c
Hunk #1 succeeded at 53 (offset -1 lines).
Hunk #2 succeeded at 2268 (offset -36 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cm_cleanup_timewait.patch
patching file drivers/infiniband/core/cm.c
Hunk #1 succeeded at 686 (offset 7 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_established1.patch
patching file drivers/infiniband/ulp/sdp/sdp.h
patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c
Hunk #1 succeeded at 515 (offset 16 lines).
patching file drivers/infiniband/ulp/sdp/sdp_cma.c
patching file drivers/infiniband/ulp/sdp/sdp_main.c
Hunk #1 succeeded at 589 (offset 26 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_increase_max_cm_retries.patch
patching file drivers/infiniband/core/cma.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_list_init.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 328 (offset -11 lines).
 /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_mem_leak.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 1713 (offset -241 lines).
 /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_race_fix.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 910 with fuzz 1 (offset -113 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_tavor_quirk.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 48 with fuzz 2.
Hunk #2 succeeded at 1154 (offset 27 lines).
 /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ib_sa_names.patch
patching file include/rdma/ib_sa.h
 /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-fixes.patch
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/Makefile
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/Kconfig
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/Makefile
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_common.h
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_cq.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_debug.h
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_diag.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_driver.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_fs.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_ht400.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_iba6110.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_iba6120.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_intr.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_keys.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_layer.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_layer.h
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_mad.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_mr.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_pe800.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_qp.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_rc.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_registers.h
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_ruc.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_srq.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_stats.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_sysfs.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_uc.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_ud.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_verbs.h
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_verbs_mcast.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_wc_ppc64.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/verbs_debug.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-limit-packets-sent-without-ack.patch
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_qp.c
Hunk #1 succeeded at 502 (offset -8 lines).
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_rc.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_verbs.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-memcpy_cachebypass.patch
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/Makefile
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/memcpy_cachebypass_x86_64.S
 /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-x86_64.patch
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/Kconfig
 /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_issue3.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_mcast_join_mask.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_multicast.c
Hunk #1 succeeded at 471 (offset -1 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_mcast_restart.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_selector_updated.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
Hunk #2 succeeded at 458 (offset 4 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_attributes.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
Hunk #1 succeeded at 1461 (offset -6 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_remove_reconnect.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_wa_post_send.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
patching file drivers/infiniband/ulp/srp/ib_srp.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/lockdep_header.patch
patching file drivers/infiniband/core/uverbs_cmd.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_av_statrate.patch
patching file drivers/infiniband/hw/mthca/mthca_av.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_catas_reset.patch
patching file drivers/infiniband/hw/mthca/mthca_catas.c
patching file drivers/infiniband/hw/mthca/mthca_main.c
patching file drivers/infiniband/hw/mthca/mthca_dev.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_mad_traps.patch
patching file drivers/infiniband/hw/mthca/mthca_mad.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_port.patch
patching file drivers/infiniband/hw/mthca/mthca_provider.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_qp_portnum.patch
patching file drivers/infiniband/hw/mthca/mthca_qp.c
Hunk #1 succeeded at 478 (offset 4 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_qp_statrate_bits.patch
patching file drivers/infiniband/hw/mthca/mthca_qp.c
Hunk #1 succeeded at 414 (offset 4 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_use_uar2.patch
patching file drivers/infiniband/hw/mthca/mthca_uar.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/robert-ipath-diagpkt-init-fixup.patch
patching file drivers/infiniband/hw/ipath/ipath_diag.c
Hunk #1 succeeded at 285 (offset -1 lines).
patching file drivers/infiniband/hw/ipath/ipath_driver.c
Hunk #1 succeeded at 539 (offset -20 lines).
Hunk #2 succeeded at 596 with fuzz 1 (offset -105 lines).
Hunk #3 succeeded at 2029 (offset -156 lines).
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
Hunk #1 succeeded at 793 (offset -96 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sdp_credits_by_seq.patch
patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sdp_post_credits.patch
patching file drivers/infiniband/ulp/sdp/sdp.h
Hunk #1 succeeded at 177 (offset 1 line).
patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c
Hunk #1 succeeded at 324 (offset 6 lines).
patching file drivers/infiniband/ulp/sdp/sdp_cma.c
Hunk #1 succeeded at 434 (offset 4 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_drep_on_not_found.patch
patching file drivers/infiniband/core/cm.c
Hunk #1 succeeded at 1890 (offset -10 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_randomize_psn.patch
patching file drivers/infiniband/core/cm.c
Hunk #3 succeeded at 81 (offset 7 lines).
Hunk #5 succeeded at 327 (offset 7 lines).
Hunk #7 succeeded at 2115 (offset 27 lines).
Hunk #8 succeeded at 3369 (offset 2 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_unload_crash.patch
patching file drivers/infiniband/core/cm.c
Hunk #1 succeeded at 82 (offset 7 lines).
Hunk #3 succeeded at 656 (offset 6 lines).
Hunk #5 succeeded at 685 (offset 6 lines).
Hunk #7 succeeded at 1316 (offset 6 lines).
Hunk #9 succeeded at 1334 (offset 6 lines).
Hunk #10 succeeded at 2626 (offset -7 lines).
Hunk #11 succeeded at 3409 (offset -29 lines).
Hunk #12 succeeded at 3449 (offset -7 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_establish.patch
patching file include/rdma/rdma_cm.h
Hunk #1 succeeded at 241 (offset -15 lines).
patching file drivers/infiniband/core/cm.c
Hunk #1 succeeded at 3242 (offset 35 lines).
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 759 (offset -81 lines).
Hunk #3 succeeded at 1752 (offset -212 lines).
Hunk #4 succeeded at 1997 with fuzz 1.
Hunk #5 succeeded at 1828 (offset -229 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_hotplug.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 278 (offset 7 lines).
Hunk #3 succeeded at 700 (offset 8 lines).
Hunk #5 succeeded at 895 with fuzz 1 (offset -9 lines).
Hunk #6 succeeded at 1382 (offset 6 lines).
Hunk #7 succeeded at 1610 (offset -9 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_typo_fix.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 276 with fuzz 2 (offset 7 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_1_recreate_at_reconnect.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_2_use_multiple_initiator_ports.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
patching file drivers/infiniband/ulp/srp/ib_srp.h
 /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_topspin.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
Hunk #1 succeeded at 358 (offset -1 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/svnehca_0015_1.patch
patching file drivers/infiniband/hw/ehca/ehca_main.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/svnehca_0015_2.patch
patching file drivers/infiniband/hw/ehca/ehca_tools.h

Applying patches for 2.6.9-34.ELsmp kernel (RHAS4 Update 3):
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_1_netevents_revert_to_2_6_17.patch
patching file drivers/infiniband/core/addr.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_3926_to_2_6_13.patch
patching file drivers/infiniband/core/addr.c
Hunk #1 succeeded at 327 with fuzz 1 (offset 11 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_4670_to_2_6_9.patch
patching file drivers/infiniband/core/addr.c
Hunk #1 succeeded at 27 with fuzz 2.
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/asm_bitops_ia64_to_2_6_11.patch
patching file include/asm/bitops.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/core_4807_to_2_6_9.patch
patching file drivers/infiniband/core/sysfs.c
Hunk #1 succeeded at 438 (offset -4 lines).
patching file drivers/infiniband/core/user_mad.c
Hunk #2 succeeded at 677 (offset 91 lines).
Hunk #3 succeeded at 685 (offset 5 lines).
Hunk #4 succeeded at 1106 (offset 91 lines).
Hunk #5 succeeded at 1053 (offset 5 lines).
patching file drivers/infiniband/core/uverbs_main.c
Hunk #2 succeeded at 118 (offset 3 lines).
patching file drivers/infiniband/core/uverbs_mem.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/debugfs_to_2_6_9.patch
patching file drivers/infiniband/include/linux/debugfs.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipath-backport.patch
patching file drivers/infiniband/hw/ipath/iowrite32_copy_x86_64.S
patching file drivers/infiniband/hw/ipath/ipath_backport.h
patching file drivers/infiniband/hw/ipath/ipath_diag.c
patching file drivers/infiniband/hw/ipath/ipath_driver.c
Hunk #2 succeeded at 557 (offset 1 line).
Hunk #3 succeeded at 599 (offset 1 line).
Hunk #4 succeeded at 1366 (offset 1 line).
Hunk #5 succeeded at 1395 (offset 1 line).
Hunk #6 succeeded at 1875 (offset 1 line).
Hunk #7 succeeded at 1903 (offset 1 line).
Hunk #8 succeeded at 1984 (offset -9 lines).
Hunk #9 succeeded at 2027 (offset 1 line).
Hunk #10 succeeded at 2142 (offset -9 lines).
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_fs.c
patching file drivers/infiniband/hw/ipath/ipath_iba6110.c
patching file drivers/infiniband/hw/ipath/ipath_iba6120.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_layer.c
patching file drivers/infiniband/hw/ipath/ipath_sysfs.c
patching file drivers/infiniband/hw/ipath/ipath_user_pages.c
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
patching file drivers/infiniband/hw/ipath/ipath_verbs.h
patching file drivers/infiniband/hw/ipath/Makefile
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipoib_5010_to_2_6_9.patch
patching file drivers/infiniband/include/linux/if_infiniband.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipoib_8111_to_2_6_16.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
Hunk #2 succeeded at 803 (offset 49 lines).
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #1 succeeded at 46 (offset -1 lines).
Hunk #2 succeeded at 220 (offset 1 line).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_device_5496_to_2_6_15.patch
patching file drivers/infiniband/include/linux/device.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_err_to_2_6_11.patch
patching file include/linux/err.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_idr_6554_to_2_6_13.patch
patching file drivers/infiniband/include/linux/idr.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_inetdevice_to_2_6_17.patch
patching file drivers/infiniband/include/linux/inetdevice.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_lockdep_to_2_6_17.patch
patching file include/linux/lockdep.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_mutex_5947_to_2_6_15.patch
patching file drivers/infiniband/include/linux/mutex.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_netdevice_to_2_6_17.patch
patching file drivers/infiniband/include/linux/netdevice.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_pci_7970_to_2_6_9.patch
patching file drivers/infiniband/include/linux/pci.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_scatterlist_6369_to_2_6_9.patch
patching file drivers/infiniband/include/linux/scatterlist.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_signal_to_2_6_17.patch
patching file drivers/infiniband/include/linux/signal.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_skbuff_6754_to_2_6_11.patch
patching file include/linux/skbuff.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_spinlock_5883_to_2_6_9.patch
patching file drivers/infiniband/include/linux/spinlock.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/makefile_to_2_6_9.patch
patching file drivers/infiniband/ulp/srp/Makefile
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/mthca_dev_3465_to_2_6_11.patch
patching file drivers/infiniband/hw/mthca/mthca_dev.h
Hunk #1 succeeded at 57 with fuzz 2 (offset 4 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/mthca_provider_3465_to_2_6_9.patch
patching file drivers/infiniband/hw/mthca/mthca_provider.c
Hunk #1 succeeded at 387 (offset 28 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_inet_sock_6754_to_2_6_15.patch
patching file include/net/inet_sock.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_sock_1_6754_to_2_6_13.patch
patching file include/net/sock.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_sock_2_6754_to_2_6_11.patch
patching file include/net/sock.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_tcp_states_6754_to_2_6_13.patch
patching file include/net/tcp_states.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/read_mostly_6255_to_2_6_13.patch
patching file drivers/infiniband/include/linux/cache.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/scsi_7242_to_2_6_14.patch
patching file include/scsi/scsi.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/sdp_7277_to_2_6_11.patch
patching file drivers/infiniband/ulp/sdp/sdp_main.c
Hunk #1 succeeded at 418 (offset 118 lines).
Hunk #2 succeeded at 535 (offset 41 lines).
Hunk #3 succeeded at 633 (offset 118 lines).
Hunk #4 succeeded at 1408 (offset 245 lines).
Hunk #5 succeeded at 1301 (offset 118 lines).
Hunk #6 succeeded at 1537 (offset 245 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_4030_to_2_6_12.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
Hunk #1 succeeded at 1594 (offset 271 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_7312_to_2_6_11.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
Hunk #1 succeeded at 1258 (offset -44 lines).
Hunk #3 succeeded at 1332 (offset -42 lines).
Hunk #5 succeeded at 1360 with fuzz 2 (offset -40 lines).
Hunk #6 succeeded at 1404 with fuzz 2 (offset -3 lines).
Hunk #7 succeeded at 1377 (offset -40 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_scsi_scan_target_7242_to_2_6_11.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
Hunk #1 succeeded at 975 (offset 26 lines).
Hunk #2 succeeded at 1505 (offset 24 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/top_2844_to_2_6_11.patch
patching file drivers/infiniband/Makefile
Hunk #1 succeeded at 1 with fuzz 2.
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ucm_5245_to_2_6_9.patch
patching file drivers/infiniband/core/ucm.c
Hunk #1 succeeded at 1270 (offset -8 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ucma_6607_to_2_6_9.patch
patching file drivers/infiniband/core/ucma.c
Hunk #1 succeeded at 861 (offset 88 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/user_mad_4603_to_2_6_9.patch
patching file drivers/infiniband/core/user_mad.c
Hunk #1 succeeded at 857 (offset -20 lines).
Hunk #3 succeeded at 1086 (offset -20 lines).
Hunk #5 succeeded at 1123 (offset -20 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/uverbs_main_3935_to_2_6_9.patch
patching file drivers/infiniband/core/uverbs_main.c
Hunk #1 succeeded at 727 (offset 11 lines).
Hunk #2 succeeded at 949 (offset 1 line).
Hunk #3 succeeded at 975 (offset 11 lines).
Hunk #4 succeeded at 986 (offset 3 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/uverbs_to_2_6_17.patch
patching file drivers/infiniband/core/uverbs_main.c
Hunk #1 succeeded at 1011 with fuzz 1 (offset 196 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/hpage_patches/hpages.patch
patching file drivers/infiniband/core/uverbs_mem.c
/bin/rm -f /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache
cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/examples
cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/libibverbs
Running: ./configure --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache --disable-libcheck --prefix /usr/local/ofed --libdir /usr/local/ofed/lib64 CPPFLAGS="-I../libibverbs/include"
configure: creating cache /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking build system type... x86_64-redhat-linux-gnu
checking host system type... x86_64-redhat-linux-gnu
checking for style of include used by make... GNU
checking for gcc... gcc
checking for C compiler default output file name... configure: error: C compiler cannot create executables
See `config.log' for more details.
Failed to execute: ./configure --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache --disable-libcheck --prefix /usr/local/ofed --libdir /usr/local/ofed/lib64 CPPFLAGS="-I../libibverbs/include"
error: Bad exit status from /var/tmp/rpm-tmp.43267 (%install)


RPM build errors:
    user vlad does not exist - using root
    group mtl does not exist - using root
    user vlad does not exist - using root
    group mtl does not exist - using root
    Bad exit status from /var/tmp/rpm-tmp.43267 (%install)
ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-libibcm --with-libibverbs --with-libipathverbs --with-libmthca --with-librdmacm --with-mstflint --with-perftest --with-ipath_inf-mod --with-ipoib-mod --with-mthca-mod --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod' --define 'configure_options32 %{nil}' --define 'KVERSION 2.6.9-34.ELsmp' --define 'KSRC /lib/modules/2.6.9-34.ELsmp/build' --define 'build_kernel_ib 1' --define 'build_kernel_ib_devel 1' --define 'NETWORK_CONF_DIR /etc/sysconfig/network-scripts' --define 'modprobe_update 1' --define 'include_ipoib_conf 1' --define 'build_32bit 0' /home/caton/OFED-1.1/SRPMS/openib-1.1-0.src.rpm"

---------------------------------------------------------

Thanks a lot and best regards


Julio. 

From jrio at caton.es  Fri Jun 22 08:07:41 2007
From: jrio at caton.es (Julio del =?ISO-8859-1?Q?R=EDo?=)
Date: Fri, 22 Jun 2007 17:07:41 +0200
Subject: [ofa-general] problem with ofed 1.1.
In-Reply-To: <OF3D9779EA.91A1C8B2-ON85257302.0052BBB1-85257302.0052D1A0@csc.com>
References: <OF3D9779EA.91A1C8B2-ON85257302.0052BBB1-85257302.0052D1A0@csc.com>
Message-ID: <1182524861.5695.28.camel@linux.site>

[root@localhost root]# rpm -qa | grep gcc
libgcc-3.3.3-7
gcc-g77-3.3.3-7
gcc-3.3.3-7
gcc-objc-3.3.3-7
compat-gcc-c++-7.3-2.96.126
gcc-gnat-3.3.3-7
compat-gcc-7.3-2.96.126
gcc34-3.4.0-1
gcc34-c++-3.4.0-1
libgcc-3.3.3-7
gcc-c++-3.3.3-7
gcc-java-3.3.3-7
gcc34-java-3.4.0-1

[root@localhost root]# rpm -qa | grep libc
libcroco-0.4.0-4
libcap-devel-1.10-18.1
libc-client-devel-2002e-5
glibc-2.3.3-27
glibc-kernheaders-2.4-8.44
glibc-utils-2.3.3-27
glibc-2.3.3-27
glibc-profile-2.3.3-27
glibc-common-2.3.3-27
glibc-devel-2.3.3-27
libc-client-2002e-5
libcap-1.10-18.1
glibc-headers-2.3.3-27
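
The listings above do not show package architecture, so on their own they cannot confirm that the x86_64 compiler and glibc-devel are present. Below is a minimal sketch for checking this by hand, assuming a POSIX shell with mktemp available. The name check_cc is a hypothetical helper, not part of OFED, and it only approximates the probe that configure runs when it reports "C compiler cannot create executables".

```shell
#!/bin/sh
# To list packages with their architecture (plain `rpm -qa` omits it here):
#   rpm -qa --qf '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' | grep -E '^(gcc|glibc)'

# check_cc: hypothetical helper that tries to compile and link a trivial
# C program with the given compiler (default: gcc).
# Returns 0 if an executable was produced, 1 if not, 2 if no temp dir.
check_cc() {
    cc_bin=${1:-gcc}
    tmpdir=$(mktemp -d) || return 2
    printf 'int main(void) { return 0; }\n' > "$tmpdir/conftest.c"
    if "$cc_bin" -o "$tmpdir/conftest" "$tmpdir/conftest.c" 2> "$tmpdir/err.log"; then
        echo "$cc_bin: can create executables"
        status=0
    else
        echo "$cc_bin: cannot create executables"
        # Show the compiler or linker message, usually the real cause
        # (for example a missing crt1.o when glibc-devel is absent).
        cat "$tmpdir/err.log"
        status=1
    fi
    rm -rf "$tmpdir"
    return $status
}
```

Running `check_cc gcc` as the same user that runs the OFED installer should show whether the failure comes from the toolchain itself rather than from the OFED build scripts.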

Thanks a lot and best regards

On Fri, 22-06-2007 at 11:04 -0400, Mahmoud Hanafi wrote:

> 
> Do you have gcc and glibc-devel.x86_64 installed? 
> 
> -Mahmoud 
> 
> 
> Julio del Río <jrio at caton.es>
> Sent by: general-bounces at lists.openfabrics.org
> 06/22/2007 04:34 AM
> 
> To: general at lists.openfabrics.org
> cc:
> Subject: [ofa-general] problem with ofed 1.1.
> 
> Good morning,
> 
> I hope you could help me with this:
> 
> I have this config:
> 
>    - Fedora Core 2
>    - Linux localhost.localdomain 2.6.9-34.ELsmp #1 SMP Fri Feb 24
> 16:56:28 EST 2006 x86_64 x86_64 x86_64 GNU/Linux
>      - HCA Mellanox MHGS18-XTC
>    - Flextronic Switch F-X430047
>    - Ofed 1.1
> 
> and trying to install, this is the error log file I get:
> 
> ---------------------------------------------------------
> + STATUS=0
> + '[' 0 -ne 0 ']'
> + cd openib-1.1
> ++ /usr/bin/id -u
> + '[' 0 = 0 ']'
> + /bin/chown -Rhf root .
> ++ /usr/bin/id -u
> + '[' 0 = 0 ']'
> + /bin/chgrp -Rhf root .
> + /bin/chmod -Rf a+rX,u+w,g-w,o-w .
> + exit 0
> Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.43267
> + umask 022
> + cd /var/tmp/OFEDRPM/BUILD
> + cd openib-1.1
> + LANG=C
> + export LANG
> + unset DISPLAY
> + rm -rf /var/tmp/OFED
> + cd /var/tmp/OFEDRPM/BUILD/openib-1.1
> + mkdir -p /var/tmp/OFED//usr/local/ofed/src
> + cp -a /var/tmp/OFEDRPM/BUILD/openib-1.1 /var/tmp/OFED//usr/local/ofed/src
> + ./configure --prefix=/usr/local/ofed --libdir=/usr/local/ofed/lib64 --kernel-version 2.6.9-34.ELsmp --kernel-sources /lib/modules/2.6.9-34.ELsmp/build --with-libibcm --with-libibverbs --with-libipathverbs --with-libmthca --with-librdmacm --with-mstflint --with-perftest --with-ipath_inf-mod --with-ipoib-mod --with-mthca-mod --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod
> Quilt  does not exist... Going to use patch.
> Created configure.mk:
> prefix=/usr/local/ofed
> PREFIX="--prefix /usr/local/ofed"
> libdir=/usr/local/ofed/lib64
> 
> # Current working directory
> CWD=/var/tmp/OFEDRPM/BUILD/openib-1.1
> 
> # Kernel level
> KVERSION=2.6.9-34.ELsmp
> EXTRAVERSION=-34.ELsmp
> MODULES_DIR=/lib/modules/2.6.9-34.ELsmp
> KSRC=/lib/modules/2.6.9-34.ELsmp/build
> 
> AUTOCONF_H=/var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h
> WITH_MEMTRACK=no
> 
> WITH_MAKE_PARAMS=
> 
> CONFIG_INFINIBAND=m
> CONFIG_INFINIBAND_IPOIB=m
> CONFIG_INFINIBAND_SDP=
> CONFIG_INFINIBAND_SRP=
> 
> CONFIG_INFINIBAND_USER_MAD=m
> CONFIG_INFINIBAND_USER_ACCESS=m
> CONFIG_INFINIBAND_ADDR_TRANS=y
> CONFIG_INFINIBAND_MTHCA=m
> 
> CONFIG_INFINIBAND_IPOIB_DEBUG=y
> CONFIG_INFINIBAND_ISER=
> CONFIG_INFINIBAND_EHCA=
> CONFIG_INFINIBAND_EHCA_SCALING=
> CONFIG_INFINIBAND_RDS=
> CONFIG_INFINIBAND_RDS_DEBUG=
> CONFIG_INFINIBAND_MADEYE=
> 
> CONFIG_INFINIBAND_IPOIB_DEBUG_DATA=
> CONFIG_INFINIBAND_SDP_SEND_ZCOPY=
> CONFIG_INFINIBAND_SDP_RECV_ZCOPY=
> CONFIG_INFINIBAND_SDP_DEBUG=
> CONFIG_INFINIBAND_SDP_DEBUG_DATA=
> CONFIG_INFINIBAND_IPATH=m
> CONFIG_INFINIBAND_MTHCA_DEBUG=y
> 
> 
> 
> # User level
> WITH_IBVERBS=yes
> WITH_MTHCA=yes
> WITH_IPATHVERBS=yes
> WITH_EHCA=no
> WITH_CM=yes
> WITH_SDP=no
> WITH_DAPL=no
> WITH_RDMACM=yes
> WITH_MANAGEMENT_LIBS=no
> WITH_OSM=no
> WITH_DIAGS=no
> WITH_MPI=no
> WITH_PERFTEST=yes
> WITH_SRPTOOLS=no
> WITH_IPOIBTOOLS=no
> WITH_TVFLASH=no
> WITH_MSTFLINT=yes
> 
> Created /var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h:
> #undef CONFIG_INFINIBAND
> #undef CONFIG_INFINIBAND_IPOIB
> #undef CONFIG_INFINIBAND_SDP
> #undef CONFIG_INFINIBAND_SRP
> 
> #undef CONFIG_INFINIBAND_USER_MAD
> #undef CONFIG_INFINIBAND_USER_ACCESS
> #undef CONFIG_INFINIBAND_ADDR_TRANS
> #undef CONFIG_INFINIBAND_MTHCA
> 
> #undef CONFIG_INFINIBAND_IPOIB_DEBUG
> #undef CONFIG_INFINIBAND_ISER
> #undef CONFIG_INFINIBAND_EHCA
> #undef CONFIG_INFINIBAND_EHCA_SCALING
> #undef CONFIG_INFINIBAND_RDS
> #undef CONFIG_INFINIBAND_RDS_DEBUG
> #undef CONFIG_INFINIBAND_MADEYE
> 
> #undef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA
> #undef CONFIG_INFINIBAND_SDP_SEND_ZCOPY
> #undef CONFIG_INFINIBAND_SDP_RECV_ZCOPY
> #undef CONFIG_INFINIBAND_SDP_DEBUG
> #undef CONFIG_INFINIBAND_SDP_DEBUG_DATA
> #undef CONFIG_INFINIBAND_IPATH
> #undef CONFIG_INFINIBAND_MTHCA_DEBUG
> 
> #define CONFIG_INFINIBAND 1
> #define CONFIG_INFINIBAND_IPOIB 1
> #undef CONFIG_INFINIBAND_SDP
> #undef CONFIG_INFINIBAND_SRP
> 
> #define CONFIG_INFINIBAND_USER_MAD 1
> #define CONFIG_INFINIBAND_USER_ACCESS 1
> #define CONFIG_INFINIBAND_ADDR_TRANS 1
> #define CONFIG_INFINIBAND_MTHCA 1
> 
> #define CONFIG_INFINIBAND_IPOIB_DEBUG 1
> #undef CONFIG_INFINIBAND_ISER
> #undef CONFIG_INFINIBAND_EHCA
> #undef CONFIG_INFINIBAND_RDS
> #undef CONFIG_INFINIBAND_RDS_DEBUG
> 
> 
> #undef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA
> #undef CONFIG_INFINIBAND_SDP_SEND_ZCOPY
> #undef CONFIG_INFINIBAND_SDP_RECV_ZCOPY
> #undef CONFIG_INFINIBAND_SDP_DEBUG
> #undef CONFIG_INFINIBAND_SDP_DEBUG_DATA
> #define CONFIG_INFINIBAND_IPATH 1
> #define CONFIG_INFINIBAND_MTHCA_DEBUG 1
> #undef CONFIG_INFINIBAND_MADEYE
> 
> mkdir -p /var/tmp/OFEDRPM/BUILD/openib-1.1/patches
> touch /var/tmp/OFEDRPM/BUILD/openib-1.1/patches/quiltrc
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/dapl_qp_attr.patch
> patching file src/userspace/dapl/dapl/openib_cma/dapl_ib_util.c
> patching file src/userspace/dapl/dapl/openib_scm/dapl_ib_util.c
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/libmthca_cq_deadlock.patch
> patching file src/userspace/libmthca/src/verbs.c
> Hunk #1 succeeded at 614 (offset -8 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/libmthca_stddef.patch
> patching file src/userspace/libmthca/src/mthca.h
> Hunk #1 succeeded at 38 with fuzz 2 (offset 2 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/librdmacm_compat.patch
> patching file src/userspace/librdmacm/src/cma.c
> Hunk #1 succeeded at 157 (offset 16 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/librdmacm_ver_abi.patch
> patching file src/userspace/librdmacm/src/cma.c
> Hunk #2 succeeded at 170 (offset 16 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/mstflint.patch
> patching file src/userspace/mstflint/mtcr.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cm_add_mra_timeout_limit.patch
> patching file drivers/infiniband/core/cm.c
> Hunk #1 succeeded at 53 (offset -1 lines).
> Hunk #2 succeeded at 2268 (offset -36 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cm_cleanup_timewait.patch
> patching file drivers/infiniband/core/cm.c
> Hunk #1 succeeded at 686 (offset 7 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_established1.patch
> patching file drivers/infiniband/ulp/sdp/sdp.h
> patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c
> Hunk #1 succeeded at 515 (offset 16 lines).
> patching file drivers/infiniband/ulp/sdp/sdp_cma.c
> patching file drivers/infiniband/ulp/sdp/sdp_main.c
> Hunk #1 succeeded at 589 (offset 26 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_increase_max_cm_retries.patch
> patching file drivers/infiniband/core/cma.c
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_list_init.patch
> patching file drivers/infiniband/core/cma.c
> Hunk #1 succeeded at 328 (offset -11 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_mem_leak.patch
> patching file drivers/infiniband/core/cma.c
> Hunk #1 succeeded at 1713 (offset -241 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_race_fix.patch
> patching file drivers/infiniband/core/cma.c
> Hunk #1 succeeded at 910 with fuzz 1 (offset -113 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_tavor_quirk.patch
> patching file drivers/infiniband/core/cma.c
> Hunk #1 succeeded at 48 with fuzz 2.
> Hunk #2 succeeded at 1154 (offset 27 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ib_sa_names.patch
> patching file include/rdma/ib_sa.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-fixes.patch
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/Makefile
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/Kconfig
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/Makefile
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_common.h
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_cq.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_debug.h
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_diag.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_driver.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_fs.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_ht400.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_iba6110.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_iba6120.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_intr.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_kernel.h
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_keys.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_layer.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_layer.h
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_mad.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_mr.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_pe800.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_qp.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_rc.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_registers.h
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_ruc.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_srq.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_stats.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_sysfs.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_uc.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_ud.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_verbs.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_verbs.h
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_verbs_mcast.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_wc_ppc64.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/verbs_debug.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-limit-packets-sent-without-ack.patch
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_qp.c
> Hunk #1 succeeded at 502 (offset -8 lines).
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_rc.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_verbs.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_verbs.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-memcpy_cachebypass.patch
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/Makefile
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/ipath_verbs.c
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/memcpy_cachebypass_x86_64.S
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-x86_64.patch
> (Stripping trailing CRs from patch.)
> patching file drivers/infiniband/hw/ipath/Kconfig
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_issue3.patch
> patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_mcast_join_mask.patch
> patching file drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> Hunk #1 succeeded at 471 (offset -1 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_mcast_restart.patch
> patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_selector_updated.patch
> patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
> Hunk #2 succeeded at 458 (offset 4 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_attributes.patch
> patching file drivers/infiniband/ulp/srp/ib_srp.c
> Hunk #1 succeeded at 1461 (offset -6 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_remove_reconnect.patch
> patching file drivers/infiniband/ulp/srp/ib_srp.c
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_wa_post_send.patch
> patching file drivers/infiniband/ulp/srp/ib_srp.c
> patching file drivers/infiniband/ulp/srp/ib_srp.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/lockdep_header.patch
> patching file drivers/infiniband/core/uverbs_cmd.c
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_av_statrate.patch
> patching file drivers/infiniband/hw/mthca/mthca_av.c
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_catas_reset.patch
> patching file drivers/infiniband/hw/mthca/mthca_catas.c
> patching file drivers/infiniband/hw/mthca/mthca_main.c
> patching file drivers/infiniband/hw/mthca/mthca_dev.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_mad_traps.patch
> patching file drivers/infiniband/hw/mthca/mthca_mad.c
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_port.patch
> patching file drivers/infiniband/hw/mthca/mthca_provider.c
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_qp_portnum.patch
> patching file drivers/infiniband/hw/mthca/mthca_qp.c
> Hunk #1 succeeded at 478 (offset 4 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_qp_statrate_bits.patch
> patching file drivers/infiniband/hw/mthca/mthca_qp.c
> Hunk #1 succeeded at 414 (offset 4 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_use_uar2.patch
> patching file drivers/infiniband/hw/mthca/mthca_uar.c
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/robert-ipath-diagpkt-init-fixup.patch
> patching file drivers/infiniband/hw/ipath/ipath_diag.c
> Hunk #1 succeeded at 285 (offset -1 lines).
> patching file drivers/infiniband/hw/ipath/ipath_driver.c
> Hunk #1 succeeded at 539 (offset -20 lines).
> Hunk #2 succeeded at 596 with fuzz 1 (offset -105 lines).
> Hunk #3 succeeded at 2029 (offset -156 lines).
> patching file drivers/infiniband/hw/ipath/ipath_kernel.h
> Hunk #1 succeeded at 793 (offset -96 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sdp_credits_by_seq.patch
> patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sdp_post_credits.patch
> patching file drivers/infiniband/ulp/sdp/sdp.h
> Hunk #1 succeeded at 177 (offset 1 line).
> patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c
> Hunk #1 succeeded at 324 (offset 6 lines).
> patching file drivers/infiniband/ulp/sdp/sdp_cma.c
> Hunk #1 succeeded at 434 (offset 4 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_drep_on_not_found.patch
> patching file drivers/infiniband/core/cm.c
> Hunk #1 succeeded at 1890 (offset -10 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_randomize_psn.patch
> patching file drivers/infiniband/core/cm.c
> Hunk #3 succeeded at 81 (offset 7 lines).
> Hunk #5 succeeded at 327 (offset 7 lines).
> Hunk #7 succeeded at 2115 (offset 27 lines).
> Hunk #8 succeeded at 3369 (offset 2 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_unload_crash.patch
> patching file drivers/infiniband/core/cm.c
> Hunk #1 succeeded at 82 (offset 7 lines).
> Hunk #3 succeeded at 656 (offset 6 lines).
> Hunk #5 succeeded at 685 (offset 6 lines).
> Hunk #7 succeeded at 1316 (offset 6 lines).
> Hunk #9 succeeded at 1334 (offset 6 lines).
> Hunk #10 succeeded at 2626 (offset -7 lines).
> Hunk #11 succeeded at 3409 (offset -29 lines).
> Hunk #12 succeeded at 3449 (offset -7 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_establish.patch
> patching file include/rdma/rdma_cm.h
> Hunk #1 succeeded at 241 (offset -15 lines).
> patching file drivers/infiniband/core/cm.c
> Hunk #1 succeeded at 3242 (offset 35 lines).
> patching file drivers/infiniband/core/cma.c
> Hunk #1 succeeded at 759 (offset -81 lines).
> Hunk #3 succeeded at 1752 (offset -212 lines).
> Hunk #4 succeeded at 1997 with fuzz 1.
> Hunk #5 succeeded at 1828 (offset -229 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_hotplug.patch
> patching file drivers/infiniband/core/cma.c
> Hunk #1 succeeded at 278 (offset 7 lines).
> Hunk #3 succeeded at 700 (offset 8 lines).
> Hunk #5 succeeded at 895 with fuzz 1 (offset -9 lines).
> Hunk #6 succeeded at 1382 (offset 6 lines).
> Hunk #7 succeeded at 1610 (offset -9 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_typo_fix.patch
> patching file drivers/infiniband/core/cma.c
> Hunk #1 succeeded at 276 with fuzz 2 (offset 7 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_1_recreate_at_reconnect.patch
> patching file drivers/infiniband/ulp/srp/ib_srp.c
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_2_use_multiple_initiator_ports.patch
> patching file drivers/infiniband/ulp/srp/ib_srp.c
> patching file drivers/infiniband/ulp/srp/ib_srp.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_topspin.patch
> patching file drivers/infiniband/ulp/srp/ib_srp.c
> Hunk #1 succeeded at 358 (offset -1 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/svnehca_0015_1.patch
> patching file drivers/infiniband/hw/ehca/ehca_main.c
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/svnehca_0015_2.patch
> patching file drivers/infiniband/hw/ehca/ehca_tools.h
> 
> Applying patches for 2.6.9-34.ELsmp kernel (RHAS4 Update 3):
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_1_netevents_revert_to_2_6_17.patch
> patching file drivers/infiniband/core/addr.c
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_3926_to_2_6_13.patch
> patching file drivers/infiniband/core/addr.c
> Hunk #1 succeeded at 327 with fuzz 1 (offset 11 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_4670_to_2_6_9.patch
> patching file drivers/infiniband/core/addr.c
> Hunk #1 succeeded at 27 with fuzz 2.
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/asm_bitops_ia64_to_2_6_11.patch
> patching file include/asm/bitops.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/core_4807_to_2_6_9.patch
> patching file drivers/infiniband/core/sysfs.c
> Hunk #1 succeeded at 438 (offset -4 lines).
> patching file drivers/infiniband/core/user_mad.c
> Hunk #2 succeeded at 677 (offset 91 lines).
> Hunk #3 succeeded at 685 (offset 5 lines).
> Hunk #4 succeeded at 1106 (offset 91 lines).
> Hunk #5 succeeded at 1053 (offset 5 lines).
> patching file drivers/infiniband/core/uverbs_main.c
> Hunk #2 succeeded at 118 (offset 3 lines).
> patching file drivers/infiniband/core/uverbs_mem.c
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/debugfs_to_2_6_9.patch
> patching file drivers/infiniband/include/linux/debugfs.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipath-backport.patch
> patching file drivers/infiniband/hw/ipath/iowrite32_copy_x86_64.S
> patching file drivers/infiniband/hw/ipath/ipath_backport.h
> patching file drivers/infiniband/hw/ipath/ipath_diag.c
> patching file drivers/infiniband/hw/ipath/ipath_driver.c
> Hunk #2 succeeded at 557 (offset 1 line).
> Hunk #3 succeeded at 599 (offset 1 line).
> Hunk #4 succeeded at 1366 (offset 1 line).
> Hunk #5 succeeded at 1395 (offset 1 line).
> Hunk #6 succeeded at 1875 (offset 1 line).
> Hunk #7 succeeded at 1903 (offset 1 line).
> Hunk #8 succeeded at 1984 (offset -9 lines).
> Hunk #9 succeeded at 2027 (offset 1 line).
> Hunk #10 succeeded at 2142 (offset -9 lines).
> patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
> patching file drivers/infiniband/hw/ipath/ipath_fs.c
> patching file drivers/infiniband/hw/ipath/ipath_iba6110.c
> patching file drivers/infiniband/hw/ipath/ipath_iba6120.c
> patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
> patching file drivers/infiniband/hw/ipath/ipath_kernel.h
> patching file drivers/infiniband/hw/ipath/ipath_layer.c
> patching file drivers/infiniband/hw/ipath/ipath_sysfs.c
> patching file drivers/infiniband/hw/ipath/ipath_user_pages.c
> patching file drivers/infiniband/hw/ipath/ipath_verbs.c
> patching file drivers/infiniband/hw/ipath/ipath_verbs.h
> patching file drivers/infiniband/hw/ipath/Makefile
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipoib_5010_to_2_6_9.patch
> patching file drivers/infiniband/include/linux/if_infiniband.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipoib_8111_to_2_6_16.patch
> patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
> Hunk #2 succeeded at 803 (offset 49 lines).
> patching file drivers/infiniband/ulp/ipoib/ipoib.h
> Hunk #1 succeeded at 46 (offset -1 lines).
> Hunk #2 succeeded at 220 (offset 1 line).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_device_5496_to_2_6_15.patch
> patching file drivers/infiniband/include/linux/device.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_err_to_2_6_11.patch
> patching file include/linux/err.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_idr_6554_to_2_6_13.patch
> patching file drivers/infiniband/include/linux/idr.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_inetdevice_to_2_6_17.patch
> patching file drivers/infiniband/include/linux/inetdevice.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_lockdep_to_2_6_17.patch
> patching file include/linux/lockdep.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_mutex_5947_to_2_6_15.patch
> patching file drivers/infiniband/include/linux/mutex.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_netdevice_to_2_6_17.patch
> patching file drivers/infiniband/include/linux/netdevice.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_pci_7970_to_2_6_9.patch
> patching file drivers/infiniband/include/linux/pci.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_scatterlist_6369_to_2_6_9.patch
> patching file drivers/infiniband/include/linux/scatterlist.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_signal_to_2_6_17.patch
> patching file drivers/infiniband/include/linux/signal.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_skbuff_6754_to_2_6_11.patch
> patching file include/linux/skbuff.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_spinlock_5883_to_2_6_9.patch
> patching file drivers/infiniband/include/linux/spinlock.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/makefile_to_2_6_9.patch
> patching file drivers/infiniband/ulp/srp/Makefile
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/mthca_dev_3465_to_2_6_11.patch
> patching file drivers/infiniband/hw/mthca/mthca_dev.h
> Hunk #1 succeeded at 57 with fuzz 2 (offset 4 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/mthca_provider_3465_to_2_6_9.patch
> patching file drivers/infiniband/hw/mthca/mthca_provider.c
> Hunk #1 succeeded at 387 (offset 28 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_inet_sock_6754_to_2_6_15.patch
> patching file include/net/inet_sock.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_sock_1_6754_to_2_6_13.patch
> patching file include/net/sock.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_sock_2_6754_to_2_6_11.patch
> patching file include/net/sock.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_tcp_states_6754_to_2_6_13.patch
> patching file include/net/tcp_states.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/read_mostly_6255_to_2_6_13.patch
> patching file drivers/infiniband/include/linux/cache.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/scsi_7242_to_2_6_14.patch
> patching file include/scsi/scsi.h
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/sdp_7277_to_2_6_11.patch
> patching file drivers/infiniband/ulp/sdp/sdp_main.c
> Hunk #1 succeeded at 418 (offset 118 lines).
> Hunk #2 succeeded at 535 (offset 41 lines).
> Hunk #3 succeeded at 633 (offset 118 lines).
> Hunk #4 succeeded at 1408 (offset 245 lines).
> Hunk #5 succeeded at 1301 (offset 118 lines).
> Hunk #6 succeeded at 1537 (offset 245 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_4030_to_2_6_12.patch
> patching file drivers/infiniband/ulp/srp/ib_srp.c
> Hunk #1 succeeded at 1594 (offset 271 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_7312_to_2_6_11.patch
> patching file drivers/infiniband/ulp/srp/ib_srp.c
> Hunk #1 succeeded at 1258 (offset -44 lines).
> Hunk #3 succeeded at 1332 (offset -42 lines).
> Hunk #5 succeeded at 1360 with fuzz 2 (offset -40 lines).
> Hunk #6 succeeded at 1404 with fuzz 2 (offset -3 lines).
> Hunk #7 succeeded at 1377 (offset -40 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_scsi_scan_target_7242_to_2_6_11.patch
> patching file drivers/infiniband/ulp/srp/ib_srp.c
> Hunk #1 succeeded at 975 (offset 26 lines).
> Hunk #2 succeeded at 1505 (offset 24 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/top_2844_to_2_6_11.patch
> patching file drivers/infiniband/Makefile
> Hunk #1 succeeded at 1 with fuzz 2.
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ucm_5245_to_2_6_9.patch
> patching file drivers/infiniband/core/ucm.c
> Hunk #1 succeeded at 1270 (offset -8 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ucma_6607_to_2_6_9.patch
> patching file drivers/infiniband/core/ucma.c
> Hunk #1 succeeded at 861 (offset 88 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/user_mad_4603_to_2_6_9.patch
> patching file drivers/infiniband/core/user_mad.c
> Hunk #1 succeeded at 857 (offset -20 lines).
> Hunk #3 succeeded at 1086 (offset -20 lines).
> Hunk #5 succeeded at 1123 (offset -20 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/uverbs_main_3935_to_2_6_9.patch
> patching file drivers/infiniband/core/uverbs_main.c
> Hunk #1 succeeded at 727 (offset 11 lines).
> Hunk #2 succeeded at 949 (offset 1 line).
> Hunk #3 succeeded at 975 (offset 11 lines).
> Hunk #4 succeeded at 986 (offset 3 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/uverbs_to_2_6_17.patch
> patching file drivers/infiniband/core/uverbs_main.c
> Hunk #1 succeeded at 1011 with fuzz 1 (offset 196 lines).
> 
>  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/hpage_patches/hpages.patch
> patching file drivers/infiniband/core/uverbs_mem.c
> /bin/rm -f /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache
> cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/examples
> cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/libibverbs
> Running: ./configure
> --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache
> --disable-libcheck --prefix /usr/local/ofed --libdir /usr/local/ofed/lib64
> CPPFLAGS="-I../libibverbs/include"
> configure: creating cache /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache
> checking for a BSD-compatible install... /usr/bin/install -c
> checking whether build environment is sane... yes
> checking for gawk... gawk
> checking whether make sets $(MAKE)... yes
> checking build system type... x86_64-redhat-linux-gnu
> checking host system type... x86_64-redhat-linux-gnu
> checking for style of include used by make... GNU
> checking for gcc... gcc
> checking for C compiler default output file name... configure: error: C compiler cannot create executables
> See `config.log' for more details.
> Failed to execute: ./configure
> --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache
> --disable-libcheck --prefix /usr/local/ofed --libdir /usr/local/ofed/lib64
> CPPFLAGS="-I../libibverbs/include"
> error: Bad exit status from /var/tmp/rpm-tmp.43267 (%install)
> 
> 
> RPM build errors:
>    user vlad does not exist - using root
>    group mtl does not exist - using root
>    user vlad does not exist - using root
>    group mtl does not exist - using root
>    Bad exit status from /var/tmp/rpm-tmp.43267 (%install)
> ERROR: Failed executing "rpmbuild --rebuild --define
> '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' --define
> 'build_root /var/tmp/OFED' --define 'configure_options --with-libibcm
> --with-libibverbs --with-libipathverbs --with-libmthca --with-librdmacm
> --with-mstflint --with-perftest --with-ipath_inf-mod --with-ipoib-mod
> --with-mthca-mod --with-core-mod --with-user_mad-mod
> --with-user_access-mod --with-addr_trans-mod'
> --define 'configure_options32 %{nil}' --define 'KVERSION 2.6.9-34.ELsmp'
> --define 'KSRC /lib/modules/2.6.9-34.ELsmp/build'
> --define 'build_kernel_ib 1' --define 'build_kernel_ib_devel 1'
> --define 'NETWORK_CONF_DIR /etc/sysconfig/network-scripts'
> --define 'modprobe_update 1' --define 'include_ipoib_conf 1'
> --define 'build_32bit 0' /home/caton/OFED-1.1/SRPMS/openib-1.1-0.src.rpm"
> 
> ---------------------------------------------------------
> 
> Thanks a lot and best regards 
> 
> Julio. 
> 


Julio.

From jrio at caton.es  Fri Jun 22 08:22:06 2007
From: jrio at caton.es (Julio del =?ISO-8859-1?Q?R=EDo?=)
Date: Fri, 22 Jun 2007 17:22:06 +0200
Subject: [ofa-general] problem with ofed 1.1.
Message-ID: <1182525727.5695.29.camel@linux.site>


> [root at localhost root]# rpm -qa | grep gcc
> libgcc-3.3.3-7
> gcc-g77-3.3.3-7
> gcc-3.3.3-7
> gcc-objc-3.3.3-7
> compat-gcc-c++-7.3-2.96.126
> gcc-gnat-3.3.3-7
> compat-gcc-7.3-2.96.126
> gcc34-3.4.0-1
> gcc34-c++-3.4.0-1
> libgcc-3.3.3-7
> gcc-c++-3.3.3-7
> gcc-java-3.3.3-7
> gcc34-java-3.4.0-1
> 
> [root at localhost root]# rpm -qa | grep libc
> libcroco-0.4.0-4
> libcap-devel-1.10-18.1
> libc-client-devel-2002e-5
> glibc-2.3.3-27
> glibc-kernheaders-2.4-8.44
> glibc-utils-2.3.3-27
> glibc-2.3.3-27
> glibc-profile-2.3.3-27
> glibc-common-2.3.3-27
> glibc-devel-2.3.3-27
> libc-client-2002e-5
> libcap-1.10-18.1
> glibc-headers-2.3.3-27
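
The rpm -qa listings above do not show package architectures, so they cannot confirm whether the x86_64 glibc-devel asked about is present. A minimal sketch of two checks that might narrow this down, assuming a POSIX shell; check_cc is a hypothetical helper name, and the test program only mimics (it is not) the conftest that configure compiles:

```shell
# Hypothetical helper: try to compile and link a trivial C program,
# roughly what configure's "C compiler cannot create executables" check does.
check_cc() {
    cc_bin="${1:-gcc}"
    tmpdir=$(mktemp -d) || return 2
    printf 'int main(void) { return 0; }\n' > "$tmpdir/conftest.c"
    if "$cc_bin" "$tmpdir/conftest.c" -o "$tmpdir/conftest" 2> "$tmpdir/err.log"; then
        echo "$cc_bin: can create executables"
        status=0
    else
        echo "$cc_bin: cannot create executables"
        cat "$tmpdir/err.log"   # the compiler/linker message usually names the missing piece
        status=1
    fi
    rm -rf "$tmpdir"
    return $status
}

check_cc gcc || true

# List gcc/glibc packages together with their architecture (i386, x86_64, ...).
if command -v rpm >/dev/null 2>&1; then
    rpm -qa --queryformat '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' \
        | grep -E '^(gcc|glibc)' || true
fi
```

If the compile step fails, the linker error printed from err.log (or the matching section of config.log) should say whether startup files or libc for the target architecture are missing.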
> 
> Thanks a lot and best regards
> 
> On Fri, 22-06-2007 at 11:04 -0400, Mahmoud Hanafi wrote:
> 
> > 
> > Do you have gcc and glibc-devel.x86_64 installed? 
> > 
> > -Mahmoud 
> > 
> > 
> > 
> > Julio del Río <jrio at caton.es> 
> > Sent by: general-bounces at lists.openfabrics.org 
> > 06/22/2007 04:34 AM 
> > 
> > To: general at lists.openfabrics.org 
> > Subject: [ofa-general] problem with ofed 1.1.
> > 
> > Good morning,
> > 
> > I hope you can help me with this.
> > 
> > I have this configuration:
> > 
> >    - Fedora Core 2
> >    - Linux localhost.localdomain 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:56:28 EST 2006 x86_64 x86_64 x86_64 GNU/Linux
> >    - HCA Mellanox MHGS18-XTC
> >    - Flextronic Switch F-X430047
> >    - Ofed 1.1
> > 
> > When I try to install, this is the error log I get:
> > 
> > ---------------------------------------------------------
> > + STATUS=0
> > + '[' 0 -ne 0 ']'
> > + cd openib-1.1
> > ++ /usr/bin/id -u
> > + '[' 0 = 0 ']'
> > + /bin/chown -Rhf root .
> > ++ /usr/bin/id -u
> > + '[' 0 = 0 ']'
> > + /bin/chgrp -Rhf root .
> > + /bin/chmod -Rf a+rX,u+w,g-w,o-w .
> > + exit 0
> > Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.43267
> > + umask 022
> > + cd /var/tmp/OFEDRPM/BUILD
> > + cd openib-1.1
> > + LANG=C
> > + export LANG
> > + unset DISPLAY
> > + rm -rf /var/tmp/OFED
> > + cd /var/tmp/OFEDRPM/BUILD/openib-1.1
> > + mkdir -p /var/tmp/OFED//usr/local/ofed/src
> > + cp -a /var/tmp/OFEDRPM/BUILD/openib-1.1 /var/tmp/OFED//usr/local/ofed/src
> > + ./configure --prefix=/usr/local/ofed
> > --libdir=/usr/local/ofed/lib64 --kernel-version 2.6.9-34.ELsmp
> > --kernel-sources /lib/modules/2.6.9-34.ELsmp/build
> > --with-libibcm --with-libibverbs --with-libipathverbs --with-libmthca
> > --with-librdmacm --with-mstflint --with-perftest --with-ipath_inf-mod
> > --with-ipoib-mod --with-mthca-mod --with-core-mod --with-user_mad-mod
> > --with-user_access-mod --with-addr_trans-mod
> > Quilt  does not exist... Going to use patch.
> > Created configure.mk:
> > prefix=/usr/local/ofed
> > PREFIX="--prefix /usr/local/ofed"
> > libdir=/usr/local/ofed/lib64
> > 
> > # Current working directory
> > CWD=/var/tmp/OFEDRPM/BUILD/openib-1.1
> > 
> > # Kernel level
> > KVERSION=2.6.9-34.ELsmp
> > EXTRAVERSION=-34.ELsmp
> > MODULES_DIR=/lib/modules/2.6.9-34.ELsmp
> > KSRC=/lib/modules/2.6.9-34.ELsmp/build
> > 
> > AUTOCONF_H=/var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h
> > WITH_MEMTRACK=no
> > 
> > WITH_MAKE_PARAMS=
> > 
> > CONFIG_INFINIBAND=m
> > CONFIG_INFINIBAND_IPOIB=m
> > CONFIG_INFINIBAND_SDP=
> > CONFIG_INFINIBAND_SRP=
> > 
> > CONFIG_INFINIBAND_USER_MAD=m
> > CONFIG_INFINIBAND_USER_ACCESS=m
> > CONFIG_INFINIBAND_ADDR_TRANS=y
> > CONFIG_INFINIBAND_MTHCA=m
> > 
> > CONFIG_INFINIBAND_IPOIB_DEBUG=y
> > CONFIG_INFINIBAND_ISER=
> > CONFIG_INFINIBAND_EHCA=
> > CONFIG_INFINIBAND_EHCA_SCALING=
> > CONFIG_INFINIBAND_RDS=
> > CONFIG_INFINIBAND_RDS_DEBUG=
> > CONFIG_INFINIBAND_MADEYE=
> > 
> > CONFIG_INFINIBAND_IPOIB_DEBUG_DATA=
> > CONFIG_INFINIBAND_SDP_SEND_ZCOPY=
> > CONFIG_INFINIBAND_SDP_RECV_ZCOPY=
> > CONFIG_INFINIBAND_SDP_DEBUG=
> > CONFIG_INFINIBAND_SDP_DEBUG_DATA=
> > CONFIG_INFINIBAND_IPATH=m
> > CONFIG_INFINIBAND_MTHCA_DEBUG=y
> > 
> > 
> > 
> > # User level
> > WITH_IBVERBS=yes
> > WITH_MTHCA=yes
> > WITH_IPATHVERBS=yes
> > WITH_EHCA=no
> > WITH_CM=yes
> > WITH_SDP=no
> > WITH_DAPL=no
> > WITH_RDMACM=yes
> > WITH_MANAGEMENT_LIBS=no
> > WITH_OSM=no
> > WITH_DIAGS=no
> > WITH_MPI=no
> > WITH_PERFTEST=yes
> > WITH_SRPTOOLS=no
> > WITH_IPOIBTOOLS=no
> > WITH_TVFLASH=no
> > WITH_MSTFLINT=yes
> > 
> > Created /var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h:
> > #undef CONFIG_INFINIBAND
> > #undef CONFIG_INFINIBAND_IPOIB
> > #undef CONFIG_INFINIBAND_SDP
> > #undef CONFIG_INFINIBAND_SRP
> > 
> > #undef CONFIG_INFINIBAND_USER_MAD
> > #undef CONFIG_INFINIBAND_USER_ACCESS
> > #undef CONFIG_INFINIBAND_ADDR_TRANS
> > #undef CONFIG_INFINIBAND_MTHCA
> > 
> > #undef CONFIG_INFINIBAND_IPOIB_DEBUG
> > #undef CONFIG_INFINIBAND_ISER
> > #undef CONFIG_INFINIBAND_EHCA
> > #undef CONFIG_INFINIBAND_EHCA_SCALING
> > #undef CONFIG_INFINIBAND_RDS
> > #undef CONFIG_INFINIBAND_RDS_DEBUG
> > #undef CONFIG_INFINIBAND_MADEYE
> > 
> > #undef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA
> > #undef CONFIG_INFINIBAND_SDP_SEND_ZCOPY
> > #undef CONFIG_INFINIBAND_SDP_RECV_ZCOPY
> > #undef CONFIG_INFINIBAND_SDP_DEBUG
> > #undef CONFIG_INFINIBAND_SDP_DEBUG_DATA
> > #undef CONFIG_INFINIBAND_IPATH
> > #undef CONFIG_INFINIBAND_MTHCA_DEBUG
> > 
> > #define CONFIG_INFINIBAND 1
> > #define CONFIG_INFINIBAND_IPOIB 1
> > #undef CONFIG_INFINIBAND_SDP
> > #undef CONFIG_INFINIBAND_SRP
> > 
> > #define CONFIG_INFINIBAND_USER_MAD 1
> > #define CONFIG_INFINIBAND_USER_ACCESS 1
> > #define CONFIG_INFINIBAND_ADDR_TRANS 1
> > #define CONFIG_INFINIBAND_MTHCA 1
> > 
> > #define CONFIG_INFINIBAND_IPOIB_DEBUG 1
> > #undef CONFIG_INFINIBAND_ISER
> > #undef CONFIG_INFINIBAND_EHCA
> > #undef CONFIG_INFINIBAND_RDS
> > #undef CONFIG_INFINIBAND_RDS_DEBUG
> > 
> > 
> > #undef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA
> > #undef CONFIG_INFINIBAND_SDP_SEND_ZCOPY
> > #undef CONFIG_INFINIBAND_SDP_RECV_ZCOPY
> > #undef CONFIG_INFINIBAND_SDP_DEBUG
> > #undef CONFIG_INFINIBAND_SDP_DEBUG_DATA
> > #define CONFIG_INFINIBAND_IPATH 1
> > #define CONFIG_INFINIBAND_MTHCA_DEBUG 1
> > #undef CONFIG_INFINIBAND_MADEYE
> > 
> > mkdir -p /var/tmp/OFEDRPM/BUILD/openib-1.1/patches
> > touch /var/tmp/OFEDRPM/BUILD/openib-1.1/patches/quiltrc
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/dapl_qp_attr.patch
> > patching file src/userspace/dapl/dapl/openib_cma/dapl_ib_util.c
> > patching file src/userspace/dapl/dapl/openib_scm/dapl_ib_util.c
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/libmthca_cq_deadlock.patch
> > patching file src/userspace/libmthca/src/verbs.c
> > Hunk #1 succeeded at 614 (offset -8 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/libmthca_stddef.patch
> > patching file src/userspace/libmthca/src/mthca.h
> > Hunk #1 succeeded at 38 with fuzz 2 (offset 2 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/librdmacm_compat.patch
> > patching file src/userspace/librdmacm/src/cma.c
> > Hunk #1 succeeded at 157 (offset 16 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/librdmacm_ver_abi.patch
> > patching file src/userspace/librdmacm/src/cma.c
> > Hunk #2 succeeded at 170 (offset 16 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/mstflint.patch
> > patching file src/userspace/mstflint/mtcr.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cm_add_mra_timeout_limit.patch
> > patching file drivers/infiniband/core/cm.c
> > Hunk #1 succeeded at 53 (offset -1 lines).
> > Hunk #2 succeeded at 2268 (offset -36 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cm_cleanup_timewait.patch
> > patching file drivers/infiniband/core/cm.c
> > Hunk #1 succeeded at 686 (offset 7 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_established1.patch
> > patching file drivers/infiniband/ulp/sdp/sdp.h
> > patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c
> > Hunk #1 succeeded at 515 (offset 16 lines).
> > patching file drivers/infiniband/ulp/sdp/sdp_cma.c
> > patching file drivers/infiniband/ulp/sdp/sdp_main.c
> > Hunk #1 succeeded at 589 (offset 26 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_increase_max_cm_retries.patch
> > patching file drivers/infiniband/core/cma.c
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_list_init.patch
> > patching file drivers/infiniband/core/cma.c
> > Hunk #1 succeeded at 328 (offset -11 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_mem_leak.patch
> > patching file drivers/infiniband/core/cma.c
> > Hunk #1 succeeded at 1713 (offset -241 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_race_fix.patch
> > patching file drivers/infiniband/core/cma.c
> > Hunk #1 succeeded at 910 with fuzz 1 (offset -113 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_tavor_quirk.patch
> > patching file drivers/infiniband/core/cma.c
> > Hunk #1 succeeded at 48 with fuzz 2.
> > Hunk #2 succeeded at 1154 (offset 27 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ib_sa_names.patch
> > patching file include/rdma/ib_sa.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-fixes.patch
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/Makefile
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/Kconfig
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/Makefile
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_common.h
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_cq.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_debug.h
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_diag.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_driver.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_fs.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_ht400.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_iba6110.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_iba6120.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_intr.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_kernel.h
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_keys.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_layer.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_layer.h
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_mad.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_mr.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_pe800.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_qp.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_rc.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_registers.h
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_ruc.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_srq.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_stats.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_sysfs.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_uc.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_ud.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_verbs.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_verbs.h
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_verbs_mcast.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_wc_ppc64.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/verbs_debug.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-limit-packets-sent-without-ack.patch
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_qp.c
> > Hunk #1 succeeded at 502 (offset -8 lines).
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_rc.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_verbs.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_verbs.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-memcpy_cachebypass.patch
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/Makefile
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/ipath_verbs.c
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/memcpy_cachebypass_x86_64.S
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-x86_64.patch
> > (Stripping trailing CRs from patch.)
> > patching file drivers/infiniband/hw/ipath/Kconfig
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_issue3.patch
> > patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_mcast_join_mask.patch
> > patching file drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> > Hunk #1 succeeded at 471 (offset -1 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_mcast_restart.patch
> > patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_selector_updated.patch
> > patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
> > Hunk #2 succeeded at 458 (offset 4 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_attributes.patch
> > patching file drivers/infiniband/ulp/srp/ib_srp.c
> > Hunk #1 succeeded at 1461 (offset -6 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_remove_reconnect.patch
> > patching file drivers/infiniband/ulp/srp/ib_srp.c
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_wa_post_send.patch
> > patching file drivers/infiniband/ulp/srp/ib_srp.c
> > patching file drivers/infiniband/ulp/srp/ib_srp.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/lockdep_header.patch
> > patching file drivers/infiniband/core/uverbs_cmd.c
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_av_statrate.patch
> > patching file drivers/infiniband/hw/mthca/mthca_av.c
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_catas_reset.patch
> > patching file drivers/infiniband/hw/mthca/mthca_catas.c
> > patching file drivers/infiniband/hw/mthca/mthca_main.c
> > patching file drivers/infiniband/hw/mthca/mthca_dev.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_mad_traps.patch
> > patching file drivers/infiniband/hw/mthca/mthca_mad.c
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_port.patch
> > patching file drivers/infiniband/hw/mthca/mthca_provider.c
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_qp_portnum.patch
> > patching file drivers/infiniband/hw/mthca/mthca_qp.c
> > Hunk #1 succeeded at 478 (offset 4 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_qp_statrate_bits.patch
> > patching file drivers/infiniband/hw/mthca/mthca_qp.c
> > Hunk #1 succeeded at 414 (offset 4 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_use_uar2.patch
> > patching file drivers/infiniband/hw/mthca/mthca_uar.c
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/robert-ipath-diagpkt-init-fixup.patch
> > patching file drivers/infiniband/hw/ipath/ipath_diag.c
> > Hunk #1 succeeded at 285 (offset -1 lines).
> > patching file drivers/infiniband/hw/ipath/ipath_driver.c
> > Hunk #1 succeeded at 539 (offset -20 lines).
> > Hunk #2 succeeded at 596 with fuzz 1 (offset -105 lines).
> > Hunk #3 succeeded at 2029 (offset -156 lines).
> > patching file drivers/infiniband/hw/ipath/ipath_kernel.h
> > Hunk #1 succeeded at 793 (offset -96 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sdp_credits_by_seq.patch
> > patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sdp_post_credits.patch
> > patching file drivers/infiniband/ulp/sdp/sdp.h
> > Hunk #1 succeeded at 177 (offset 1 line).
> > patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c
> > Hunk #1 succeeded at 324 (offset 6 lines).
> > patching file drivers/infiniband/ulp/sdp/sdp_cma.c
> > Hunk #1 succeeded at 434 (offset 4 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_drep_on_not_found.patch
> > patching file drivers/infiniband/core/cm.c
> > Hunk #1 succeeded at 1890 (offset -10 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_randomize_psn.patch
> > patching file drivers/infiniband/core/cm.c
> > Hunk #3 succeeded at 81 (offset 7 lines).
> > Hunk #5 succeeded at 327 (offset 7 lines).
> > Hunk #7 succeeded at 2115 (offset 27 lines).
> > Hunk #8 succeeded at 3369 (offset 2 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_unload_crash.patch
> > patching file drivers/infiniband/core/cm.c
> > Hunk #1 succeeded at 82 (offset 7 lines).
> > Hunk #3 succeeded at 656 (offset 6 lines).
> > Hunk #5 succeeded at 685 (offset 6 lines).
> > Hunk #7 succeeded at 1316 (offset 6 lines).
> > Hunk #9 succeeded at 1334 (offset 6 lines).
> > Hunk #10 succeeded at 2626 (offset -7 lines).
> > Hunk #11 succeeded at 3409 (offset -29 lines).
> > Hunk #12 succeeded at 3449 (offset -7 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_establish.patch
> > patching file include/rdma/rdma_cm.h
> > Hunk #1 succeeded at 241 (offset -15 lines).
> > patching file drivers/infiniband/core/cm.c
> > Hunk #1 succeeded at 3242 (offset 35 lines).
> > patching file drivers/infiniband/core/cma.c
> > Hunk #1 succeeded at 759 (offset -81 lines).
> > Hunk #3 succeeded at 1752 (offset -212 lines).
> > Hunk #4 succeeded at 1997 with fuzz 1.
> > Hunk #5 succeeded at 1828 (offset -229 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_hotplug.patch
> > patching file drivers/infiniband/core/cma.c
> > Hunk #1 succeeded at 278 (offset 7 lines).
> > Hunk #3 succeeded at 700 (offset 8 lines).
> > Hunk #5 succeeded at 895 with fuzz 1 (offset -9 lines).
> > Hunk #6 succeeded at 1382 (offset 6 lines).
> > Hunk #7 succeeded at 1610 (offset -9 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_typo_fix.patch
> > patching file drivers/infiniband/core/cma.c
> > Hunk #1 succeeded at 276 with fuzz 2 (offset 7 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_1_recreate_at_reconnect.patch
> > patching file drivers/infiniband/ulp/srp/ib_srp.c
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_2_use_multiple_initiator_ports.patch
> > patching file drivers/infiniband/ulp/srp/ib_srp.c
> > patching file drivers/infiniband/ulp/srp/ib_srp.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_topspin.patch
> > patching file drivers/infiniband/ulp/srp/ib_srp.c
> > Hunk #1 succeeded at 358 (offset -1 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/svnehca_0015_1.patch
> > patching file drivers/infiniband/hw/ehca/ehca_main.c
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/svnehca_0015_2.patch
> > patching file drivers/infiniband/hw/ehca/ehca_tools.h
> > 
> > Applying patches for 2.6.9-34.ELsmp kernel (RHAS4 Update 3):
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_1_netevents_revert_to_2_6_17.patch
> > patching file drivers/infiniband/core/addr.c
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_3926_to_2_6_13.patch
> > patching file drivers/infiniband/core/addr.c
> > Hunk #1 succeeded at 327 with fuzz 1 (offset 11 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_4670_to_2_6_9.patch
> > patching file drivers/infiniband/core/addr.c
> > Hunk #1 succeeded at 27 with fuzz 2.
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/asm_bitops_ia64_to_2_6_11.patch
> > patching file include/asm/bitops.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/core_4807_to_2_6_9.patch
> > patching file drivers/infiniband/core/sysfs.c
> > Hunk #1 succeeded at 438 (offset -4 lines).
> > patching file drivers/infiniband/core/user_mad.c
> > Hunk #2 succeeded at 677 (offset 91 lines).
> > Hunk #3 succeeded at 685 (offset 5 lines).
> > Hunk #4 succeeded at 1106 (offset 91 lines).
> > Hunk #5 succeeded at 1053 (offset 5 lines).
> > patching file drivers/infiniband/core/uverbs_main.c
> > Hunk #2 succeeded at 118 (offset 3 lines).
> > patching file drivers/infiniband/core/uverbs_mem.c
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/debugfs_to_2_6_9.patch
> > patching file drivers/infiniband/include/linux/debugfs.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipath-backport.patch
> > patching file drivers/infiniband/hw/ipath/iowrite32_copy_x86_64.S
> > patching file drivers/infiniband/hw/ipath/ipath_backport.h
> > patching file drivers/infiniband/hw/ipath/ipath_diag.c
> > patching file drivers/infiniband/hw/ipath/ipath_driver.c
> > Hunk #2 succeeded at 557 (offset 1 line).
> > Hunk #3 succeeded at 599 (offset 1 line).
> > Hunk #4 succeeded at 1366 (offset 1 line).
> > Hunk #5 succeeded at 1395 (offset 1 line).
> > Hunk #6 succeeded at 1875 (offset 1 line).
> > Hunk #7 succeeded at 1903 (offset 1 line).
> > Hunk #8 succeeded at 1984 (offset -9 lines).
> > Hunk #9 succeeded at 2027 (offset 1 line).
> > Hunk #10 succeeded at 2142 (offset -9 lines).
> > patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
> > patching file drivers/infiniband/hw/ipath/ipath_fs.c
> > patching file drivers/infiniband/hw/ipath/ipath_iba6110.c
> > patching file drivers/infiniband/hw/ipath/ipath_iba6120.c
> > patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
> > patching file drivers/infiniband/hw/ipath/ipath_kernel.h
> > patching file drivers/infiniband/hw/ipath/ipath_layer.c
> > patching file drivers/infiniband/hw/ipath/ipath_sysfs.c
> > patching file drivers/infiniband/hw/ipath/ipath_user_pages.c
> > patching file drivers/infiniband/hw/ipath/ipath_verbs.c
> > patching file drivers/infiniband/hw/ipath/ipath_verbs.h
> > patching file drivers/infiniband/hw/ipath/Makefile
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipoib_5010_to_2_6_9.patch
> > patching file drivers/infiniband/include/linux/if_infiniband.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipoib_8111_to_2_6_16.patch
> > patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
> > Hunk #2 succeeded at 803 (offset 49 lines).
> > patching file drivers/infiniband/ulp/ipoib/ipoib.h
> > Hunk #1 succeeded at 46 (offset -1 lines).
> > Hunk #2 succeeded at 220 (offset 1 line).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_device_5496_to_2_6_15.patch
> > patching file drivers/infiniband/include/linux/device.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_err_to_2_6_11.patch
> > patching file include/linux/err.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_idr_6554_to_2_6_13.patch
> > patching file drivers/infiniband/include/linux/idr.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_inetdevice_to_2_6_17.patch
> > patching file drivers/infiniband/include/linux/inetdevice.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_lockdep_to_2_6_17.patch
> > patching file include/linux/lockdep.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_mutex_5947_to_2_6_15.patch
> > patching file drivers/infiniband/include/linux/mutex.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_netdevice_to_2_6_17.patch
> > patching file drivers/infiniband/include/linux/netdevice.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_pci_7970_to_2_6_9.patch
> > patching file drivers/infiniband/include/linux/pci.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_scatterlist_6369_to_2_6_9.patch
> > patching file drivers/infiniband/include/linux/scatterlist.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_signal_to_2_6_17.patch
> > patching file drivers/infiniband/include/linux/signal.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_skbuff_6754_to_2_6_11.patch
> > patching file include/linux/skbuff.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_spinlock_5883_to_2_6_9.patch
> > patching file drivers/infiniband/include/linux/spinlock.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/makefile_to_2_6_9.patch
> > patching file drivers/infiniband/ulp/srp/Makefile
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/mthca_dev_3465_to_2_6_11.patch
> > patching file drivers/infiniband/hw/mthca/mthca_dev.h
> > Hunk #1 succeeded at 57 with fuzz 2 (offset 4 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/mthca_provider_3465_to_2_6_9.patch
> > patching file drivers/infiniband/hw/mthca/mthca_provider.c
> > Hunk #1 succeeded at 387 (offset 28 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_inet_sock_6754_to_2_6_15.patch
> > patching file include/net/inet_sock.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_sock_1_6754_to_2_6_13.patch
> > patching file include/net/sock.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_sock_2_6754_to_2_6_11.patch
> > patching file include/net/sock.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_tcp_states_6754_to_2_6_13.patch
> > patching file include/net/tcp_states.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/read_mostly_6255_to_2_6_13.patch
> > patching file drivers/infiniband/include/linux/cache.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/scsi_7242_to_2_6_14.patch
> > patching file include/scsi/scsi.h
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/sdp_7277_to_2_6_11.patch
> > patching file drivers/infiniband/ulp/sdp/sdp_main.c
> > Hunk #1 succeeded at 418 (offset 118 lines).
> > Hunk #2 succeeded at 535 (offset 41 lines).
> > Hunk #3 succeeded at 633 (offset 118 lines).
> > Hunk #4 succeeded at 1408 (offset 245 lines).
> > Hunk #5 succeeded at 1301 (offset 118 lines).
> > Hunk #6 succeeded at 1537 (offset 245 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_4030_to_2_6_12.patch
> > patching file drivers/infiniband/ulp/srp/ib_srp.c
> > Hunk #1 succeeded at 1594 (offset 271 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_7312_to_2_6_11.patch
> > patching file drivers/infiniband/ulp/srp/ib_srp.c
> > Hunk #1 succeeded at 1258 (offset -44 lines).
> > Hunk #3 succeeded at 1332 (offset -42 lines).
> > Hunk #5 succeeded at 1360 with fuzz 2 (offset -40 lines).
> > Hunk #6 succeeded at 1404 with fuzz 2 (offset -3 lines).
> > Hunk #7 succeeded at 1377 (offset -40 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_scsi_scan_target_7242_to_2_6_11.patch
> > patching file drivers/infiniband/ulp/srp/ib_srp.c
> > Hunk #1 succeeded at 975 (offset 26 lines).
> > Hunk #2 succeeded at 1505 (offset 24 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/top_2844_to_2_6_11.patch
> > patching file drivers/infiniband/Makefile
> > Hunk #1 succeeded at 1 with fuzz 2.
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ucm_5245_to_2_6_9.patch
> > patching file drivers/infiniband/core/ucm.c
> > Hunk #1 succeeded at 1270 (offset -8 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ucma_6607_to_2_6_9.patch
> > patching file drivers/infiniband/core/ucma.c
> > Hunk #1 succeeded at 861 (offset 88 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/user_mad_4603_to_2_6_9.patch
> > patching file drivers/infiniband/core/user_mad.c
> > Hunk #1 succeeded at 857 (offset -20 lines).
> > Hunk #3 succeeded at 1086 (offset -20 lines).
> > Hunk #5 succeeded at 1123 (offset -20 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/uverbs_main_3935_to_2_6_9.patch
> > patching file drivers/infiniband/core/uverbs_main.c
> > Hunk #1 succeeded at 727 (offset 11 lines).
> > Hunk #2 succeeded at 949 (offset 1 line).
> > Hunk #3 succeeded at 975 (offset 11 lines).
> > Hunk #4 succeeded at 986 (offset 3 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/uverbs_to_2_6_17.patch
> > patching file drivers/infiniband/core/uverbs_main.c
> > Hunk #1 succeeded at 1011 with fuzz 1 (offset 196 lines).
> > 
> >  /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/hpage_patches/hpages.patch
> > patching file drivers/infiniband/core/uverbs_mem.c
> > /bin/rm -f /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache
> > cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/examples
> > cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/libibverbs
> > Running: ./configure
> >   --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache
> >   --disable-libcheck --prefix /usr/local/ofed --libdir /usr/local/ofed/lib64
> >   CPPFLAGS="-I../libibverbs/include"
> > configure: creating cache /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache
> > checking for a BSD-compatible install... /usr/bin/install -c
> > checking whether build environment is sane... yes
> > checking for gawk... gawk
> > checking whether make sets $(MAKE)... yes
> > checking build system type... x86_64-redhat-linux-gnu
> > checking host system type... x86_64-redhat-linux-gnu
> > checking for style of include used by make... GNU
> > checking for gcc... gcc
> > checking for C compiler default output file name... configure: error: C compiler cannot create executables
> > See `config.log' for more details.
> > Failed to execute: ./configure
> >   --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache
> >   --disable-libcheck --prefix /usr/local/ofed --libdir /usr/local/ofed/lib64
> >   CPPFLAGS="-I../libibverbs/include"
> > error: Bad exit status from /var/tmp/rpm-tmp.43267 (%install)
> > 
> > 
> > RPM build errors:
> >    user vlad does not exist - using root
> >    group mtl does not exist - using root
> >    user vlad does not exist - using root
> >    group mtl does not exist - using root
> >    Bad exit status from /var/tmp/rpm-tmp.43267 (%install)
> > ERROR: Failed executing "rpmbuild --rebuild
> >   --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed'
> >   --define 'build_root /var/tmp/OFED'
> >   --define 'configure_options --with-libibcm --with-libibverbs --with-libipathverbs --with-libmthca --with-librdmacm --with-mstflint --with-perftest --with-ipath_inf-mod --with-ipoib-mod --with-mthca-mod --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod'
> >   --define 'configure_options32 %{nil}'
> >   --define 'KVERSION 2.6.9-34.ELsmp'
> >   --define 'KSRC /lib/modules/2.6.9-34.ELsmp/build'
> >   --define 'build_kernel_ib 1' --define 'build_kernel_ib_devel 1'
> >   --define 'NETWORK_CONF_DIR /etc/sysconfig/network-scripts'
> >   --define 'modprobe_update 1' --define 'include_ipoib_conf 1'
> >   --define 'build_32bit 0'
> >   /home/caton/OFED-1.1/SRPMS/openib-1.1-0.src.rpm"
> > 
> > ---------------------------------------------------------
> > 
> > Thanks a lot and best regards 
> > 
> > Julio. 
> > 
> > _______________________________________________
> > general mailing list
> > general at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> > 
> > To unsubscribe, please visit
> > http://openib.org/mailman/listinfo/openib-general 
> 
> 
> Julio.



From mshefty at ichips.intel.com  Fri Jun 22 08:55:22 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 22 Jun 2007 08:55:22 -0700
Subject: [ofa-general] Re: [PATCH 1/2] libibumad: fix partition support
In-Reply-To: <1182373280.15653.335513.camel@hal.voltaire.com>
References: <000801c7af6e$7ae0ba80$ff0da8c0@amr.corp.intel.com>
	<1182373280.15653.335513.camel@hal.voltaire.com>
Message-ID: <467BF0EA.2090609@ichips.intel.com>

> Just two things:
> 1. It might be better if the ABI version 5 warning message for only
> pkey_index 0 being supported comes out at umad_init time rather than
> umad_set_pkey time so that the user is not swamped with these.

Placing the warning in umad_init would display it even if the app only 
used pkey_index 0, so keeping it in umad_set_pkey seems better to me. 
We could make it so that the warning message only displays once though.
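
A minimal sketch of the "display once" idea follows. It is not the libibumad code: `set_pkey_index_sketch`, `abi_version`, `warnings_printed` and the message text are hypothetical names used only for illustration, and the static flag is not thread-safe.

```c
#include <stdio.h>

/* Hypothetical stand-in for the ABI version the library detected at init. */
static int abi_version = 5;

/* Counts how often the warning was actually printed (illustration only). */
static int warnings_printed;

/*
 * Sketch of a set-pkey helper: on ABI 5 only pkey_index 0 is usable, so a
 * non-zero index is rejected, but the warning is printed only on the first
 * offending call.  Returns 0 on success, -1 if the index cannot be honoured.
 */
static int set_pkey_index_sketch(int pkey_index)
{
	static int warned;	/* zero-initialised; set after the first warning */

	if (abi_version < 6 && pkey_index != 0) {
		if (!warned) {
			fprintf(stderr,
				"warning: ABI %d supports only pkey_index 0\n",
				abi_version);
			warned = 1;
			warnings_printed++;
		}
		return -1;
	}
	return 0;
}
```

An application that only ever passes pkey_index 0 never sees the message, which is the property argued for above.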

> 2. There is one pathological combination. It would be using 2.6.23 (with
> the new user_mad ABI version 6), an updated libibumad would be required,
> but an older libvendor (osm_vendor_ibumad.c without your one line
> change). That might be the case with someone who swapped back and forth
> between OFED 1.2 and master in some scenarios.

I don't know how we can support all combinations, especially since the 
return codes aren't being checked.  We can make a special case when 
umad_set_pkey() is called with 0xffff on ABI 6, and display a warning 
message and/or convert it to the correct index.

> Also, this does not quite work as expected. An error was returned based
> on the bad pkey index but I do see a send on the IB link (with a bad
> pkey). I wouldn't have expected the latter part. Maybe this is a driver
> or firmware issue. Not sure yet. I suppose there should be some
> pkey_index validation (to make sure it is within the device's valid
> range) and that should also ultimately get added to libibumad or should
> such validation go into the user_mad kernel module ?

I think if we want to validate that the pkey_index is reasonable, the 
check should go in the kernel.

- Sean


From mshefty at ichips.intel.com  Fri Jun 22 09:07:18 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 22 Jun 2007 09:07:18 -0700
Subject: [ofa-general] [PATCH] for-2.6.23 ib/umad: add partition support
In-Reply-To: <adabqf93vro.fsf@cisco.com>
References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com>	<adair9i8ihq.fsf@cisco.com>
	<467996C4.1060201@ichips.intel.com> <adabqf93vro.fsf@cisco.com>
Message-ID: <467BF3B6.9070800@ichips.intel.com>

> I'm beginning to think that just updating the ABI might be the right
> answer.  But let's try to make this be the last ABI break.  Are we
> pretty sure there's *nothing* else we might ever want to add to the
> structure?  I can't think of anything right now...

Some other random thoughts... we've never agreed on what approach to use 
if we ever want to expose direct IB multicast support or event 
registration.  I created a separate module for this for PathForward, but 
there may be a way to expose that functionality through the user_mad 
interface.  (Personally, I'd like to export any desired functionality to 
the user through other interfaces, like the rdma_cm or verbs.)

- Sean


From rdreier at cisco.com  Fri Jun 22 09:17:37 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 22 Jun 2007 09:17:37 -0700
Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition
	support
In-Reply-To: <20070622052700.GP4857@mellanox.co.il> (Michael S. Tsirkin's
	message of "Fri, 22 Jun 2007 08:27:00 +0300")
References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com>
	<adair9i8ihq.fsf@cisco.com> <467996C4.1060201@ichips.intel.com>
	<adabqf93vro.fsf@cisco.com> <20070622052700.GP4857@mellanox.co.il>
Message-ID: <adaabus2cbi.fsf@cisco.com>

 > Ugh. OFED 1.2 (with the old ABI) just went out.
 > I wonder - is it time to start making the kernel backwards-compatible?
 > It would be trivial to have userspace supply its own ABI
 > version and have kernel support both new and old ABI if we want to.
 > What do you think?

There's always a balance between keeping cruft in the kernel for
compatibility and not breaking userspace.  I'm beginning to think the
right plan in this case might be to rename struct ib_user_mad_hdr to
struct ib_user_mad_hdr_old, make a new struct ib_user_mad with the
pkey_index member and add a new ioctl IB_USER_MAD_ENABLE_PKEY_INDEX.

The ABI version would stay the same, and if someone just opened the
device and didn't do the IB_USER_MAD_ENABLE_PKEY_INDEX they would get
the old ABI.  If they do the ioctl then they get the new header.  Also
we could define that ABI version 6 just has the new struct
ib_user_mad_hdr and no ioctl.

Then we could say we were going to switch to the new ABI in a year or
two.  And print a warning in the kernel log for every application that
doesn't use the ioctl.
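
The opt-in scheme described above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the struct layouts are cut down to a few fields, and `umad_file_sketch`, `enable_pkey_index_sketch` and `hdr_size_sketch` are hypothetical names, not code from user_mad.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative layouts only; the real ib_user_mad_hdr has more fields. */
struct mad_hdr_old_sketch {
	uint32_t id;
	uint32_t status;
};

struct mad_hdr_new_sketch {
	uint32_t id;
	uint32_t status;
	uint16_t pkey_index;	/* the member the new ABI adds */
	uint16_t reserved;
};

/* Per-open-file state: one flag records whether the ioctl was issued. */
struct umad_file_sketch {
	int use_pkey_index;
};

/* What a handler for the proposed IB_USER_MAD_ENABLE_PKEY_INDEX might do. */
static void enable_pkey_index_sketch(struct umad_file_sketch *f)
{
	f->use_pkey_index = 1;
}

/* The read()/write() paths would size the header according to the flag. */
static size_t hdr_size_sketch(const struct umad_file_sketch *f)
{
	return f->use_pkey_index ? sizeof(struct mad_hdr_new_sketch)
				 : sizeof(struct mad_hdr_old_sketch);
}
```

A file that never issues the ioctl keeps seeing the old header size, so existing binaries are unaffected.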

I'll try to cook up a kernel patch next week.

 - R.


From rdreier at cisco.com  Fri Jun 22 09:19:51 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 22 Jun 2007 09:19:51 -0700
Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition support
In-Reply-To: <20070622051201.GM4857@mellanox.co.il> (Michael S. Tsirkin's
	message of "Fri, 22 Jun 2007 08:12:01 +0300")
References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com>
	<adair9i8ihq.fsf@cisco.com> <467996C4.1060201@ichips.intel.com>
	<20070621033854.GF8868@mellanox.co.il> <ada7ipx3vmj.fsf@cisco.com>
	<20070622051201.GM4857@mellanox.co.il>
Message-ID: <ada645g2c7s.fsf@cisco.com>

 > We could have asked all users to use pwrite with offset 0, and then I
 > think the pos field would be useful for other things like versioning.  As it
 > is, people use write to pass in MADs, so I'm not sure what pos points to.

Oh... I don't think that's a very good interface.  I don't think
people expect character special files to pay attention to offsets,
especially not in a magic way.  It's probably better to just use
read/write for IO and ioctl for control stuff.

 - R.


From halr at voltaire.com  Fri Jun 22 09:25:03 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 22 Jun 2007 12:25:03 -0400
Subject: [ofa-general] Re: [PATCH 1/2] libibumad: fix partition support
In-Reply-To: <467BF0EA.2090609@ichips.intel.com>
References: <000801c7af6e$7ae0ba80$ff0da8c0@amr.corp.intel.com>
	<1182373280.15653.335513.camel@hal.voltaire.com>
	<467BF0EA.2090609@ichips.intel.com>
Message-ID: <1182529502.10379.52789.camel@hal.voltaire.com>

On Fri, 2007-06-22 at 11:55, Sean Hefty wrote:
> > Just two things:
> > 1. It might be better if the ABI version 5 warning message for only
> > pkey_index 0 being supported comes out at umad_init time rather than
> > umad_set_pkey time so that the user is not swamped with these.
> 
> Placing the warning in umad_init would display it even if the app only 
> used pkey_index 0, so keeping it in umad_set_pkey seems better to me. 
> We could make it so that the warning message only displays once though.

Sure. That would be better IMO too.

> > 2. There is one pathological combination. It would be using 2.6.23 (with
> > the new user_mad ABI version 6), an updated libibumad would be required,
> > but an older libvendor (osm_vendor_ibumad.c without your one line
> > change). That might be the case with someone who swapped back and forth
> > between OFED 1.2 and master in some scenarios.
> 
> I don't know how we can support all combinations, especially since the 
> return codes aren't being checked.  We can make a special case when 
> umad_set_pkey() is called with 0xffff on ABI 6, and display a warning 
> message and/or convert it to the correct index.

Yes, but this would eliminate the case where some implementation
supported the max pkeys. That's purely theoretical and no one is even
close to that max yet.
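
A sketch of the special case under discussion, with its trade-off visible. The function name `fixup_pkey_index_sketch` is hypothetical, and mapping the legacy value to index 0 is an ASSUMPTION (that the default pkey sits at index 0), not something stated in this thread.

```c
#define LEGACY_PKEY_VALUE 0xffffu	/* value older callers are presumed to pass */

/*
 * Returns the pkey index to use.  On ABI 6, an argument of 0xffff is
 * treated as old code passing a pkey *value* rather than an index; it is
 * mapped to index 0 (ASSUMPTION: the default pkey lives at index 0) and
 * *warned is set so the caller can print a message.
 * Side effect of the special case: a genuine index 0xffff is unreachable.
 */
static unsigned int fixup_pkey_index_sketch(int abi_version, unsigned int arg,
					    int *warned)
{
	*warned = 0;
	if (abi_version >= 6 && arg == LEGACY_PKEY_VALUE) {
		*warned = 1;
		return 0;
	}
	return arg;
}
```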

> > Also, this does not quite work as expected. An error was returned based
> > on the bad pkey index but I do see a send on the IB link (with a bad
> > pkey). I wouldn't have expected the latter part. Maybe this is a driver
> > or firmware issue. Not sure yet. I suppose there should be some
> > pkey_index validation (to make sure it is within the device's valid
> > range) and that should also ultimately get added to libibumad or should
> > such validation go into the user_mad kernel module ?
> 
> I think if we want to validate that the pkey_index is reasonable, the 
> check should go in the kernel.

Yes, that was my thinking too.

-- Hal

> - Sean


From rdreier at cisco.com  Fri Jun 22 09:26:05 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 22 Jun 2007 09:26:05 -0700
Subject: [ofa-general] [GIT PULL] please pull infiniband.git
Message-ID: <ada1wg42bxe.fsf@cisco.com>

Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This will get a few fixes for crashes/deadlocks as well as a few
other small, safe fixes:

Jack Morgenstein (1):
      IB/mlx4: Correct max_srq_wr returned from mlx4_ib_query_device()

Michael S. Tsirkin (2):
      IPoIB/cm: Initialize RX before moving QP to RTR
      IPoIB/cm: Fix interoperability when MTU doesn't match

Roland Dreier (2):
      IB/umem: Fix possible hang on process exit
      IPoIB/cm: Remove dead definition of struct ipoib_cm_id

 drivers/infiniband/core/umem.c          |   16 ++++++-----
 drivers/infiniband/hw/mlx4/main.c       |    2 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c |   42 ++++++++++++++----------------
 3 files changed, 30 insertions(+), 30 deletions(-)


diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index b4aec51..d40652a 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -225,13 +225,15 @@ void ib_umem_release(struct ib_umem *umem)
 	 * up here and not be able to take the mmap_sem.  In that case
 	 * we defer the vm_locked accounting to the system workqueue.
 	 */
-	if (context->closing && !down_write_trylock(&mm->mmap_sem)) {
-		INIT_WORK(&umem->work, ib_umem_account);
-		umem->mm   = mm;
-		umem->diff = diff;
-
-		schedule_work(&umem->work);
-		return;
+	if (context->closing) {
+		if (!down_write_trylock(&mm->mmap_sem)) {
+			INIT_WORK(&umem->work, ib_umem_account);
+			umem->mm   = mm;
+			umem->diff = diff;
+
+			schedule_work(&umem->work);
+			return;
+		}
 	} else
 		down_write(&mm->mmap_sem);
 
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 1095c82..c591616 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -120,7 +120,7 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 	props->max_qp_init_rd_atom = dev->dev->caps.max_qp_init_rdma;
 	props->max_res_rd_atom	   = props->max_qp_rd_atom * props->max_qp;
 	props->max_srq		   = dev->dev->caps.num_srqs - dev->dev->caps.reserved_srqs;
-	props->max_srq_wr	   = dev->dev->caps.max_srq_wqes;
+	props->max_srq_wr	   = dev->dev->caps.max_srq_wqes - 1;
 	props->max_srq_sge	   = dev->dev->caps.max_srq_sge;
 	props->local_ca_ack_delay  = dev->dev->caps.local_ca_ack_delay;
 	props->atomic_cap	   = dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_ATOMIC ?
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 076a0bb..5ffc464 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -56,13 +56,6 @@ MODULE_PARM_DESC(cm_data_debug_level,
 #define IPOIB_CM_RX_DELAY       (3 * 256 * HZ)
 #define IPOIB_CM_RX_UPDATE_MASK (0x3)
 
-struct ipoib_cm_id {
-	struct ib_cm_id *id;
-	int flags;
-	u32 remote_qpn;
-	u32 remote_mtu;
-};
-
 static struct ib_qp_attr ipoib_cm_err_attr = {
 	.qp_state = IB_QPS_ERR
 };
@@ -309,6 +302,11 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 		return -ENOMEM;
 	p->dev = dev;
 	p->id = cm_id;
+	cm_id->context = p;
+	p->state = IPOIB_CM_RX_LIVE;
+	p->jiffies = jiffies;
+	INIT_LIST_HEAD(&p->list);
+
 	p->qp = ipoib_cm_create_rx_qp(dev, p);
 	if (IS_ERR(p->qp)) {
 		ret = PTR_ERR(p->qp);
@@ -320,24 +318,24 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 	if (ret)
 		goto err_modify;
 
+	spin_lock_irq(&priv->lock);
+	queue_delayed_work(ipoib_workqueue,
+			   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
+	/* Add this entry to passive ids list head, but do not re-add it
+	 * if IB_EVENT_QP_LAST_WQE_REACHED has moved it to flush list. */
+	p->jiffies = jiffies;
+	if (p->state == IPOIB_CM_RX_LIVE)
+		list_move(&p->list, &priv->cm.passive_ids);
+	spin_unlock_irq(&priv->lock);
+
 	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
 	if (ret) {
 		ipoib_warn(priv, "failed to send REP: %d\n", ret);
-		goto err_rep;
+		if (ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE))
+			ipoib_warn(priv, "unable to move qp to error state\n");
 	}
-
-	cm_id->context = p;
-	p->jiffies = jiffies;
-	p->state = IPOIB_CM_RX_LIVE;
-	spin_lock_irq(&priv->lock);
-	if (list_empty(&priv->cm.passive_ids))
-		queue_delayed_work(ipoib_workqueue,
-				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
-	list_add(&p->list, &priv->cm.passive_ids);
-	spin_unlock_irq(&priv->lock);
 	return 0;
 
-err_rep:
 err_modify:
 	ib_destroy_qp(p->qp);
 err_qp:
@@ -754,9 +752,9 @@ static int ipoib_cm_rep_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 
 	p->mtu = be32_to_cpu(data->mtu);
 
-	if (p->mtu < priv->dev->mtu + IPOIB_ENCAP_LEN) {
-		ipoib_warn(priv, "Rejecting connection: mtu %d < device mtu %d + 4\n",
-			   p->mtu, priv->dev->mtu);
+	if (p->mtu <= IPOIB_ENCAP_LEN) {
+		ipoib_warn(priv, "Rejecting connection: mtu %d <= %d\n",
+			   p->mtu, IPOIB_ENCAP_LEN);
 		return -EINVAL;
 	}
 

From halr at voltaire.com  Fri Jun 22 09:34:59 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 22 Jun 2007 12:34:59 -0400
Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition
	support
In-Reply-To: <adaabus2cbi.fsf@cisco.com>
References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com>
	<adair9i8ihq.fsf@cisco.com> <467996C4.1060201@ichips.intel.com>
	<adabqf93vro.fsf@cisco.com> <20070622052700.GP4857@mellanox.co.il>
	<adaabus2cbi.fsf@cisco.com>
Message-ID: <1182530097.10379.53476.camel@hal.voltaire.com>

On Fri, 2007-06-22 at 12:17, Roland Dreier wrote:
>  > Ugh. OFED 1.2 (with the old ABI) just went out.
>  > I wonder - is it time to start making the kernel backwards-compatible?
>  > It would be trivial to have userspace supply its own ABI
>  > version and have kernel support both new and old ABI if we want to.
>  > What do you think?
> 
> There's always a balance between keeping cruft in the kernel for
> compatibility and not breaking userspace.  I'm beginning to think the
> right plan in this case might be to rename struct ib_user_mad_hdr to
> struct ib_user_mad_hdr_old, make a new struct ib_user_mad with the
> pkey_index member and add a new ioctl IB_USER_MAD_ENABLE_PKEY_INDEX.
> 
> The ABI version would stay the same, and if someone just opened the
> device and didn't do the IB_USER_MAD_ENABLE_PKEY_INDEX they would get
> the old ABI.  If they do the ioctl then they get the new header.  Also
> we could define that ABI version 6 just has the new struct
> ib_user_mad_hdr and no ioctl.
> 
> Then we could say we were going to switch to the new ABI in a year or
> two.  And print a warning in the kernel log for every application that
> doesn't use the ioctl.

This seems like a good approach to me.

The only question is what happens with apps which enable the pkey index
mode but run on an older kernel which does not support this. They would
get an error back (-ENOIOCTLCMD) from user_mad. They could either error
out on this or continue on depending on what the app wants to do.
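
The two choices open to an application can be sketched as follows. This is illustrative only: `negotiate_pkey_mode` and the two stub callbacks are hypothetical, and the stub failure value is a placeholder, since the errno an application actually sees depends on the kernel.

```c
enum pkey_mode { PKEY_MODE_LEGACY, PKEY_MODE_INDEXED };

/* Stubs standing in for a new and an old kernel (illustration only). */
static int enable_ok(void) { return 0; }
static int enable_unsupported(void) { return -1; /* placeholder failure */ }

/*
 * try_enable: hypothetical callback that issues the enabling ioctl and
 * returns 0 on success or a negative value on failure.
 * require_pkey: non-zero if the application cannot work without pkey_index.
 * On success *mode says which header layout to use.  Returns 0 on success,
 * or the negative error when the application chooses to give up.
 */
static int negotiate_pkey_mode(int (*try_enable)(void), int require_pkey,
			       enum pkey_mode *mode)
{
	int ret = try_enable();

	if (ret == 0) {
		*mode = PKEY_MODE_INDEXED;
		return 0;
	}
	if (require_pkey)
		return ret;	/* error out: the feature is mandatory */

	*mode = PKEY_MODE_LEGACY;	/* continue, pkey_index 0 only */
	return 0;
}
```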

> I'll try to cook up a kernel patch next week.

Thanks.

-- Hal

>  - R.


From ralph.campbell at qlogic.com  Fri Jun 22 09:36:19 2007
From: ralph.campbell at qlogic.com (Ralph Campbell)
Date: Fri, 22 Jun 2007 09:36:19 -0700
Subject: [ofa-general] [PATCH] IB/libipathverbs - add barrier before updating
	head index in shared memory
Message-ID: <1182530179.18911.210.camel@brick.pathscale.com>

Add a barrier to make sure the CPU doesn't reorder writes
to shared kernel memory when posting WQEs or reorder reads
when polling CQs.

Signed-off-by: Ralph Campbell <ralph.campbell at qlogic.com>

diff --git a/src/verbs.c b/src/verbs.c
index b2324d8..57c78dd 100644
--- a/src/verbs.c
+++ b/src/verbs.c
@@ -290,6 +290,8 @@ int ipath_poll_cq(struct ibv_cq *ibcq, int ne, struct ibv_wc *wc)
 	for (npolled = 0; npolled < ne; ++npolled, ++wc) {
 		if (tail == q->head)
 			break;
+		/* Make sure entry is read after head index is read. */
+		rmb();
 		memcpy(wc, &q->queue[tail], sizeof(*wc));
 		if (tail == cq->ibv_cq.cqe)
 			tail = 0;
@@ -441,6 +443,8 @@ static int post_recv(struct ipath_rq *rq, struct ibv_recv_wr *wr,
 		wqe->num_sge = i->num_sge;
 		for (n = 0; n < wqe->num_sge; n++)
 			wqe->sg_list[n] = i->sg_list[n];
+		/* Make sure queue entry is written before the head index. */
+		wmb();
 		rwq->head = head;
 	}
 	ret = 0;
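
For readers unfamiliar with the ordering requirement, here is a self-contained sketch of the same pattern using C11 atomics in place of wmb()/rmb(). It is not libipathverbs code: `ring_sketch`, `ring_post`, `ring_poll` and `RING_SIZE` are hypothetical, and release/acquire ordering is swapped in as a portable analogue of the explicit barriers in the patch. It assumes a single producer and a single consumer.

```c
#include <stdatomic.h>

#define RING_SIZE 8	/* arbitrary size for the sketch */

struct ring_sketch {
	int entries[RING_SIZE];
	atomic_uint head;	/* advanced by the producer */
	atomic_uint tail;	/* advanced by the consumer */
};

/* Producer: fill the slot first, then publish it by advancing head. */
static int ring_post(struct ring_sketch *r, int value)
{
	unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
	unsigned int next = (head + 1) % RING_SIZE;

	if (next == atomic_load_explicit(&r->tail, memory_order_acquire))
		return -1;	/* ring is full */
	r->entries[head] = value;
	/* Release store: the entry write cannot be reordered after it. */
	atomic_store_explicit(&r->head, next, memory_order_release);
	return 0;
}

/* Consumer: read head first, and only then read the entry it covers. */
static int ring_poll(struct ring_sketch *r, int *value)
{
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	/* Acquire load: the entry read cannot be reordered before it. */
	unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (tail == head)
		return 0;	/* ring is empty */
	*value = r->entries[tail];
	atomic_store_explicit(&r->tail, (tail + 1) % RING_SIZE,
			      memory_order_release);
	return 1;
}
```

Without the release/acquire pair (or the barriers in the patch), a consumer could observe the new head index before the entry contents are visible.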


From rdreier at cisco.com  Fri Jun 22 09:50:37 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 22 Jun 2007 09:50:37 -0700
Subject: [ofa-general] Re: [PATCH] IB/libipathverbs - add barrier before
	updating head index in shared memory
In-Reply-To: <1182530179.18911.210.camel@brick.pathscale.com> (Ralph
	Campbell's message of "Fri, 22 Jun 2007 09:36:19 -0700")
References: <1182530179.18911.210.camel@brick.pathscale.com>
Message-ID: <adawsxw0w82.fsf@cisco.com>

Seems like it's probably correct.

I guess you should add this to the git tree.  What is the main
libipathverbs repository now?  I'm assuming it's not going to be under
~bos....


From rdreier at cisco.com  Fri Jun 22 09:53:06 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 22 Jun 2007 09:53:06 -0700
Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition
	support
In-Reply-To: <1182530097.10379.53476.camel@hal.voltaire.com> (Hal Rosenstock's
	message of "22 Jun 2007 12:34:59 -0400")
References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com>
	<adair9i8ihq.fsf@cisco.com> <467996C4.1060201@ichips.intel.com>
	<adabqf93vro.fsf@cisco.com> <20070622052700.GP4857@mellanox.co.il>
	<adaabus2cbi.fsf@cisco.com>
	<1182530097.10379.53476.camel@hal.voltaire.com>
Message-ID: <adasl8k0w3x.fsf@cisco.com>

 > The only question is what happens with apps which enable the pkey index
 > mode but run on an older kernel which does not support this. They would
 > get an error back (-ENOIOCTLCMD) from user_mad. They could either error
 > out on this or continue on depending on what the app wants to do.

Yes, but I think that is purely up to the application.  Userspace
knows about the new interface, the kernel doesn't, and the application
has to decide how important the pkey_index stuff is.

Given that we don't have a time machine (so the behavior of old
kernels is frozen) I don't see anything we can do to make this any better.

 - R.


From ralph.campbell at qlogic.com  Fri Jun 22 10:04:21 2007
From: ralph.campbell at qlogic.com (Ralph Campbell)
Date: Fri, 22 Jun 2007 10:04:21 -0700
Subject: [ofa-general] Re: [PATCH] IB/libipathverbs - add barrier before
	updating head index in shared memory
In-Reply-To: <adawsxw0w82.fsf@cisco.com>
References: <1182530179.18911.210.camel@brick.pathscale.com>
	<adawsxw0w82.fsf@cisco.com>
Message-ID: <1182531862.18911.211.camel@brick.pathscale.com>

On Fri, 2007-06-22 at 09:50 -0700, Roland Dreier wrote:
> Seems like it's probably correct.
> 
> I guess you should add this to the git tree.  What is the main
> libipathverbs repository now?  I'm assuming it's not going to be under
> ~bos....

Right.
I'm working on that with Johann George today and will post
email when I have the answer.


From mhanafi at csc.com  Fri Jun 22 10:23:25 2007
From: mhanafi at csc.com (Mahmoud Hanafi)
Date: Fri, 22 Jun 2007 13:23:25 -0400
Subject: [ofa-general] problem with ofed 1.1.
In-Reply-To: <1182525727.5695.29.camel@linux.site>
Message-ID: <OF4452BB66.620AAF93-ON85257302.005F27F0-85257302.005F8762@csc.com>

When the build fails, don't delete the temp directories. Look in
/var/tmp/OFEDRPM/BUILD/openib-1.1/config.log for additional information
on the error.

-Mahmoud


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
This is a PRIVATE message. If you are not the intended recipient, please 
delete without copying and kindly advise us by e-mail of the mistake in 
delivery. NOTE: Regardless of content, this e-mail shall not operate to 
bind CSC to any order or other contract unless pursuant to explicit 
written agreement or government initiative expressly permitting the use of 
e-mail for such purpose.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


Julio del Río <jrio at caton.es>
Sent by: general-bounces at lists.openfabrics.org
06/22/2007 11:22 AM

To: general at lists.openfabrics.org
cc:
Subject: Re: [ofa-general] problem with ofed 1.1.


[root at localhost root]# rpm -qa | grep gcc
libgcc-3.3.3-7
gcc-g77-3.3.3-7
gcc-3.3.3-7
gcc-objc-3.3.3-7
compat-gcc-c++-7.3-2.96.126
gcc-gnat-3.3.3-7
compat-gcc-7.3-2.96.126
gcc34-3.4.0-1
gcc34-c++-3.4.0-1
libgcc-3.3.3-7
gcc-c++-3.3.3-7
gcc-java-3.3.3-7
gcc34-java-3.4.0-1

[root at localhost root]# rpm -qa | grep libc
libcroco-0.4.0-4
libcap-devel-1.10-18.1
libc-client-devel-2002e-5
glibc-2.3.3-27
glibc-kernheaders-2.4-8.44
glibc-utils-2.3.3-27
glibc-2.3.3-27
glibc-profile-2.3.3-27
glibc-common-2.3.3-27
glibc-devel-2.3.3-27
libc-client-2002e-5
libcap-1.10-18.1
glibc-headers-2.3.3-27

Thanks a lot and best regards

On Fri, 22-06-2007 at 11:04 -0400, Mahmoud Hanafi wrote:

Do you have gcc and glibc-devel.x86_64 installed? 

-Mahmoud 



Julio del Río <jrio at caton.es>
Sent by: general-bounces at lists.openfabrics.org
06/22/2007 04:34 AM

To: general at lists.openfabrics.org
cc:
Subject: [ofa-general] problem with ofed 1.1.


Good morning,

I hope you can help me with this.

I have this configuration:

   - Fedora Core 2
   - Linux localhost.localdomain 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:56:28 EST 2006 x86_64 x86_64 x86_64 GNU/Linux
   - HCA Mellanox MHGS18-XTC
   - Flextronic Switch F-X430047
   - OFED 1.1

When I try to install, this is the error log I get:

---------------------------------------------------------
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ cd openib-1.1
++ /usr/bin/id -u
+ '[' 0 = 0 ']'
+ /bin/chown -Rhf root .
++ /usr/bin/id -u
+ '[' 0 = 0 ']'
+ /bin/chgrp -Rhf root .
+ /bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ exit 0
Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.43267
+ umask 022
+ cd /var/tmp/OFEDRPM/BUILD
+ cd openib-1.1
+ LANG=C
+ export LANG
+ unset DISPLAY
+ rm -rf /var/tmp/OFED
+ cd /var/tmp/OFEDRPM/BUILD/openib-1.1
+ mkdir -p /var/tmp/OFED//usr/local/ofed/src
+ cp -a /var/tmp/OFEDRPM/BUILD/openib-1.1 /var/tmp/OFED//usr/local/ofed/src
+ ./configure --prefix=/usr/local/ofed --libdir=/usr/local/ofed/lib64 --kernel-version 2.6.9-34.ELsmp --kernel-sources /lib/modules/2.6.9-34.ELsmp/build --with-libibcm --with-libibverbs --with-libipathverbs --with-libmthca --with-librdmacm --with-mstflint --with-perftest --with-ipath_inf-mod --with-ipoib-mod --with-mthca-mod --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod
Quilt  does not exist... Going to use patch.
Created configure.mk:
prefix=/usr/local/ofed
PREFIX="--prefix /usr/local/ofed"
libdir=/usr/local/ofed/lib64

# Current working directory
CWD=/var/tmp/OFEDRPM/BUILD/openib-1.1

# Kernel level
KVERSION=2.6.9-34.ELsmp
EXTRAVERSION=-34.ELsmp
MODULES_DIR=/lib/modules/2.6.9-34.ELsmp
KSRC=/lib/modules/2.6.9-34.ELsmp/build

AUTOCONF_H=/var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h
WITH_MEMTRACK=no

WITH_MAKE_PARAMS=

CONFIG_INFINIBAND=m
CONFIG_INFINIBAND_IPOIB=m
CONFIG_INFINIBAND_SDP=
CONFIG_INFINIBAND_SRP=

CONFIG_INFINIBAND_USER_MAD=m
CONFIG_INFINIBAND_USER_ACCESS=m
CONFIG_INFINIBAND_ADDR_TRANS=y
CONFIG_INFINIBAND_MTHCA=m

CONFIG_INFINIBAND_IPOIB_DEBUG=y
CONFIG_INFINIBAND_ISER=
CONFIG_INFINIBAND_EHCA=
CONFIG_INFINIBAND_EHCA_SCALING=
CONFIG_INFINIBAND_RDS=
CONFIG_INFINIBAND_RDS_DEBUG=
CONFIG_INFINIBAND_MADEYE=

CONFIG_INFINIBAND_IPOIB_DEBUG_DATA=
CONFIG_INFINIBAND_SDP_SEND_ZCOPY=
CONFIG_INFINIBAND_SDP_RECV_ZCOPY=
CONFIG_INFINIBAND_SDP_DEBUG=
CONFIG_INFINIBAND_SDP_DEBUG_DATA=
CONFIG_INFINIBAND_IPATH=m
CONFIG_INFINIBAND_MTHCA_DEBUG=y


# User level
WITH_IBVERBS=yes
WITH_MTHCA=yes
WITH_IPATHVERBS=yes
WITH_EHCA=no
WITH_CM=yes
WITH_SDP=no
WITH_DAPL=no
WITH_RDMACM=yes
WITH_MANAGEMENT_LIBS=no
WITH_OSM=no
WITH_DIAGS=no
WITH_MPI=no
WITH_PERFTEST=yes
WITH_SRPTOOLS=no
WITH_IPOIBTOOLS=no
WITH_TVFLASH=no
WITH_MSTFLINT=yes

Created /var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h:
#undef CONFIG_INFINIBAND
#undef CONFIG_INFINIBAND_IPOIB
#undef CONFIG_INFINIBAND_SDP
#undef CONFIG_INFINIBAND_SRP

#undef CONFIG_INFINIBAND_USER_MAD
#undef CONFIG_INFINIBAND_USER_ACCESS
#undef CONFIG_INFINIBAND_ADDR_TRANS
#undef CONFIG_INFINIBAND_MTHCA

#undef CONFIG_INFINIBAND_IPOIB_DEBUG
#undef CONFIG_INFINIBAND_ISER
#undef CONFIG_INFINIBAND_EHCA
#undef CONFIG_INFINIBAND_EHCA_SCALING
#undef CONFIG_INFINIBAND_RDS
#undef CONFIG_INFINIBAND_RDS_DEBUG
#undef CONFIG_INFINIBAND_MADEYE

#undef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA
#undef CONFIG_INFINIBAND_SDP_SEND_ZCOPY
#undef CONFIG_INFINIBAND_SDP_RECV_ZCOPY
#undef CONFIG_INFINIBAND_SDP_DEBUG
#undef CONFIG_INFINIBAND_SDP_DEBUG_DATA
#undef CONFIG_INFINIBAND_IPATH
#undef CONFIG_INFINIBAND_MTHCA_DEBUG

#define CONFIG_INFINIBAND 1
#define CONFIG_INFINIBAND_IPOIB 1
#undef CONFIG_INFINIBAND_SDP
#undef CONFIG_INFINIBAND_SRP

#define CONFIG_INFINIBAND_USER_MAD 1
#define CONFIG_INFINIBAND_USER_ACCESS 1
#define CONFIG_INFINIBAND_ADDR_TRANS 1
#define CONFIG_INFINIBAND_MTHCA 1

#define CONFIG_INFINIBAND_IPOIB_DEBUG 1
#undef CONFIG_INFINIBAND_ISER
#undef CONFIG_INFINIBAND_EHCA
#undef CONFIG_INFINIBAND_RDS
#undef CONFIG_INFINIBAND_RDS_DEBUG


#undef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA
#undef CONFIG_INFINIBAND_SDP_SEND_ZCOPY
#undef CONFIG_INFINIBAND_SDP_RECV_ZCOPY
#undef CONFIG_INFINIBAND_SDP_DEBUG
#undef CONFIG_INFINIBAND_SDP_DEBUG_DATA
#define CONFIG_INFINIBAND_IPATH 1
#define CONFIG_INFINIBAND_MTHCA_DEBUG 1
#undef CONFIG_INFINIBAND_MADEYE

mkdir -p /var/tmp/OFEDRPM/BUILD/openib-1.1/patches
touch /var/tmp/OFEDRPM/BUILD/openib-1.1/patches/quiltrc
 /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/dapl_qp_attr.patch
patching file src/userspace/dapl/dapl/openib_cma/dapl_ib_util.c
patching file src/userspace/dapl/dapl/openib_scm/dapl_ib_util.c
 /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/libmthca_cq_deadlock.patch
patching file src/userspace/libmthca/src/verbs.c
Hunk #1 succeeded at 614 (offset -8 lines).
 /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/libmthca_stddef.patch
patching file src/userspace/libmthca/src/mthca.h
Hunk #1 succeeded at 38 with fuzz 2 (offset 2 lines).
 /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/librdmacm_compat.patch
patching file src/userspace/librdmacm/src/cma.c
Hunk #1 succeeded at 157 (offset 16 lines).
 /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/librdmacm_ver_abi.patch
patching file src/userspace/librdmacm/src/cma.c
Hunk #2 succeeded at 170 (offset 16 lines).
 /var/tmp/OFEDRPM/BUILD/openib-1.1/user_patches/fixes/mstflint.patch
patching file src/userspace/mstflint/mtcr.h
 /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cm_add_mra_timeout_limit.patch
patching file drivers/infiniband/core/cm.c
Hunk #1 succeeded at 53 (offset -1 lines).
Hunk #2 succeeded at 2268 (offset -36 lines).
 /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cm_cleanup_timewait.patch
patching file drivers/infiniband/core/cm.c
Hunk #1 succeeded at 686 (offset 7 lines).
 /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_established1.patch
patching file drivers/infiniband/ulp/sdp/sdp.h
patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c
Hunk #1 succeeded at 515 (offset 16 lines).
patching file drivers/infiniband/ulp/sdp/sdp_cma.c
patching file drivers/infiniband/ulp/sdp/sdp_main.c
Hunk #1 succeeded at 589 (offset 26 lines).
 /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_increase_max_cm_retries.patch
patching file drivers/infiniband/core/cma.c
 /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_list_init.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 328 (offset -11 lines).
 /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_mem_leak.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 1713 (offset -241 lines).
 /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_race_fix.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 910 with fuzz 1 (offset -113 lines).
 /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/cma_tavor_quirk.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 48 with fuzz 2.
Hunk #2 succeeded at 1154 (offset 27 lines).
 /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ib_sa_names.patch
patching file include/rdma/ib_sa.h
 /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-fixes.patch
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/Makefile
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/Kconfig
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/Makefile
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_common.h
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_cq.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_debug.h
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_diag.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_driver.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_fs.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_ht400.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_iba6110.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_iba6120.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_intr.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_keys.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_layer.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_layer.h
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_mad.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_mr.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_pe800.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_qp.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_rc.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_registers.h
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_ruc.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_srq.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_stats.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_sysfs.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_uc.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_ud.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_verbs.h
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_verbs_mcast.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_wc_ppc64.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/verbs_debug.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-limit-packets-sent-without-ack.patch
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_qp.c
Hunk #1 succeeded at 502 (offset -8 lines).
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_rc.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_verbs.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-memcpy_cachebypass.patch
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/Makefile
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/memcpy_cachebypass_x86_64.S
 /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipath-x86_64.patch
(Stripping trailing CRs from patch.)
patching file drivers/infiniband/hw/ipath/Kconfig
 /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_issue3.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_mcast_join_mask.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_multicast.c
Hunk #1 succeeded at 471 (offset -1 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_mcast_restart.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ipoib_selector_updated.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
Hunk #2 succeeded at 458 (offset 4 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_attributes.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
Hunk #1 succeeded at 1461 (offset -6 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_remove_reconnect.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/ishai_srp_wa_post_send.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
patching file drivers/infiniband/ulp/srp/ib_srp.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/lockdep_header.patch
patching file drivers/infiniband/core/uverbs_cmd.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_av_statrate.patch
patching file drivers/infiniband/hw/mthca/mthca_av.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_catas_reset.patch
patching file drivers/infiniband/hw/mthca/mthca_catas.c
patching file drivers/infiniband/hw/mthca/mthca_main.c
patching file drivers/infiniband/hw/mthca/mthca_dev.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_mad_traps.patch
patching file drivers/infiniband/hw/mthca/mthca_mad.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_port.patch
patching file drivers/infiniband/hw/mthca/mthca_provider.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_qp_portnum.patch
patching file drivers/infiniband/hw/mthca/mthca_qp.c
Hunk #1 succeeded at 478 (offset 4 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_query_qp_statrate_bits.patch
patching file drivers/infiniband/hw/mthca/mthca_qp.c
Hunk #1 succeeded at 414 (offset 4 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/mthca_use_uar2.patch
patching file drivers/infiniband/hw/mthca/mthca_uar.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/robert-ipath-diagpkt-init-fixup.patch
patching file drivers/infiniband/hw/ipath/ipath_diag.c
Hunk #1 succeeded at 285 (offset -1 lines).
patching file drivers/infiniband/hw/ipath/ipath_driver.c
Hunk #1 succeeded at 539 (offset -20 lines).
Hunk #2 succeeded at 596 with fuzz 1 (offset -105 lines).
Hunk #3 succeeded at 2029 (offset -156 lines).
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
Hunk #1 succeeded at 793 (offset -96 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sdp_credits_by_seq.patch
patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sdp_post_credits.patch
patching file drivers/infiniband/ulp/sdp/sdp.h
Hunk #1 succeeded at 177 (offset 1 line).
patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c
Hunk #1 succeeded at 324 (offset 6 lines).
patching file drivers/infiniband/ulp/sdp/sdp_cma.c
Hunk #1 succeeded at 434 (offset 4 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_drep_on_not_found.patch
patching file drivers/infiniband/core/cm.c
Hunk #1 succeeded at 1890 (offset -10 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_randomize_psn.patch
patching file drivers/infiniband/core/cm.c
Hunk #3 succeeded at 81 (offset 7 lines).
Hunk #5 succeeded at 327 (offset 7 lines).
Hunk #7 succeeded at 2115 (offset 27 lines).
Hunk #8 succeeded at 3369 (offset 2 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cm_unload_crash.patch
patching file drivers/infiniband/core/cm.c
Hunk #1 succeeded at 82 (offset 7 lines).
Hunk #3 succeeded at 656 (offset 6 lines).
Hunk #5 succeeded at 685 (offset 6 lines).
Hunk #7 succeeded at 1316 (offset 6 lines).
Hunk #9 succeeded at 1334 (offset 6 lines).
Hunk #10 succeeded at 2626 (offset -7 lines).
Hunk #11 succeeded at 3409 (offset -29 lines).
Hunk #12 succeeded at 3449 (offset -7 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_establish.patch
patching file include/rdma/rdma_cm.h
Hunk #1 succeeded at 241 (offset -15 lines).
patching file drivers/infiniband/core/cm.c
Hunk #1 succeeded at 3242 (offset 35 lines).
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 759 (offset -81 lines).
Hunk #3 succeeded at 1752 (offset -212 lines).
Hunk #4 succeeded at 1997 with fuzz 1.
Hunk #5 succeeded at 1828 (offset -229 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_hotplug.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 278 (offset 7 lines).
Hunk #3 succeeded at 700 (offset 8 lines).
Hunk #5 succeeded at 895 with fuzz 1 (offset -9 lines).
Hunk #6 succeeded at 1382 (offset 6 lines).
Hunk #7 succeeded at 1610 (offset -9 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/sean_cma_typo_fix.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 276 with fuzz 2 (offset 7 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_1_recreate_at_reconnect.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_2_use_multiple_initiator_ports.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
patching file drivers/infiniband/ulp/srp/ib_srp.h
 /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/srp_topspin.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
Hunk #1 succeeded at 358 (offset -1 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/svnehca_0015_1.patch
patching file drivers/infiniband/hw/ehca/ehca_main.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/fixes/svnehca_0015_2.patch
patching file drivers/infiniband/hw/ehca/ehca_tools.h

Applying patches for 2.6.9-34.ELsmp kernel (RHAS4 Update 3):
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_1_netevents_revert_to_2_6_17.patch
patching file drivers/infiniband/core/addr.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_3926_to_2_6_13.patch
patching file drivers/infiniband/core/addr.c
Hunk #1 succeeded at 327 with fuzz 1 (offset 11 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/addr_4670_to_2_6_9.patch
patching file drivers/infiniband/core/addr.c
Hunk #1 succeeded at 27 with fuzz 2.
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/asm_bitops_ia64_to_2_6_11.patch
patching file include/asm/bitops.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/core_4807_to_2_6_9.patch
patching file drivers/infiniband/core/sysfs.c
Hunk #1 succeeded at 438 (offset -4 lines).
patching file drivers/infiniband/core/user_mad.c
Hunk #2 succeeded at 677 (offset 91 lines).
Hunk #3 succeeded at 685 (offset 5 lines).
Hunk #4 succeeded at 1106 (offset 91 lines).
Hunk #5 succeeded at 1053 (offset 5 lines).
patching file drivers/infiniband/core/uverbs_main.c
Hunk #2 succeeded at 118 (offset 3 lines).
patching file drivers/infiniband/core/uverbs_mem.c
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/debugfs_to_2_6_9.patch
patching file drivers/infiniband/include/linux/debugfs.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipath-backport.patch
patching file drivers/infiniband/hw/ipath/iowrite32_copy_x86_64.S
patching file drivers/infiniband/hw/ipath/ipath_backport.h
patching file drivers/infiniband/hw/ipath/ipath_diag.c
patching file drivers/infiniband/hw/ipath/ipath_driver.c
Hunk #2 succeeded at 557 (offset 1 line).
Hunk #3 succeeded at 599 (offset 1 line).
Hunk #4 succeeded at 1366 (offset 1 line).
Hunk #5 succeeded at 1395 (offset 1 line).
Hunk #6 succeeded at 1875 (offset 1 line).
Hunk #7 succeeded at 1903 (offset 1 line).
Hunk #8 succeeded at 1984 (offset -9 lines).
Hunk #9 succeeded at 2027 (offset 1 line).
Hunk #10 succeeded at 2142 (offset -9 lines).
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_fs.c
patching file drivers/infiniband/hw/ipath/ipath_iba6110.c
patching file drivers/infiniband/hw/ipath/ipath_iba6120.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_layer.c
patching file drivers/infiniband/hw/ipath/ipath_sysfs.c
patching file drivers/infiniband/hw/ipath/ipath_user_pages.c
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
patching file drivers/infiniband/hw/ipath/ipath_verbs.h
patching file drivers/infiniband/hw/ipath/Makefile
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipoib_5010_to_2_6_9.patch
patching file drivers/infiniband/include/linux/if_infiniband.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ipoib_8111_to_2_6_16.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
Hunk #2 succeeded at 803 (offset 49 lines).
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #1 succeeded at 46 (offset -1 lines).
Hunk #2 succeeded at 220 (offset 1 line).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_device_5496_to_2_6_15.patch
patching file drivers/infiniband/include/linux/device.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_err_to_2_6_11.patch
patching file include/linux/err.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_idr_6554_to_2_6_13.patch
patching file drivers/infiniband/include/linux/idr.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_inetdevice_to_2_6_17.patch
patching file drivers/infiniband/include/linux/inetdevice.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_lockdep_to_2_6_17.patch
patching file include/linux/lockdep.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_mutex_5947_to_2_6_15.patch
patching file drivers/infiniband/include/linux/mutex.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_netdevice_to_2_6_17.patch
patching file drivers/infiniband/include/linux/netdevice.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_pci_7970_to_2_6_9.patch
patching file drivers/infiniband/include/linux/pci.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_scatterlist_6369_to_2_6_9.patch
patching file drivers/infiniband/include/linux/scatterlist.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_signal_to_2_6_17.patch
patching file drivers/infiniband/include/linux/signal.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_skbuff_6754_to_2_6_11.patch
patching file include/linux/skbuff.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/linux_spinlock_5883_to_2_6_9.patch
patching file drivers/infiniband/include/linux/spinlock.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/makefile_to_2_6_9.patch
patching file drivers/infiniband/ulp/srp/Makefile
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/mthca_dev_3465_to_2_6_11.patch
patching file drivers/infiniband/hw/mthca/mthca_dev.h
Hunk #1 succeeded at 57 with fuzz 2 (offset 4 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/mthca_provider_3465_to_2_6_9.patch
patching file drivers/infiniband/hw/mthca/mthca_provider.c
Hunk #1 succeeded at 387 (offset 28 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_inet_sock_6754_to_2_6_15.patch
patching file include/net/inet_sock.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_sock_1_6754_to_2_6_13.patch
patching file include/net/sock.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_sock_2_6754_to_2_6_11.patch
patching file include/net/sock.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/net_tcp_states_6754_to_2_6_13.patch
patching file include/net/tcp_states.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/read_mostly_6255_to_2_6_13.patch
patching file drivers/infiniband/include/linux/cache.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/scsi_7242_to_2_6_14.patch
patching file include/scsi/scsi.h
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/sdp_7277_to_2_6_11.patch
patching file drivers/infiniband/ulp/sdp/sdp_main.c
Hunk #1 succeeded at 418 (offset 118 lines).
Hunk #2 succeeded at 535 (offset 41 lines).
Hunk #3 succeeded at 633 (offset 118 lines).
Hunk #4 succeeded at 1408 (offset 245 lines).
Hunk #5 succeeded at 1301 (offset 118 lines).
Hunk #6 succeeded at 1537 (offset 245 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_4030_to_2_6_12.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
Hunk #1 succeeded at 1594 (offset 271 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_7312_to_2_6_11.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
Hunk #1 succeeded at 1258 (offset -44 lines).
Hunk #3 succeeded at 1332 (offset -42 lines).
Hunk #5 succeeded at 1360 with fuzz 2 (offset -40 lines).
Hunk #6 succeeded at 1404 with fuzz 2 (offset -3 lines).
Hunk #7 succeeded at 1377 (offset -40 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/srp_scsi_scan_target_7242_to_2_6_11.patch
patching file drivers/infiniband/ulp/srp/ib_srp.c
Hunk #1 succeeded at 975 (offset 26 lines).
Hunk #2 succeeded at 1505 (offset 24 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/top_2844_to_2_6_11.patch
patching file drivers/infiniband/Makefile
Hunk #1 succeeded at 1 with fuzz 2.
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ucm_5245_to_2_6_9.patch
patching file drivers/infiniband/core/ucm.c
Hunk #1 succeeded at 1270 (offset -8 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/ucma_6607_to_2_6_9.patch
patching file drivers/infiniband/core/ucma.c
Hunk #1 succeeded at 861 (offset 88 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/user_mad_4603_to_2_6_9.patch
patching file drivers/infiniband/core/user_mad.c
Hunk #1 succeeded at 857 (offset -20 lines).
Hunk #3 succeeded at 1086 (offset -20 lines).
Hunk #5 succeeded at 1123 (offset -20 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/uverbs_main_3935_to_2_6_9.patch
patching file drivers/infiniband/core/uverbs_main.c
Hunk #1 succeeded at 727 (offset 11 lines).
Hunk #2 succeeded at 949 (offset 1 line).
Hunk #3 succeeded at 975 (offset 11 lines).
Hunk #4 succeeded at 986 (offset 3 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.9_U3/uverbs_to_2_6_17.patch
patching file drivers/infiniband/core/uverbs_main.c
Hunk #1 succeeded at 1011 with fuzz 1 (offset 196 lines).
 
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/hpage_patches/hpages.patch
patching file drivers/infiniband/core/uverbs_mem.c
/bin/rm -f /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache
cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/examples
cd /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/libibverbs
Running: ./configure --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache --disable-libcheck --prefix /usr/local/ofed --libdir /usr/local/ofed/lib64 CPPFLAGS="-I../libibverbs/include"
configure: creating cache /var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking build system type... x86_64-redhat-linux-gnu
checking host system type... x86_64-redhat-linux-gnu
checking for style of include used by make... GNU
checking for gcc... gcc
checking for C compiler default output file name... configure: error: C compiler cannot create executables
See `config.log' for more details.
Failed to execute: ./configure --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache --disable-libcheck --prefix /usr/local/ofed --libdir /usr/local/ofed/lib64 CPPFLAGS="-I../libibverbs/include"
error: Bad exit status from /var/tmp/rpm-tmp.43267 (%install)


RPM build errors:
   user vlad does not exist - using root
   group mtl does not exist - using root
   user vlad does not exist - using root
   group mtl does not exist - using root
   Bad exit status from /var/tmp/rpm-tmp.43267 (%install)
ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-libibcm --with-libibverbs --with-libipathverbs --with-libmthca --with-librdmacm --with-mstflint --with-perftest --with-ipath_inf-mod --with-ipoib-mod --with-mthca-mod --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod' --define 'configure_options32 %{nil}' --define 'KVERSION 2.6.9-34.ELsmp' --define 'KSRC /lib/modules/2.6.9-34.ELsmp/build' --define 'build_kernel_ib 1' --define 'build_kernel_ib_devel 1' --define 'NETWORK_CONF_DIR /etc/sysconfig/network-scripts' --define 'modprobe_update 1' --define 'include_ipoib_conf 1' --define 'build_32bit 0' /home/caton/OFED-1.1/SRPMS/openib-1.1-0.src.rpm"

---------------------------------------------------------

Thanks a lot and best regards 

Julio. 

_______________________________________________
general mailing list
general at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit 
http://openib.org/mailman/listinfo/openib-general 


From mshefty at ichips.intel.com  Fri Jun 22 10:48:40 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 22 Jun 2007 10:48:40 -0700
Subject: [ofa-general] Stringify ibv_event_type
In-Reply-To: <467B1359.9060308@opengridcomputing.com>
References: <000201c7b452$8a63c220$ff0da8c0@amr.corp.intel.com>
	<467B1359.9060308@opengridcomputing.com>
Message-ID: <467C0B78.9040508@ichips.intel.com>

I've pushed the changes to librdmacm.git master.


From halr at voltaire.com  Fri Jun 22 11:07:03 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 22 Jun 2007 14:07:03 -0400
Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition
	support
In-Reply-To: <adasl8k0w3x.fsf@cisco.com>
References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com>
	<adair9i8ihq.fsf@cisco.com> <467996C4.1060201@ichips.intel.com>
	<adabqf93vro.fsf@cisco.com> <20070622052700.GP4857@mellanox.co.il>
	<adaabus2cbi.fsf@cisco.com>
	<1182530097.10379.53476.camel@hal.voltaire.com>
	<adasl8k0w3x.fsf@cisco.com>
Message-ID: <1182535620.10379.59690.camel@hal.voltaire.com>

On Fri, 2007-06-22 at 12:53, Roland Dreier wrote:
>  > The only question is what happens with apps which enable the pkey index
>  > mode but run on an older kernel which does not support this. They would
>  > get an error back (-ENOIOCTLCMD) from user_mad. They could either error
>  > out on this or continue on depending on what the app wants to do.
> 
> Yes, but I think that is purely up to the application.  Userspace
> knows about the new interface, the kernel doesn't, and the application
> has to decide how important the pkey_index stuff is.
> 
> Given that we don't have a time machine (so the behavior of old
> kernels is frozen) I don't see anything we can do to make this any better.

Agreed. This is an app and/or library issue.

-- Hal

>  - R.
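
For illustration, the application-side decision discussed above could be
sketched roughly as follows. This is only a sketch under stated assumptions:
the helper name, the enum, and the enable_request parameter are hypothetical,
and the real ioctl request code would come from the user_mad header once the
patch is merged. It also assumes that userspace does not see -ENOIOCTLCMD
directly but an errno such as ENOTTY or EINVAL, depending on the kernel.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Hypothetical result type for this sketch; not part of any real API. */
enum pkey_mode {
    PKEY_MODE_ENABLED,      /* kernel accepted the request */
    PKEY_MODE_UNSUPPORTED,  /* kernel does not know the ioctl */
    PKEY_MODE_ERROR         /* some other failure (bad fd, etc.) */
};

/*
 * Try to switch an open umad file descriptor to P_Key index mode.
 * enable_request is an assumption: the caller supplies whatever request
 * code the kernel header defines for this feature.
 */
enum pkey_mode try_enable_pkey_mode(int umad_fd, unsigned long enable_request)
{
    if (ioctl(umad_fd, enable_request, 0) == 0)
        return PKEY_MODE_ENABLED;

    /* An old kernel rejects the unknown ioctl; the errno value that
     * reaches userspace is assumed to be ENOTTY or EINVAL. */
    if (errno == ENOTTY || errno == EINVAL)
        return PKEY_MODE_UNSUPPORTED;

    return PKEY_MODE_ERROR;
}
```

An application that needs pkey_index support would error out on
PKEY_MODE_UNSUPPORTED; one that can live without it would simply continue
in the old mode.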


From dmkennardpoilv at dittmantechnologies.com  Fri Jun 22 11:18:35 2007
From: dmkennardpoilv at dittmantechnologies.com (Bryon Knight)
Date: Fri, 22 Jun 2007 14:18:35 -0400
Subject: [ofa-general] Need their help
Message-ID: <338001c7b4d8$382b0eb0$f22b6f17@dmkennardpoilv>



From johann.george at qlogic.com  Fri Jun 22 11:52:18 2007
From: johann.george at qlogic.com (Johann George)
Date: Fri, 22 Jun 2007 11:52:18 -0700
Subject: [ofa-general] backups
In-Reply-To: <adafy4l6ylg.fsf@cisco.com>
References: <795c49870706201044ha36255amebd94c1b673f58f6@mail.gmail.com>
	<adahcp2a4ol.fsf@cisco.com>
	<795c49870706201132r1f7633f8r2cf3cb2a71edc6e0@mail.gmail.com>
	<adafy4l6ylg.fsf@cisco.com>
Message-ID: <20070622185218.GA22062@cuprite.pathscale.com>

Jeff,

If you can estimate how much space we need for backups, I'll check
whether it is already in our budget, or else request a larger one.

Johann


On Thu, Jun 21, 2007 at 09:50:35AM -0700, Roland Dreier wrote:
>  > I'm backing up /data/pub/scm. A quick "du -chL" shows it to be 4.2G.
>  > Perhaps I only need to backup a subset of /data/pub/scm? Thanks.
> 
> Looks like there is plenty of excess stuff there... eg
> /data/pub/scm/~mst/linux-2.6 seems to be a partially unpacked
> non-naked linux kernel repository (just picking on mst because
> /data/pub/scm/~mst is 880M).  We could probably save a lot of space
> just keeping one packed copy of Linus's repository and having all other
> kernel trees use alternates to point to the objects there.
> 
> OTOH it's not worth making people spend a lot of effort to clean up too
> much, given how cheap disk space is.
> 
>  - R.


From ardavis at ichips.intel.com  Fri Jun 22 11:57:49 2007
From: ardavis at ichips.intel.com (Arlin Davis)
Date: Fri, 22 Jun 2007 11:57:49 -0700
Subject: [ofa-general] librdmacm_to_2_6_20.patch
In-Reply-To: <200706221424.43142.bs@q-leap.de>
References: <200706221424.43142.bs@q-leap.de>
Message-ID: <467C1BAD.8090206@ichips.intel.com>

Bernd Schubert wrote:

>Hi,
>
>there are patches to make rdma of ofed-1.1 compatible with 2.6.20 
>(https://svn.openfabrics.org/svn/openib/gen2/trunk/ofed/patches/user_fixes/
>librdmacm_to_2_6_20.patch and perftest_to_2_6_20.patch).
>
>
>The entire rdma_set_option() function and its declaration are removed 
>by librdmacm_to_2_6_20. So what to do with the call in 
>dapl_ib_cm.c:177?
>
>       
>  
>
You can remove the entire section of code in dapl_ib_cm.c that calls
rdma_get_option and rdma_set_option. dapli_route_resolve() will then
just call rdma_connect and use the default rdma_cm timers, which is fine.

-arlin


From ralph.campbell at qlogic.com  Fri Jun 22 11:59:39 2007
From: ralph.campbell at qlogic.com (Ralph Campbell)
Date: Fri, 22 Jun 2007 11:59:39 -0700
Subject: [ofa-general] Re: [PATCH] IB/libipathverbs - add barrier before
	updating head index in shared memory
In-Reply-To: <adawsxw0w82.fsf@cisco.com>
References: <1182530179.18911.210.camel@brick.pathscale.com>
	<adawsxw0w82.fsf@cisco.com>
Message-ID: <1182538779.18911.222.camel@brick.pathscale.com>

On Fri, 2007-06-22 at 09:50 -0700, Roland Dreier wrote:
> Seems like it's probably correct.
> 
> I guess you should add this to the git tree.  What is the main
> libipathverbs repository now?  I'm assuming it's not going to be under
> ~bos....

OK. I have created a git://git.openfabrics.org/~ralphc/libipathverbs
tree, applied the last patch, and verified I can clone it from
another machine. However, I don't see it listed at
http://www.openfabrics.org/git/


From gsadasiv7 at gmail.com  Fri Jun 22 14:38:47 2007
From: gsadasiv7 at gmail.com (Ganesh Sadasivan)
Date: Fri, 22 Jun 2007 14:38:47 -0700
Subject: [ofa-general] Sharing userspace IB objects
Message-ID: <532b813a0706221438r1866e93eh26a1b2fc8cd55aea@mail.gmail.com>

Hi,

   Can the IB objects like context, PD, MR, QP, CQ, etc. obtained by calling
userspace verbs be shared by multiple processes?

Thanks
Ganesh

From rdreier at cisco.com  Fri Jun 22 14:44:57 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 22 Jun 2007 14:44:57 -0700
Subject: [ofa-general] Sharing userspace IB objects
In-Reply-To: <532b813a0706221438r1866e93eh26a1b2fc8cd55aea@mail.gmail.com>
	(Ganesh Sadasivan's message of "Fri, 22 Jun 2007 14:38:47 -0700")
References: <532b813a0706221438r1866e93eh26a1b2fc8cd55aea@mail.gmail.com>
Message-ID: <adak5tv1x5y.fsf@cisco.com>

 >   Can the IB objects like context, PD, MR, QP, CQ, etc. obtained by calling
 > userspace verbs be shared by multiple processes?

Not easily.

 - R.


From ardavis at ichips.intel.com  Fri Jun 22 14:47:42 2007
From: ardavis at ichips.intel.com (Arlin Davis)
Date: Fri, 22 Jun 2007 14:47:42 -0700
Subject: [ofa-general] [ANNOUNCE] DAT/DAPL 2.0 library release 
Message-ID: <467C437E.8020804@ichips.intel.com>

Tagged the 2.0 release of libdat and libdapl as "libdapl-2.0" and pushed
it out to my git tree:
    git://git.openfabrics.org/~ardavis/scm/dapl.git

Download directory:
    http://www.openfabrics.org/~ardavis/

This release is based on the DAT 2.0 specification (planned for the OFED 1.3
release).

See "transition_to_dat20_120406.pdf" for details on porting from 1.2 to 2.0.

This package can be built with or without extensions. IB rdma_write with
immediate data and atomic operations are supported through the new 2.0
extended interfaces. A new test/dtest/dtestx.c is included with examples
of extended operations.

See "DAT_IB_Extensions.pdf" for IB extension details.
See "DAT_IW_Extensions.pdf" for iWARP extension details.

To build with IB extensions:
    ./autogen.sh && ./configure --enable-ext-type=ib && make

md5sum:  81f386def7b79525a8fb941fd3d21c52     dapl-2.0.tgz


From gsadasiv7 at gmail.com  Fri Jun 22 14:52:11 2007
From: gsadasiv7 at gmail.com (Ganesh Sadasivan)
Date: Fri, 22 Jun 2007 14:52:11 -0700
Subject: [ofa-general] Sharing userspace IB objects
In-Reply-To: <adak5tv1x5y.fsf@cisco.com>
References: <532b813a0706221438r1866e93eh26a1b2fc8cd55aea@mail.gmail.com>
	<adak5tv1x5y.fsf@cisco.com>
Message-ID: <532b813a0706221452v3d797a3fye22af5619e162a1f@mail.gmail.com>

Hi Roland,

 Can you please elaborate a little bit more on what steps are required to
achieve this? I have a connection manager running as a separate process from
the apps which would be sending/receiving data on QPs. I was hoping to
create IB objects via CM and be made sharable to the apps.

Thanks
Ganesh

On 6/22/07, Roland Dreier <rdreier at cisco.com> wrote:
>
> >   Can the ib objects like context, PD, MR, QP, CQ etc obtained by calling
> > userspace verbs be shared by multiple processes?
>
> Not easily.
>
> - R.
>

From rdreier at cisco.com  Fri Jun 22 14:54:53 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 22 Jun 2007 14:54:53 -0700
Subject: [ofa-general] Sharing userspace IB objects
In-Reply-To: <532b813a0706221452v3d797a3fye22af5619e162a1f@mail.gmail.com>
	(Ganesh Sadasivan's message of "Fri, 22 Jun 2007 14:52:11 -0700")
References: <532b813a0706221438r1866e93eh26a1b2fc8cd55aea@mail.gmail.com>
	<adak5tv1x5y.fsf@cisco.com>
	<532b813a0706221452v3d797a3fye22af5619e162a1f@mail.gmail.com>
Message-ID: <adabqf71wpe.fsf@cisco.com>

 > Can you please elaborate a little bit more on what steps are required to
 > achieve this? I have a connection manager running as a separate process from
 > the apps which would be sending/receiving data on QPs. I was hoping to
 > create IB objects via CM and be made sharable to the apps.

You would have to do a lot of hacking of low-level stuff (libibverbs
and whatever userspace driver libraries you need) to handle passing
file descriptors through unix domain sockets and figure out a way to
make the CQ/QP buffers visible in the address space of the process
that will actually use them.  And also handle doorbell pages etc.
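
The file-descriptor passing step mentioned above can be sketched in plain C. This is only a minimal illustration of SCM_RIGHTS over a unix domain socket, not anything libibverbs does today; the helper names send_fd() and recv_fd() are made up for this sketch, and it does not address the CQ/QP buffer or doorbell page mappings.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one open fd over a connected unix domain socket. */
int send_fd(int sock, int fd)
{
        char byte = 'F';        /* one byte of ordinary data carries the control message */
        struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
        union {
                char buf[CMSG_SPACE(sizeof(int))];
                struct cmsghdr align;   /* forces correct alignment of buf */
        } ctl;
        struct msghdr msg;
        struct cmsghdr *cmsg;

        memset(&msg, 0, sizeof msg);
        memset(&ctl, 0, sizeof ctl);
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = ctl.buf;
        msg.msg_controllen = sizeof ctl.buf;

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;   /* ancillary data is an array of fds */
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd; returns the new descriptor or -1. */
int recv_fd(int sock)
{
        char byte;
        struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
        union {
                char buf[CMSG_SPACE(sizeof(int))];
                struct cmsghdr align;
        } ctl;
        struct msghdr msg;
        struct cmsghdr *cmsg;
        int fd;

        memset(&msg, 0, sizeof msg);
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = ctl.buf;
        msg.msg_controllen = sizeof ctl.buf;

        if (recvmsg(sock, &msg, 0) != 1)
                return -1;

        cmsg = CMSG_FIRSTHDR(&msg);
        if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
            cmsg->cmsg_type != SCM_RIGHTS ||
            cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
                return -1;

        memcpy(&fd, CMSG_DATA(cmsg), sizeof fd);
        return fd;
}
```

The receiver gets its own descriptor number that refers to the same open file as the sender's, which is why the buffers and doorbell pages behind that file would still have to be mapped separately in the receiving process.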

Is there any reason you can't use the CM that's in the kernel already?

 - R.


From gsadasiv7 at gmail.com  Fri Jun 22 15:05:49 2007
From: gsadasiv7 at gmail.com (Ganesh Sadasivan)
Date: Fri, 22 Jun 2007 15:05:49 -0700
Subject: [ofa-general] Sharing userspace IB objects
In-Reply-To: <adabqf71wpe.fsf@cisco.com>
References: <532b813a0706221438r1866e93eh26a1b2fc8cd55aea@mail.gmail.com>
	<adak5tv1x5y.fsf@cisco.com>
	<532b813a0706221452v3d797a3fye22af5619e162a1f@mail.gmail.com>
	<adabqf71wpe.fsf@cisco.com>
Message-ID: <532b813a0706221505u717df41bs6fcaff230ea2487d@mail.gmail.com>

Using CM in kernel may be OK. But will the buffers supplied by apps be copied
into/from kernel for send/receive on these QPs?

Thanks
Ganesh

On 6/22/07, Roland Dreier <rdreier at cisco.com> wrote:
>
> > Can you please elaborate a little bit more on what steps are required to
> > achieve this? I have a connection manager running as a separate process
> from
> > the apps which would be sending/receiving data on QPs. I was hoping to
> > create IB objects via CM and be made sharable to the apps.
>
> You would have to do a lot of hacking of low-level stuff (libibverbs
> and whatever userspace driver libraries you need) to handle passing
> file descriptors through unix domain sockets and figure out a way to
> make the CQ/QP buffers visible in the address space of the process
> that will actually use them.  And also handle doorbell pages etc.
>
> Is there any reason you can't use the CM that's in the kernel already?
>
> - R.
>

From rdreier at cisco.com  Fri Jun 22 15:07:05 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 22 Jun 2007 15:07:05 -0700
Subject: [ofa-general] Sharing userspace IB objects
In-Reply-To: <532b813a0706221505u717df41bs6fcaff230ea2487d@mail.gmail.com>
	(Ganesh Sadasivan's message of "Fri, 22 Jun 2007 15:05:49 -0700")
References: <532b813a0706221438r1866e93eh26a1b2fc8cd55aea@mail.gmail.com>
	<adak5tv1x5y.fsf@cisco.com>
	<532b813a0706221452v3d797a3fye22af5619e162a1f@mail.gmail.com>
	<adabqf71wpe.fsf@cisco.com>
	<532b813a0706221505u717df41bs6fcaff230ea2487d@mail.gmail.com>
Message-ID: <ada3b0j1w52.fsf@cisco.com>

 > Using CM in kernel may be OK. But will the buffers supplied by apps be copied
 > into/from kernel for send/receive on these QPs?

No, of course not.

 - R.


From vlad at lists.openfabrics.org  Sat Jun 23 02:43:53 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Sat, 23 Jun 2007 02:43:53 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_kernel 20070623-0200 daily build status
Message-ID: <20070623094353.69ED5E6083F@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:   --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.12
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.12
Passed on powerpc with linux-2.6.18
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.15
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.13
Passed on ia64 with linux-2.6.16
Passed on x86_64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.19
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.12
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.16
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.13
Passed on x86_64 with linux-2.6.14
Passed on ppc64 with linux-2.6.12
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.17
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.15
Passed on x86_64 with linux-2.6.18
Passed on powerpc with linux-2.6.14
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.14
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on ppc64 with linux-2.6.19
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on ppc64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:


From jackm at dev.mellanox.co.il  Sat Jun 23 23:00:16 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Sun, 24 Jun 2007 09:00:16 +0300
Subject: [ofa-general] Re: [PATCH] libmlx4: fix adjustments for minimum qp
	capabilities in mlx4_create_qp
In-Reply-To: <adafy4l3vw4.fsf@cisco.com>
References: <200706191647.41336.jackm@dev.mellanox.co.il>
	<adafy4l3vw4.fsf@cisco.com>
Message-ID: <200706240900.16563.jackm@dev.mellanox.co.il>

On Thursday 21 June 2007 23:17, Roland Dreier wrote:
> 
> align_queue_size() always returns at least 1 so I don't see why this matters.
> 
No, it doesn't:

from libmlx4/src/verbs.c:
static int align_queue_size(struct ibv_context *context, int size, int spare)
{
        int ret;

        /*
         * If someone asks for a 0-sized queue, presumably they're not
         * going to use it.  So don't mess with their size.
         */
        if (!size)
                return 0;

Note that if the requested size is zero, the function returns zero.

- Jack


From mst at dev.mellanox.co.il  Sat Jun 23 22:59:32 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 24 Jun 2007 08:59:32 +0300
Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition
	support
In-Reply-To: <adaabus2cbi.fsf@cisco.com>
References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com>
	<adair9i8ihq.fsf@cisco.com> <467996C4.1060201@ichips.intel.com>
	<adabqf93vro.fsf@cisco.com> <20070622052700.GP4857@mellanox.co.il>
	<adaabus2cbi.fsf@cisco.com>
Message-ID: <20070624055931.GA26752@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition support
> 
>  > Ugh. OFED 1.2 (with the old ABI) just went out.
>  > I wonder - is it time to start making the kernel backwards-compatible?
>  > It would be trivial to have userspace supply its own ABI
>  > version and have kernel support both new and old ABI if we want to.
>  > What do you think?
> 
> There's always a balance between keeping cruft in the kernel for
> compatibility and not breaking userspace.  I'm beginning to think the
> right plan in this case might be to rename struct ib_user_mad_hdr to
> struct ib_user_mad_hdr_old, make a new struct ib_user_mad with the
> pkey_index member and add a new ioctl IB_USER_MAD_ENABLE_PKEY_INDEX.
> 
> The ABI version would stay the same, and if someone just opened the
> device and didn't do the IB_USER_MAD_ENABLE_PKEY_INDEX they would get
> the old ABI.  If they do the ioctl then they get the new header.  Also
> we could define that ABI version 6 just has the new struct
> ib_user_mad_hdr and no ioctl.
> 
> Then we could say we were going to switch to the new ABI in a year or
> two.  And print a warning in the kernel log for every application that
> doesn't use the ioctl.

Makes sense. If you like, an ioctl can be replaced with a write:
all 4-byte writes currently return -EINVAL.

This has a small advantage that write gets passed the buffer length
parameter, so it's easier to debug (e.g. strace outputs write buffers).

> I'll try to cook up a kernel patch next week.

To make the interface more future-proof, we can 
ask all new-ABI users to use pwrite with offset 0,
and validate the offset in kernel.
Is this a good idea?

-- 
MST


From mst at dev.mellanox.co.il  Sun Jun 24 02:58:01 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 24 Jun 2007 12:58:01 +0300
Subject: [ofa-general] Re: Sharing userspace IB objects
In-Reply-To: <adabqf71wpe.fsf@cisco.com>
References: <532b813a0706221438r1866e93eh26a1b2fc8cd55aea@mail.gmail.com>
	<adak5tv1x5y.fsf@cisco.com>
	<532b813a0706221452v3d797a3fye22af5619e162a1f@mail.gmail.com>
	<adabqf71wpe.fsf@cisco.com>
Message-ID: <20070624095801.GA32431@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: Sharing userspace IB objects
> 
>  > Can you please elaborate a little bit more on what steps are required to
>  > achieve this? I have a connection manager running as a separate process from
>  > the apps which would be sending/receiving data on QPs. I was hoping to
>  > create IB objects via CM and be made sharable to the apps.
> 
> You would have to do a lot of hacking of low-level stuff (libibverbs
> and whatever userspace driver libraries you need) to handle passing
> file descriptors through unix domain sockets and figure out a way to
> make the CQ/QP buffers visible in the address space of the process
> that will actually use them.  And also handle doorbell pages etc.

This is related to scalability stuff that Dror presented at Sonoma
http://www.openfabrics.org/archives/spring2007sonoma/Tuesday%20May%201/gdror%20Next%20Generation%20Hardware%20Assists%20And%20Scalability2.pdf

See especially the shared send queue slide.

So, since the need seems to be there, I started thinking about how this could be done.
Basically, we could create shared memory objects (shm_open) and use these
for all hardware-accessible registers, as well as necessary control (head/tail pointers,
spinlocks used for protection, etc).

If we do this, we can use unix domain sockets for everything,
a client just mmaps the fd that it got. Does this make sense?
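
A rough sketch of the shared-memory half of this idea, assuming a made-up control block layout (struct shared_ctl) and made-up helper names (ctl_create(), ctl_map()). It shows only shm_open plus mmap of a descriptor; real hardware registers, spinlocks and the socket transport are left out.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical control block; a real driver would define its own layout. */
struct shared_ctl {
        uint32_t head;
        uint32_t tail;
};

/* Create a POSIX shared memory object big enough for one control block. */
int ctl_create(const char *name)
{
        int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);

        if (fd < 0)
                return -1;

        /* The name is no longer needed: the fd keeps the object alive. */
        shm_unlink(name);

        /* New objects have size 0; ftruncate() extends and zero-fills. */
        if (ftruncate(fd, sizeof(struct shared_ctl)) < 0) {
                close(fd);
                return -1;
        }
        return fd;
}

/* Map the control block behind fd; this is what a client would do with
 * a descriptor it received over a unix domain socket. */
struct shared_ctl *ctl_map(int fd)
{
        void *p = mmap(NULL, sizeof(struct shared_ctl),
                       PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

        return p == MAP_FAILED ? NULL : p;
}
```

Because the mapping is MAP_SHARED, a head/tail update made through one mapping is visible through every other mapping of the same object. On older glibc versions shm_open() lives in librt, so the link line may need -lrt.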

-- 
MST


From ogerlitz at voltaire.com  Sun Jun 24 03:52:01 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Sun, 24 Jun 2007 13:52:01 +0300
Subject: [ofa-general] Sharing userspace IB objects
In-Reply-To: <532b813a0706221452v3d797a3fye22af5619e162a1f@mail.gmail.com>
References: <532b813a0706221438r1866e93eh26a1b2fc8cd55aea@mail.gmail.com>	<adak5tv1x5y.fsf@cisco.com>
	<532b813a0706221452v3d797a3fye22af5619e162a1f@mail.gmail.com>
Message-ID: <467E4CD1.9010503@voltaire.com>

Ganesh Sadasivan wrote:
> I have a connection manager running as a separate 
> process from the apps which would be sending/receiving data on QPs. I 
> was hoping to create IB objects via CM and be made sharable to the apps.

Should this process do all connection management or only listen to new 
connection requests and then tell another process to handle it (that is 
create CQ/QP, accept the connection etc)?

Or.


From dotanb at dev.mellanox.co.il  Sun Jun 24 06:12:35 2007
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Sun, 24 Jun 2007 16:12:35 +0300
Subject: [ofa-general] Can't open HCA InfiniHost0 problem
In-Reply-To: <9fa3c2e50706210515l5ba18cb1h6eb4718f0749bb21@mail.gmail.com>
References: <9fa3c2e50706210515l5ba18cb1h6eb4718f0749bb21@mail.gmail.com>
Message-ID: <467E6DC3.4020608@dev.mellanox.co.il>


Changer Van wrote:
>
> Hi all,
>
> I got some errors when I performed lctl network up command,
> here are some log messages:
>
> … kernel: LustreError: 12355:0:(viblnd.c:1800:kibnal_startup()) Can't 
> open HCA InfiniHost0: -256
>
> but my ib card's hca_id is InfiniHost_III_Ex0,
> how to config to look for the hca_id like InfiniHost_III_Ex0?
Which driver are you using (VAPI or OFED)?

I hope that the following info will be useful:

in OFED:
ibv_devinfo can give you the available HCAs in your host
in C: ib_register_client will call your handler for every IB device in the host

in VAPI:
vstat can give you the available HCAs in your host
in C: EVAPI_list_hcas can give you the available HCAs

thanks
Dotan


From dotanb at dev.mellanox.co.il  Sun Jun 24 06:27:20 2007
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Sun, 24 Jun 2007 16:27:20 +0300
Subject: [ofa-general] [Fwd: [Error] Asynchronous Thread]
In-Reply-To: <467A90DA.1000107@bull.net>
References: <467A90DA.1000107@bull.net>
Message-ID: <467E7138.7070908@dev.mellanox.co.il>

Yann K. wrote:
>
>
> ------------------------------------------------------------------------
>
> Subject:
> [Error] Asynchronous Thread
> From:
> "Yann K." <yann.kalemkarian at bull.net>
> Date:
> Thu, 21 Jun 2007 16:50:59 +0200
> To:
> ewg-bounces at lists.openfabrics.org
>
> To:
> ewg-bounces at lists.openfabrics.org
>
>
> Hello everybody,
>
> I have a problem making a diagnostic on those kind of errors, which 
> happen at the same time :
>
> At the mpi level :
>
>        case IBV_EVENT_SRQ_ERR:
>            ibv_error_abort(GEN_EXIT_ERR, "MPI Gen2 Async Special Event 
> thread : Got FATAL event %d\n",
>                            event.event_type);
>
> At the kernel level :
>
> Jun 21 11:17:55 s_kernel at platine866 kernel: ib_mthca 0000:07:00.0: CQ
>> overrun on CQN c2009c
It seems that you got a CQ overrun, which means that more completions than
the CQ size were created.
You can solve this by creating a bigger CQ or using more than one CQ.

(I don't really understand why you sent the code from the MPI which
handles the SRQ error.)

thanks
Dotan


From rdreier at cisco.com  Sun Jun 24 06:43:47 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Sun, 24 Jun 2007 06:43:47 -0700
Subject: [ofa-general] Re: [PATCH] libmlx4: fix adjustments for minimum qp
	capabilities in mlx4_create_qp
In-Reply-To: <200706240900.16563.jackm@dev.mellanox.co.il> (Jack Morgenstein's
	message of "Sun, 24 Jun 2007 09:00:16 +0300")
References: <200706191647.41336.jackm@dev.mellanox.co.il>
	<adafy4l3vw4.fsf@cisco.com>
	<200706240900.16563.jackm@dev.mellanox.co.il>
Message-ID: <adalke9zcvg.fsf@cisco.com>

 > No, it doesn't:
 > 
 > from libmlx4/src/verbs.c:
 > static int align_queue_size(struct ibv_context *context, int size, int spare)
 > {
 >         int ret;
 > 
 >         /*
 >          * If someone asks for a 0-sized queue, presumably they're not
 >          * going to use it.  So don't mess with their size.
 >          */
 >         if (!size)
 >                 return 0;

But the function hasn't looked like that for a few weeks now, since
commit e7d06519.

 - R.


From friedman at ucla.edu  Sun Jun 24 14:06:57 2007
From: friedman at ucla.edu (Scott A. Friedman)
Date: Sun, 24 Jun 2007 14:06:57 -0700
Subject: [ofa-general] OFED 1.2 and iWarp w/ recent kernels
Message-ID: <467EDCF1.8000601@ucla.edu>

Hi

iWarp does not work for me on recent kernels (FC7 2.6.21). A simple test
of ib_rdma_bw -c fails after some period of time (10s...3m) with the
following error. Is this known, and is a bugzilla report in order? This
is using all OFED-1.2 and Chelsio fw-1.4.

time ib_rdma_bw -c 10.10.11.20
11359: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 
| duplex=0 | cma=1 |
11359:pp_client_connect: unexpected CM event 7

real    0m10.003s
user    0m0.001s
sys     0m0.001s


From mst at dev.mellanox.co.il  Sun Jun 24 21:38:09 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 25 Jun 2007 07:38:09 +0300
Subject: [ofa-general] Fwd: [ANNOUNCE] GIT 1.5.2.2
Message-ID: <20070625043809.GA29772@mellanox.co.il>

FYI
I think the git-gui updates make it worthwhile to upgrade.
Sasha?

----- Forwarded message from Junio C Hamano <gitster at pobox.com> -----

Subject: [ANNOUNCE] GIT 1.5.2.2
Date: Sun, 17 Jun 2007 04:57:26 +0300
From: Junio C Hamano <gitster at pobox.com>

The latest maintenance release GIT 1.5.2.2 is available at the
usual places:

  http://www.kernel.org/pub/software/scm/git/

  git-1.5.2.2.tar.{gz,bz2}			(tarball)
  git-htmldocs-1.5.2.2.tar.{gz,bz2}		(preformatted docs)
  git-manpages-1.5.2.2.tar.{gz,bz2}		(preformatted docs)
  RPMS/$arch/git-*-1.5.2.2-1.$arch.rpm	(RPM)

GIT v1.5.2.2 Release Notes
==========================

Fixes since v1.5.2.1
--------------------

* Usability fix

  - git-gui is shipped with its updated blame interface.  It is
    rumored that the older one was not just unusable but was
    active health hazard, but this one is actually pretty.
    Please see for yourself.

* Bugfixes

  - "git checkout fubar" was utterly confused when there is a
    branch fubar and a tag fubar at the same time.  It correctly
    checks out the branch fubar now.

  - "git clone /path/foo" to clone a local /path/foo.git
    repository left an incorrect configuration.

  - "git send-email" correctly unquotes RFC 2047 quoted names in
    the patch-email before using their values.

  - We did not accept number of seconds since epoch older than
    year 2000 as a valid timestamp.  We now interpret positive
    integers more than 8 digits as such, which allows us to
    express timestamps more recent than March 1973.

  - git-cvsimport did not work when you have GIT_DIR to point
    your repository at a nonstandard location.

  - Some systems (notably, Solaris) lack hstrerror() to make
    h_errno human readable; prepare a replacement
    implementation.

  - .gitignore file listed git-core.spec but what we generate is
    git.spec, and nobody noticed for a long time.

  - "git-merge-recursive" does not try to run file level merge
    on binary files.

  - "git-branch --track" did not create tracking configuration
    correctly when the branch name had slash in it.

  - The email address of the user specified with user.email
    configuration was overridden by EMAIL environment variable.

  - The tree parser did not warn about tree entries with
    nonsense file modes, and assumed they must be blobs.

  - "git log -z" without any other request to generate diff still
    invoked the diff machinery, wasting cycles.

* Documentation

  - Many updates to fix stale or missing documentation.

  - Although our documentation was primarily meant to be formatted
    with AsciiDoc7, formatting with AsciiDoc8 is supported better.


----------------------------------------------------------------

Changes since v1.5.2.1 are as follows:

Alex Riesen (3):
      Make the installation target of git-gui a little less chatty
      Fix clone to setup the origin if its name ends with .git
      Add a local implementation of hstrerror for the system which do not have it

Gerrit Pape (1):
      Fix typo in remote branch example in git user manual

J. Bruce Fields (4):
      user-manual: quick-start updates
      user-manual: add a missing section ID
      Documentation: user-manual todo
      tutorial: use "project history" instead of "changelog" in header

Jakub Narebski (1):
      Generated spec file to be ignored is named git.spec and not git-core.spec

Johannes Schindelin (2):
      Move buffer_is_binary() to xdiff-interface.h
      merge-recursive: refuse to merge binary files

Johannes Sixt (1):
      Accept dates before 2000/01/01 when specified as seconds since the epoch

Junio C Hamano (6):
      checkout: do not get confused with ambiguous tag/branch names
      $EMAIL is a last resort fallback, as it's system-wide.
      git-branch --track: fix tracking branch computation.
      Avoid diff cost on "git log -z"
      Documentation: adjust to AsciiDoc 8
      GIT 1.5.2.2

Kristian Høgsberg (1):
      Unquote From line from patch before comparing with given from address.

Luiz Fernando N. Capitulino (1):
      git-cherry: Document 'limit' command-line option

Matthijs Melchior (1):
      New selection indication and softer colors

Michael Milligan (1):
      git-cvsimport: Make sure to use $git_dir always instead of .git sometimes

Sam Vilain (2):
      fix documentation of unpack-objects -n
      Don't assume tree entries that are not dirs are blobs

Shawn O. Pearce (47):
      git-gui: Allow creating a branch when none exists
      git-gui: Allow as few as 0 lines of diff context
      git-gui: Don't quit when we destroy a child widget
      git-gui: Attach font_ui to all spinbox widgets
      git-gui: Verify Tcl/Tk is new enough for our needs
      Revert "Make the installation target of git-gui a little less chatty"
      git-gui: Add a 4 digit commit abbreviation to the blame viewer
      git-gui: Cleanup blame::new widget initialization
      git-gui: Remove empty blank line at end of blame
      git-gui: Improve the coloring in blame viewer
      git-gui: Simplify consecutive lines that come from the same commit
      git-gui: Use arror cursor in blame viewer file data
      git-gui: Display tooltips in blame viewer
      git-gui: Highlight the blame commit header from everything else
      git-gui: Remove unnecessary reshow of blamed commit
      git-gui: Cleanup minor style nit
      git-gui: Space the commit group continuation out in blame view
      git-gui: Show author initials in blame groups
      git-gui: Allow the user to control the blame/commit split point
      git-gui: Display a progress bar during blame annotation gathering
      git-gui: Allow digging through history in blame viewer
      git-gui: Combine blame groups only if commit and filename match
      git-gui: Show original filename in blame tooltip
      git-gui: Use a label instead of a button for the back button
      git-gui: Clip the commit summaries in the blame history menu
      git-gui: Remove the loaded column from the blame viewer
      git-gui: Remove unnecessary space between columns in blame viewer
      git-gui: Use lighter colors in blame view
      git-gui: Make the line number column slightly wider in blame
      git-gui: Automatically expand the line number column as needed
      git-gui: Remove unused commit_list from blame viewer
      git-gui: Better document our blame variables
      git-gui: Cleanup redundant column management in blame viewer
      git-gui: Switch internal blame structure to Tcl lists
      git-gui: Label the uncommitted blame history entry
      git-gui: Rename fields in blame viewer to better descriptions
      git-gui: Display the "Loading annotation..." message in italic
      git-gui: Run blame twice on the same file and display both outputs
      git-gui: Display both commits in our tooltips
      git-gui: Jump to original line in blame viewer
      git-gui: Use three colors for the blame viewer background
      git-gui: Improve our labeling of blame annotation types
      git-gui: Favor the original annotations over the recent ones
      git-gui: Changed blame header bar background to match main window
      git-gui: Include 'war on whitespace' fixes from git.git
      git-gui: Give amend precedence to HEAD over MERGE_MSG
      git-gui: Save geometry before the window layout is damaged

william pursell (1):
      Make command description imperative statement, not third-person present.


-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

----- End forwarded message -----

-- 
MST


From tziporet at dev.mellanox.co.il  Mon Jun 25 02:47:51 2007
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Mon, 25 Jun 2007 12:47:51 +0300
Subject: [ofa-general] Supported list of Kernels
In-Reply-To: <99863D2ED484D449811D97A4C44C9CBD4239A1@EPEXCH2.qlogic.org>
References: <20070619150629.E2CA7E60871@openfabrics.org>
	<99863D2ED484D449811D97A4C44C9CBD4239A1@EPEXCH2.qlogic.org>
Message-ID: <467F8F47.2070109@mellanox.co.il>

John Russo wrote:
> The list below shows the same kernel for 3 versions of RedHat 
> 	- RedHat EL4 up4: 2.6.9-42.ELsmp
> 	- RedHat EL4 up5: 2.6.9-42.ELsmp
> 	- RedHat EL5: 2.6.9-42.ELsmp
>
> The kernels that exist "out of the box" for each release are 
>
> 	- RedHat EL4 up4: 2.6.9-42.ELsmp  (no change)
> 	- RedHat EL4 up5: 2.6.9-55.ELsmp
> 	- RedHat EL5: 2.6.18-8.ELsmp
>
> Is 2.6.9-42 really the only kernel supported/tested or is this a
> cut-and-paste mistake:
>
>   
This is the correct list that OFED 1.2 supports:

  o   Linux Operating Systems:
        - RedHat EL4 up3: 2.6.9-34.ELsmp
        - RedHat EL4 up4: 2.6.9-42.ELsmp
        - RedHat EL4 up5: 2.6.9-55.ELsmp
        - RedHat EL5: 2.6.18-8.el5
        - SLES9 SP3: 2.6.5-7.244-smp
        - SLES10: 2.6.16.21-0.8-smp
        - SLES10 SP1: 2.6.16.46-0.12-smp (partially tested)
        - kernel.org: 2.6.19.x and 2.6.20.x

Tziporet


From tziporet at dev.mellanox.co.il  Mon Jun 25 02:52:20 2007
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Mon, 25 Jun 2007 12:52:20 +0300
Subject: [ofa-general] backups
In-Reply-To: <795c49870706201044ha36255amebd94c1b673f58f6@mail.gmail.com>
References: <795c49870706201044ha36255amebd94c1b673f58f6@mail.gmail.com>
Message-ID: <467F9054.5020008@mellanox.co.il>

Jeff Becker wrote:
> Hi. I've started backing up the git trees and the web content using
> rsync. John Companies gave us a 10G NFS partition for this. I've done
> two backups and there's only 800M left. Also, I haven't backed up the
> daily builds yet. I was told we could get more space for one dollar
> per GB per month. Depending on the budget, we should increase this
> backup space. How should we proceed? Thanks.
>
> -jeff
>
No need to back up the daily builds.
It's only important to back up the sources and the releases.

Tziporet


From mst at dev.mellanox.co.il  Mon Jun 25 06:06:04 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 25 Jun 2007 16:06:04 +0300
Subject: [ofa-general] [PATCH RFC] sharing userspace IB objects
Message-ID: <20070625130604.GH15343@mellanox.co.il>

> > Quoting Roland Dreier <rdreier at cisco.com>:
> > Subject: Re: Sharing userspace IB objects
> > 
> >  > Can you please elaborate a little bit more on what steps are required to
> >  > achieve this? I have a connection manager running as a separate process from
> >  > the apps which would be sending/receiving data on QPs. I was hoping to
> >  > create IB objects via CM and be made sharable to the apps.
> > 
> > You would have to do a lot of hacking of low-level stuff (libibverbs
> > and whatever userspace driver libraries you need) to handle passing
> > file descriptors through unix domain sockets and figure out a way to
> > make the CQ/QP buffers visible in the address space of the process
> > that will actually use them.  And also handle doorbell pages etc.
>
> This is related to scalability stuff that Dror presented at Sonoma
> http://www.openfabrics.org/archives/spring2007sonoma/Tuesday%20May%201/gdror%20Next%20Generation%20Hardware%20Assists%20And%20Scalability2.pdf
> 
> See especially the shared send queue slide.
> 
> So, since the need seems to be there, I started thinking about how this could be done.
> Basically, we could create shared memory objects (shm_open) and use these
> for all hardware-accessible registers, as well as necessary control (head/tail pointers,
> spinlocks used for protection, etc).
> 
> If we do this, we can use unix domain sockets for everything,
> a client just mmaps the fd that it got. Does this make sense?

OK, here's a draft showing what an API to do this could look like.

Basically the idea is that we'd ask low-level drivers to provide an
(optional) API to
1. allocate context and all its objects inside a shared memory object
2. pack and unpack objects from/to unix domain socket messages

So to share a QP, the server would
A. open shared context, create pd, cq, qp
B. listen on unix domain socket
C. pack the context, pd, cq, qp
D. send them to clients that connect

The client would
A. create unix domain socket
B. connect to server
C. get message from server
D. unpack context, pd, cq, qp

Roland, all, any comments on the API?
Next, I'm going to look at adding this support to some low-level drivers.
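
The server and client steps above rest on one mechanism: passing an open file descriptor over a unix domain socket and mapping it on both sides. A minimal sketch of just that mechanism follows, using only POSIX calls. It does not use any of the proposed ibv_* pack/unpack functions; send_fd(), recv_fd() and demo_share_mapping() are hypothetical helper names, and tmpfile() stands in for shm_open() so the sketch needs no extra libraries.

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define REGION_SIZE 4096

/* Room for exactly one descriptor, aligned for struct cmsghdr. */
union fd_ctl {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Hypothetical helper: send one open descriptor over a unix domain socket. */
int send_fd(int sock, int fd)
{
	char byte = 0;  /* one byte of ordinary data travels with the cmsg */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_ctl ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor, or return -1. */
int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_ctl ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* "Server" and "client" in one process: map, pass the fd, map again. */
int demo_share_mapping(void)
{
	int sv[2], fd, rfd, ok;
	char *srv, *cli;
	FILE *backing = tmpfile();  /* stand-in for shm_open() */

	if (!backing || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	fd = fileno(backing);
	if (ftruncate(fd, REGION_SIZE) < 0)
		return -1;
	srv = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (srv == MAP_FAILED)
		return -1;
	strcpy(srv, "queue state");

	if (send_fd(sv[0], fd) < 0 || (rfd = recv_fd(sv[1])) < 0)
		return -1;
	cli = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, rfd, 0);
	if (cli == MAP_FAILED)
		return -1;
	ok = strcmp(cli, "queue state") == 0;  /* client sees server's write */

	munmap(cli, REGION_SIZE);
	munmap(srv, REGION_SIZE);
	close(rfd);
	close(sv[0]);
	close(sv[1]);
	fclose(backing);
	return ok ? 0 : -1;
}
```

Calling demo_share_mapping() from main() should return 0 when the second mapping observes the first mapping's write. A real server would listen on a named socket and the client would be a separate process; both ends are folded into one process here only to keep the sketch short.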

---

diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index acc1b82..b16e186 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -38,6 +38,7 @@
 
 #include <stdint.h>
 #include <pthread.h>
+#include <sys/socket.h>
 
 #ifdef __cplusplus
 #  define BEGIN_C_DECLS extern "C" {
@@ -601,6 +602,9 @@ struct ibv_device;
 struct ibv_context;
 
 struct ibv_device_ops {
+	struct ibv_context *	(*alloc_shared_context)(struct ibv_device *device,
+							int cmd_fd,
+							int shm_fd, off_t offset);
 	struct ibv_context *	(*alloc_context)(struct ibv_device *device, int cmd_fd);
 	void			(*free_context)(struct ibv_context *context);
 };
@@ -680,6 +684,26 @@ struct ibv_context_ops {
 	int			(*detach_mcast)(struct ibv_qp *qp, union ibv_gid *gid,
 						uint16_t lid);
 	void			(*async_event)(struct ibv_async_event *event);
+
+	int (*context_csmg_pack)(struct msghdr *, struct cmsghdr **,struct ibv_context *);
+	int (*pd_csmg_pack)(struct msghdr *, struct cmsghdr **,struct ibv_pd *);
+	int (*mr_csmg_pack)(struct msghdr *, struct cmsghdr **,struct ibv_mr *);
+	int (*mw_csmg_pack)(struct msghdr *, struct cmsghdr **,struct ibv_mw *);
+	int (*srq_csmg_pack)(struct msghdr *, struct cmsghdr **,struct ibv_srq *);
+	int (*cq_csmg_pack)(struct msghdr *, struct cmsghdr **,struct ibv_cq *);
+	int (*qp_csmg_pack)(struct msghdr *, struct cmsghdr **,struct ibv_qp *);
+	int (*comp_channel_csmg_pack)(struct msghdr *, struct cmsghdr **,struct ibv_comp_channel *);
+	int (*ah_csmg_pack)(struct msghdr *, struct cmsghdr **,struct ibv_ah *);
+
+	struct ibv_context *(*context_cmsg_unpack)(struct ibv_device *, struct msghdr *, struct cmsghdr **);
+	struct ibv_pd *(*pd_cmsg_unpack)(struct ibv_context *, struct msghdr *, struct cmsghdr **);
+	struct ibv_mr *(*mr_cmsg_unpack)(struct ibv_pd *, struct msghdr *, struct cmsghdr **);
+	struct ibv_mw *(*mw_cmsg_unpack)(struct ibv_pd *, struct msghdr *, struct cmsghdr **);
+	struct ibv_srq *(*srq_cmsg_unpack)(struct ibv_pd *, struct msghdr *, struct cmsghdr **);
+	struct ibv_comp_channel *(*comp_channel_cmsg_unpack)(struct ibv_context *, struct msghdr *, struct cmsghdr **);
+	struct ibv_cq *(*cq_cmsg_unpack)(struct ibv_context *, void *cq_context, struct ibv_comp_channel *, struct msghdr *, struct cmsghdr **);
+	struct ibv_qp *(*qp_cmsg_unpack)(struct ibv_pd *pd, struct ibv_qp_init_attr *init_attr, struct msghdr *, struct cmsghdr **);
+	struct ibv_ah *(*ah_cmsg_unpack)(struct ibv_pd *pd, struct msghdr *, struct cmsghdr **);
 };
 
 struct ibv_context {
@@ -1074,6 +1098,30 @@ int ibv_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid);
  */
 int ibv_fork_init(void);
 
+struct ibv_context *ibv_open_shared_device(struct ibv_device *device,
+					   int fd, off_t offset);
+int ibv_cmsg_space(struct ibv_context *);
+
+int ibv_context_csmg_pack(struct msghdr *, struct cmsghdr **,struct ibv_context *);
+int ibv_pd_csmg_pack(struct msghdr *, struct cmsghdr **,struct ibv_pd *);
+int ibv_mr_csmg_pack(struct msghdr *, struct cmsghdr **,struct ibv_mr *);
+int ibv_mw_csmg_pack(struct msghdr *, struct cmsghdr **,struct ibv_mw *);
+int ibv_srq_csmg_pack(struct msghdr *, struct cmsghdr **,struct ibv_srq *);
+int ibv_cq_csmg_pack(struct msghdr *, struct cmsghdr **,struct ibv_cq *);
+int ibv_qp_csmg_pack(struct msghdr *, struct cmsghdr **,struct ibv_qp *);
+int ibv_comp_channel_csmg_pack(struct msghdr *, struct cmsghdr **,struct ibv_comp_channel *);
+int ibv_ah_csmg_pack(struct msghdr *, struct cmsghdr **,struct ibv_ah *);
+
+struct ibv_context *ibv_context_cmsg_unpack(struct ibv_device *, struct msghdr *, struct cmsghdr **);
+struct ibv_pd *ibv_pd_cmsg_unpack(struct ibv_context *, struct msghdr *, struct cmsghdr **);
+struct ibv_mr *ibv_mr_cmsg_unpack(struct ibv_pd *, struct msghdr *, struct cmsghdr **);
+struct ibv_mw *ibv_mw_cmsg_unpack(struct ibv_pd *, struct msghdr *, struct cmsghdr **);
+struct ibv_srq *ibv_srq_cmsg_unpack(struct ibv_pd *, struct msghdr *, struct cmsghdr **);
+struct ibv_comp_channel *ibv_comp_channel_cmsg_unpack(struct ibv_context *, struct msghdr *, struct cmsghdr **);
+struct ibv_cq *ibv_cq_cmsg_unpack(struct ibv_context *, void *cq_context, struct ibv_comp_channel *, struct msghdr *, struct cmsghdr **);
+struct ibv_qp *ibv_qp_cmsg_unpack(struct ibv_pd *pd, struct ibv_qp_init_attr *init_attr, struct msghdr *, struct cmsghdr **);
+struct ibv_ah *ibv_ah_cmsg_unpack(struct ibv_pd *pd, struct msghdr *, struct cmsghdr **);
+
 END_C_DECLS
 
 #  undef __attribute_const

-- 
MST


From swise at opengridcomputing.com  Mon Jun 25 06:28:44 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 25 Jun 2007 08:28:44 -0500
Subject: [ofa-general] OFED 1.2 and iWarp w/ recent kernels
In-Reply-To: <467EDCF1.8000601@ucla.edu>
References: <467EDCF1.8000601@ucla.edu>
Message-ID: <467FC30C.9060107@opengridcomputing.com>

Scott A. Friedman wrote:
> Hi
> 
> iWarp does not work for me on recent kernels (FC7 2.6.21). A simple test 
> of ib_rdma_bw -c fails after some period of time (10s...3m) with the 
> following error, is this known and is a bugzilla report in order? This 
> is using all OFED-1.2 and Chelsio fw-1.4.
> 
> time ib_rdma_bw -c 10.10.11.20
> 11359: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 
> | duplex=0 | cma=1 |
> 11359:pp_client_connect: unexpected CM event 7
> 

Hey Scott,

CM event 7 indicates the remote host was unreachable.  Can you icmp ping 
between the two hosts?

Do you see anything in the kernel logs on the two systems?
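
For reference, the bare number in the "unexpected CM event 7" message can be turned into a name with a small lookup table. The sketch below is a local copy of the first entries of librdmacm's enum rdma_cm_event_type, reproduced from memory, so it is worth checking against rdma/rdma_cma.h on the installed system; cm_event_name() is a hypothetical helper, not a library call.

```c
#include <stddef.h>

/* Local copy of the leading rdma_cm_event_type values (an assumption:
 * verify the order against rdma/rdma_cma.h before relying on it). */
static const char *const cm_event_names[] = {
	"RDMA_CM_EVENT_ADDR_RESOLVED",    /* 0 */
	"RDMA_CM_EVENT_ADDR_ERROR",       /* 1 */
	"RDMA_CM_EVENT_ROUTE_RESOLVED",   /* 2 */
	"RDMA_CM_EVENT_ROUTE_ERROR",      /* 3 */
	"RDMA_CM_EVENT_CONNECT_REQUEST",  /* 4 */
	"RDMA_CM_EVENT_CONNECT_RESPONSE", /* 5 */
	"RDMA_CM_EVENT_CONNECT_ERROR",    /* 6 */
	"RDMA_CM_EVENT_UNREACHABLE",      /* 7 */
	"RDMA_CM_EVENT_REJECTED",         /* 8 */
	"RDMA_CM_EVENT_ESTABLISHED",      /* 9 */
	"RDMA_CM_EVENT_DISCONNECTED",     /* 10 */
};

/* Hypothetical helper: map an event number to its name. */
const char *cm_event_name(int event)
{
	size_t n = sizeof(cm_event_names) / sizeof(cm_event_names[0]);

	if (event < 0 || (size_t)event >= n)
		return "unknown";
	return cm_event_names[event];
}
```

If the installed librdmacm provides rdma_event_str(), that is probably the better choice, since it cannot drift out of step with the header.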

Thanks,

Steve.


From ogerlitz at voltaire.com  Mon Jun 25 07:19:34 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Mon, 25 Jun 2007 17:19:34 +0300
Subject: [ofa-general] [PATCH RFC] sharing userspace IB objects
In-Reply-To: <20070625130604.GH15343@mellanox.co.il>
References: <20070625130604.GH15343@mellanox.co.il>
Message-ID: <467FCEF6.9090905@voltaire.com>

Michael S. Tsirkin wrote:

>> So, since the need seems to be there, I started thinking about how this could be done.
>> Basically, we could create shared memory objects (shm_open) and use these
>> for all hardware-accessible registers, as well as necessary control (head/tail pointers,
>> spinlocks used for protection, etc).
>>
>> If we do this, we can use unix domain sockets for everything,
>> a client just mmaps the fd that it got. Does this make sense?
> 
> OK, here's a draft showing how an API to do this could look like.
> 
> Basically the idea is that we'd ask low-level drivers to provide an
> (optional) API to
> 1. allocate context and all its objects inside a shared memory object
> 2. pack and unpack objects from/to unix domain socket messages
> 
> So to share a QP, the server would
> A. open shared context, create pd, cq, qp
> B. listen on unix domain socket
> C. pack the context, pd, cq, qp
> D. send them to clients that connect

> The client would
> A. create unix domain socket
> B. connect to server
> C. get message from server
> D. unpack context, pd, cq, qp

One problem here (which has been an annoyance for a long time...) is that
typically the active side of a connection is the one that sends the first
packet, and hence you must post receives to the QP --before-- accepting the
connection request.

So, if both sides use a shared context, they would need to implement a
synchronization protocol (that is, don't deliver the established event to
the active side before the passive side has accepted).

And if the active side does not use a shared context while the passive
side does, you need either the shared context to allocate/post receives
from shared memory, or to rely on RNR NAKs. What do you think?

Also, what was your thinking on registering the QP/CQ memory? Is the
plan to implement a verb for registering shared memory, as in the
VAPI stack, or do you want to register this memory as "just" virtual?

Or.


From bs at q-leap.de  Mon Jun 25 08:26:41 2007
From: bs at q-leap.de (Bernd Schubert)
Date: Mon, 25 Jun 2007 17:26:41 +0200
Subject: [ofa-general] librdmacm_to_2_6_20.patch
In-Reply-To: <467C1BAD.8090206@ichips.intel.com>
References: <200706221424.43142.bs@q-leap.de>
	<467C1BAD.8090206@ichips.intel.com>
Message-ID: <200706251726.41408.bs@q-leap.de>

On Friday 22 June 2007 20:57:49 you wrote:
> Bernd Schubert wrote:
> >Hi,
> >
> >there are patches to make rdma of ofed-1.1 compatible with 2.6.20
> >(https://svn.openfabrics.org/svn/openib/gen2/trunk/ofed/patches/user_fixes
> >/ librdmacm_to_2_6_20.patch and perftest_to_2_6_20.patch).
> >
> >
> >The entire rdma_set_option() function and its declaration are removed
> >by librdmacm_to_2_6_20. So what to do with the call in
> >dapl_ib_cm.c:177?
>
> You can remove the entire section of code in dapl_ib_cm.c that calls
> rdma_get_option and rdma_set_option.
> dapli_route_resolve() will then just call rdma_connect and use the
> default rdma_cm timers which is fine.

Thanks a lot, got it to compile that way.


Thanks again,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH


From gsadasiv7 at gmail.com  Mon Jun 25 09:24:50 2007
From: gsadasiv7 at gmail.com (Ganesh Sadasivan)
Date: Mon, 25 Jun 2007 09:24:50 -0700
Subject: [ofa-general] Sharing userspace IB objects
In-Reply-To: <467E4CD1.9010503@voltaire.com>
References: <532b813a0706221438r1866e93eh26a1b2fc8cd55aea@mail.gmail.com>
	<adak5tv1x5y.fsf@cisco.com>
	<532b813a0706221452v3d797a3fye22af5619e162a1f@mail.gmail.com>
	<467E4CD1.9010503@voltaire.com>
Message-ID: <532b813a0706250924o6b5bb086h90258dbdb4674853@mail.gmail.com>

On 6/24/07, Or Gerlitz <ogerlitz at voltaire.com> wrote:
>
> Ganesh Sadasivan wrote:
> > I have a connection manager running as a separate
> > process from the apps which would be sending/receiving data on QPs. I
> > was hoping to create IB objects via CM and be made sharable to the apps.
>
> Should this process do all connection management or only listen to new
> connection requests and then tell another process to handle it (that is
> create CQ/QP, accept the connection etc)?


I was thinking of doing it the former way, but that requires sharing of
CQ/QP etc. So I am now going ahead with a plan where the CM sets up a
minimal set of QPs through which the clients can send their connection
requests, and the clients themselves handle creation of IB objects like
QP and CQ. However, there are other cases where it is beneficial to share
QPs across multiple processes, so creating IB objects in shared memory is
useful.

Thanks
Ganesh


Or.
>
>

From rostedt at goodmis.org  Mon Jun 25 12:33:14 2007
From: rostedt at goodmis.org (Steven Rostedt)
Date: Mon, 25 Jun 2007 15:33:14 -0400
Subject: [ofa-general] [POSSIBLE BUG] use of tasklet_unlock in
	ipath_no_bufs_available
Message-ID: <1182799994.5493.201.camel@localhost.localdomain>

As some of you know, lately I've been trying to get rid of tasklets. In
doing so, I've come across this usage of tasklet_unlock.

The only user of tasklet_unlock in the kernel outside of softirq.c is
ipath_no_bufs_available in drivers/infiniband/hw/ipath/ipath_ruc.c

Here's the offending code:

void ipath_no_bufs_available(struct ipath_qp *qp, struct ipath_ibdev *dev)
{
	unsigned long flags;

	spin_lock_irqsave(&dev->pending_lock, flags);
	if (list_empty(&qp->piowait))
		list_add_tail(&qp->piowait, &dev->piowait);
	spin_unlock_irqrestore(&dev->pending_lock, flags);
	/*
	 * Note that as soon as want_buffer() is called and
	 * possibly before it returns, ipath_ib_piobufavail()
	 * could be called.  If we are still in the tasklet function,
	 * tasklet_hi_schedule() will not call us until the next time
	 * tasklet_hi_schedule() is called.
	 * We clear the tasklet flag now since we are committing to return
	 * from the tasklet function.
	 */
	clear_bit(IPATH_S_BUSY, &qp->s_flags);
	tasklet_unlock(&qp->s_task);
	want_buffer(dev->dd);
	dev->n_piowait++;
}


As the comment states, it looks like it's trying to prevent a race where
the want_buffer can allow for ipath_ib_piobufavail be called which would
schedule this tasklet again. But since the tasklet is running, it would
simply be skipped if it were to schedule on another CPU. And this would
mean that the tasklet would need to wait for it to be scheduled again
before doing the work.

  Is my above analysis correct?

Now for the BUG.

Let's say this situation does happen. Let's look at the code.

softirq.c: tasklet_hi_action

		if (tasklet_trylock(t)) {
			if (!atomic_read(&t->count)) {
				if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
					BUG();
				t->func(t->data);
				tasklet_unlock(t);
				continue;
			}
			tasklet_unlock(t);
		}

The race being prevented is the failure of the tasklet_trylock running
on another CPU. The call to tasklet_unlock in ipath_no_bufs_available is
letting the other CPU succeed, and the comment suggests that this is OK
because this function will be exiting shortly. But what it doesn't take
into consideration is the above "tasklet_unlock" called again in
tasklet_hi_action.

So while the tasklet function is allowed to run on another CPU, we are
unlocking the tasklet on this CPU. So now this tasklet function is no
longer protected from being reentrant. There is now no guarantee that
the tasklet function would only be running on one CPU.

What's worse, we also add the chance of hitting the above BUG(). If the
tasklet gets scheduled again and this CPU takes an interrupt before doing
the test_and_clear_bit, another CPU can run the tasklet and clear
TASKLET_STATE_SCHED. When the first instance comes back from the
interrupt, it will hit the BUG.

So, does all this make sense, or am I full of crap?  Still, I think
tasklet_unlock and tasklet_trylock should not be exported for anyone
else to use besides softirq.c and perhaps the ipath code needs to find a
better way around this.
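
The reentrancy argument above can be replayed deterministically in user space. The sketch below is not kernel code: struct fake_tasklet, fake_trylock(), fake_unlock() and simulate() are hypothetical stand-ins built on a C11 atomic_flag, and the three "CPUs" are just steps executed in the order described in the analysis.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct fake_tasklet {
	atomic_flag run;  /* stands in for the tasklet's RUN state bit */
	int running;      /* CPUs currently inside the tasklet function */
};

static bool fake_trylock(struct fake_tasklet *t)
{
	return !atomic_flag_test_and_set(&t->run);
}

static void fake_unlock(struct fake_tasklet *t)
{
	atomic_flag_clear(&t->run);
}

/*
 * Replay the interleaving step by step. Returns how many CPUs are inside
 * the tasklet function after CPU0 has returned and CPU2 has tried to enter.
 */
int simulate(bool early_unlock)
{
	struct fake_tasklet t = { ATOMIC_FLAG_INIT, 0 };
	bool got0;

	/* CPU0: the softirq loop takes the lock and calls the function. */
	got0 = fake_trylock(&t);
	assert(got0);
	(void)got0;
	t.running++;

	/* CPU0, inside the function: the driver drops the lock itself. */
	if (early_unlock)
		fake_unlock(&t);

	/* CPU1: tries to run the same tasklet while CPU0 is still inside. */
	if (fake_trylock(&t))
		t.running++;

	/* CPU0: function returns, then the softirq loop unlocks as usual.
	 * With early_unlock this releases the lock CPU1 is holding. */
	t.running--;
	fake_unlock(&t);

	/* CPU2: tries to run the tasklet; CPU1 may still be inside. */
	if (fake_trylock(&t))
		t.running++;

	return t.running;
}
```

Under these assumptions simulate(false) leaves a single CPU in the function, while simulate(true) leaves two (CPU1 and CPU2), which is the loss of mutual exclusion described above.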

-- Steve


From ralph.campbell at qlogic.com  Mon Jun 25 13:37:01 2007
From: ralph.campbell at qlogic.com (Ralph Campbell)
Date: Mon, 25 Jun 2007 13:37:01 -0700
Subject: [ofa-general] [POSSIBLE BUG] use of tasklet_unlock in
	ipath_no_bufs_available
In-Reply-To: <1182799994.5493.201.camel@localhost.localdomain>
References: <1182799994.5493.201.camel@localhost.localdomain>
Message-ID: <1182803821.18911.237.camel@brick.pathscale.com>

This was fixed by a patch that Arthur Jones sent out to
general at lists.openfabrics.org

Tue Jun 19 16:42:09 PDT 2007
[PATCH 17/28] IB/ipath - wait for PIO available interrupt

I imagine that it is working its way into Roland's git tree
for Linus.

On Mon, 2007-06-25 at 15:33 -0400, Steven Rostedt wrote:
> As some of you know, lately I've been trying to get rid of tasklets. In
> doing so, I've come across this usage of tasklet_unlock.
> 
> The only user of tasklet_unlock in the kernel outside of softirq.c is
> ipath_no_bufs_available in drivers/infiniband/hw/ipath/ipath_ruc.c
> 
> Here's the offending code:
> 
> void ipath_no_bufs_available(struct ipath_qp *qp, struct ipath_ibdev *dev)
> {
> 	unsigned long flags;
> 
> 	spin_lock_irqsave(&dev->pending_lock, flags);
> 	if (list_empty(&qp->piowait))
> 		list_add_tail(&qp->piowait, &dev->piowait);
> 	spin_unlock_irqrestore(&dev->pending_lock, flags);
> 	/*
> 	 * Note that as soon as want_buffer() is called and
> 	 * possibly before it returns, ipath_ib_piobufavail()
> 	 * could be called.  If we are still in the tasklet function,
> 	 * tasklet_hi_schedule() will not call us until the next time
> 	 * tasklet_hi_schedule() is called.
> 	 * We clear the tasklet flag now since we are committing to return
> 	 * from the tasklet function.
> 	 */
> 	clear_bit(IPATH_S_BUSY, &qp->s_flags);
> 	tasklet_unlock(&qp->s_task);
> 	want_buffer(dev->dd);
> 	dev->n_piowait++;
> }
> 
> 
> As the comment states, it looks like it's trying to prevent a race where
> the want_buffer can allow for ipath_ib_piobufavail be called which would
> schedule this tasklet again. But since the tasklet is running, it would
> simply be skipped if it were to schedule on another CPU. And this would
> mean that the tasklet would need to wait for it to be scheduled again
> before doing the work.
> 
>   Is my above analysis correct?
> 
> Now for the BUG.
> 
> Lets say this situation does happen. Lets look at the code.
> 
> softirq.c: tasklet_hi_action
> 
> 		if (tasklet_trylock(t)) {
> 			if (!atomic_read(&t->count)) {
> 				if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
> 					BUG();
> 				t->func(t->data);
> 				tasklet_unlock(t);
> 				continue;
> 			}
> 			tasklet_unlock(t);
> 		}
> 
> The race being prevented is the failure of the tasklet_trylock running
> on another CPU. The call to tasklet_unlock in ipath_no_bufs_available is
> letting the other CPU succeed, and the comment suggests that this is OK
> because this function will be exiting shortly. But what it doesn't take
> into consideration is the above "tasklet_unlock" called again in
> tasklet_hi_action.
> 
> So while the tasklet function is allowed to run on another CPU, we are
> unlocking the tasklet on this CPU. So now this tasklet function is no
> longer protected from being reentrant. There is now no guarantee that
> the tasklet function would only be running on one CPU.
> 
> What's worse, we also add the chance of hitting the above BUG(). If the
> tasklet gets scheduled again and this CPU takes an interrupt before doing
> the test_and_clear_bit, another CPU can run the tasklet and clear
> TASKLET_STATE_SCHED. When the first instance comes back from the
> interrupt, it will hit the BUG.
> 
> So, does all this make sense, or am I full of crap.  Still, I think
> tasklet_unlock and tasklet_trylock should not be exported for anyone
> else to use besides softirq.c and perhaps the ipath code needs to find a
> better way around this.
> 
> -- Steve
> 
> 
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


From landman at scalableinformatics.com  Mon Jun 25 13:43:13 2007
From: landman at scalableinformatics.com (Joe Landman)
Date: Mon, 25 Jun 2007 16:43:13 -0400
Subject: bug and patch (was Re: [ofa-general] Supported list of Kernels)
In-Reply-To: <467F8F47.2070109@mellanox.co.il>
References: <20070619150629.E2CA7E60871@openfabrics.org>	<99863D2ED484D449811D97A4C44C9CBD4239A1@EPEXCH2.qlogic.org>
	<467F8F47.2070109@mellanox.co.il>
Message-ID: <468028E1.4070705@scalableinformatics.com>

Tziporet Koren wrote:

> This is the correct list that OFED 1.2 supports:
> 
>  o   Linux Operating Systems:

[...]

>        - kernel.org: 2.6.19.x and 2.6.20.x

I just tried a build of OFED-1.2 against 2.6.20.14 kernel.org

I get this in the log from the build.sh

...
make[1]: Entering directory 
`/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/ipoibtools/iproute2'
make -w -C lib
make[2]: Entering directory 
`/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/ipoibtools/iproute2/lib'
gcc -D_GNU_SOURCE -O2 -Wstrict-prototypes -Wall -I../include 
-DRESOLVE_HOSTNAMES   -c -o ll_map.o ll_map.c
gcc -D_GNU_SOURCE -O2 -Wstrict-prototypes -Wall -I../include 
-DRESOLVE_HOSTNAMES   -c -o libnetlink.o libnetlink.c
ar rcs libnetlink.a ll_map.o libnetlink.o
gcc -D_GNU_SOURCE -O2 -Wstrict-prototypes -Wall -I../include 
-DRESOLVE_HOSTNAMES   -c -o utils.o utils.c
utils.c: In function ‘inet_addr_match’:
utils.c:333: warning: initialization discards qualifiers from pointer 
target type
utils.c:334: warning: initialization discards qualifiers from pointer 
target type
utils.c: In function ‘__get_hz’:
utils.c:368: error: ‘HZ’ undeclared (first use in this function)
utils.c:368: error: (Each undeclared identifier is reported only once
utils.c:368: error: for each function it appears in.)
make[2]: *** [utils.o] Error 1
make[2]: Leaving directory 
`/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/ipoibtools/iproute2/lib'
make[1]: *** [lib] Error 2
make[1]: Leaving directory 
`/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/ipoibtools/iproute2'
make: *** [ipoibtools] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.30492 (%install)


It looks like the HZ macro is undeclared.  Specifically it looks like it 
is wrapped in a nice little ifdef


#ifndef _ASMx86_64_PARAM_H
#define _ASMx86_64_PARAM_H

#ifdef __KERNEL__
# define HZ            CONFIG_HZ        /* Internal kernel timer frequency */
# define USER_HZ       100              /* .. some user interfaces are in "ticks */
#define CLOCKS_PER_SEC        (USER_HZ)       /* like times() */
#endif

so that user space code doesn't see it.  Ugh.

The following patch looks like it fixes it:

--- utils.c     2007-06-25 16:40:00.000000000 -0400
+++ utils.c.new 2007-06-25 16:39:24.000000000 -0400
@@ -365,7 +365,7 @@
         FILE *fp;

         if (getenv("HZ"))
-               return atoi(getenv("HZ")) ? : HZ;
+               return atoi(getenv("HZ")) ? : sysconf(_SC_CLK_TCK);

         if (getenv("PROC_NET_PSCHED")) {
                 snprintf(name, sizeof(name)-1, "%s", getenv("PROC_NET_PSCHED"));
@@ -385,7 +385,7 @@
         }
         if (hz)
                 return hz;
-       return HZ;
+       return sysconf(_SC_CLK_TCK);
  }

  int __iproute2_user_hz_internal;
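
As a standalone illustration of the fallback used in the patch, here is a minimal sketch. It is not the iproute2 code: user_hz() is a hypothetical helper that honours a positive $HZ and otherwise asks sysconf(). One caveat worth keeping in mind: sysconf(_SC_CLK_TCK) reports the tick rate of user-visible interfaces such as times(), which on Linux corresponds to USER_HZ rather than the kernel's internal CONFIG_HZ, so the two values can differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: a positive $HZ overrides, otherwise use sysconf(). */
long user_hz(void)
{
	const char *env = getenv("HZ");
	long hz = env ? strtol(env, NULL, 10) : 0;

	if (hz > 0)
		return hz;

	hz = sysconf(_SC_CLK_TCK);  /* user-visible clock ticks per second */
	assert(hz > 0);
	return hz;
}
```

Unlike the "a ? : b" form in the patch, which is a GNU C extension, this version spells the test out and also treats a non-numeric or zero $HZ as "not set".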


> 
> Tziporet


-- 

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


From rostedt at goodmis.org  Mon Jun 25 13:49:08 2007
From: rostedt at goodmis.org (Steven Rostedt)
Date: Mon, 25 Jun 2007 16:49:08 -0400
Subject: [ofa-general] Re: [POSSIBLE BUG] use of tasklet_unlock in
	ipath_no_bufs_available
In-Reply-To: <1182803821.18911.237.camel@brick.pathscale.com>
References: <1182799994.5493.201.camel@localhost.localdomain>
	<1182803821.18911.237.camel@brick.pathscale.com>
Message-ID: <1182804548.5493.216.camel@localhost.localdomain>

On Mon, 2007-06-25 at 13:37 -0700, Ralph Campbell wrote:
> This was fixed by a patch that Arthur Jones sent out to
> general at lists.openfabrics.org

Great!

> 
> Tue Jun 19 16:42:09 PDT 2007
> [PATCH 17/28] IB/ipath - wait for PIO available interrupt
> 
> I imagine that it is working its way into Roland's git tree
> for Linus.

 	 * tasklet_hi_schedule() is called.
-	 * We clear the tasklet flag now since we are committing to return
-	 * from the tasklet function.
+	 * We leave the busy flag set so that another post send doesn't
+	 * try to put the same QP on the piowait list again.
 	 */
-	clear_bit(IPATH_S_BUSY, &qp->s_busy);
-	tasklet_unlock(&qp->s_task);
 	want_buffer(dev->dd);
 	dev->n_piowait++;

This removes the final use of tasklet_unlock.  I'll submit a patch to
stop exporting it as a public function, so that nobody else thinks they
can easily get at the internals of a tasklet.

-- Steve


From rdreier at cisco.com  Mon Jun 25 13:51:50 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 25 Jun 2007 13:51:50 -0700
Subject: [ofa-general] Re: [PATCH RFC] sharing userspace IB objects
In-Reply-To: <20070625130604.GH15343@mellanox.co.il> (Michael S. Tsirkin's
	message of "Mon, 25 Jun 2007 16:06:04 +0300")
References: <20070625130604.GH15343@mellanox.co.il>
Message-ID: <aday7i7wye1.fsf@cisco.com>

Some initial reaction, in no particular order:

 - Having to allocate everything in memory that the library mmap()s
   adds a lot of yucky stuff -- basically we need to implement our own
   allocator for the shared memory offsets.  I guess we could wrap this
   in libibverbs and only implement it once but still we're basically
   reimplementing malloc().

   Is there really a strong use case for making every type of object
   shareable?  Can we handle the SRC stuff without going to this
   extreme of complexity?

 - Given that everything shared is in shared memory, it seems we could
   avoid all the marshalling/unmarshalling stuff, and just have the
   shared objects have an ID along with an API to look up objects by
   ID.  That way we could let applications use more than just unix
   sockets -- eg pipe() + fork() would work too.

 > +struct ibv_context *ibv_open_shared_device(struct ibv_device *device,
 > +					   int fd, off_t offset);

 - This seems like too low-level an interface; I don't think there's
   any way to enforce the fact that fd came from shm_open(), and I
   don't see the use of offset at all.  I think it would be more
   sensible to extend the normal ibv_open_device() with a pathname,
   and maybe a flag about whether to create or map an existing shared
   context, and do all the shm stuff internally.  Then if someone
   passes a NULL pathname, the context isn't shareable.
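
To make the "own allocator for the shared memory offsets" point
concrete: a pointer is only meaningful in the process that made it when
the region is mapped at different addresses, so shared objects have to
be named by their offset from the start of the mapping.  The following
is only a sketch under that assumption.  struct shm_arena, shm_alloc()
and shm_ptr() are made-up names, not part of libibverbs, and the sketch
has no locking and no way to free memory.

```c
#include <assert.h>
#include <stddef.h>

#define SHM_ALIGN 16	/* assumed alignment for every allocation */

/* Hypothetical header placed at the very start of the shared mapping. */
struct shm_arena {
	size_t size;	/* total bytes in the mapping, header included */
	size_t used;	/* bytes handed out so far, header included */
};

/* Initialise an arena over a mapping of `size` bytes starting at `base`. */
void shm_arena_init(void *base, size_t size)
{
	struct shm_arena *a = base;

	assert(size >= sizeof(*a));
	a->size = size;
	a->used = sizeof(*a);
}

/*
 * Reserve `len` bytes and return their offset from the start of the
 * mapping, or 0 on failure.  Offset 0 is the header itself, so it can
 * never be a valid allocation.
 */
size_t shm_alloc(void *base, size_t len)
{
	struct shm_arena *a = base;
	size_t start = (a->used + SHM_ALIGN - 1) / SHM_ALIGN * SHM_ALIGN;

	if (len == 0 || start > a->size || len > a->size - start)
		return 0;
	a->used = start + len;
	return start;
}

/* Turn a shared offset into a pointer that is valid in this process only. */
void *shm_ptr(void *base, size_t off)
{
	return (char *)base + off;
}
```

Each process would pass its own mapping address as `base`; only the
offsets are stored inside shared structures.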

 - R.


From rdreier at cisco.com  Mon Jun 25 13:54:51 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 25 Jun 2007 13:54:51 -0700
Subject: [ofa-general] [PATCH RFC] sharing userspace IB objects
In-Reply-To: <467FCEF6.9090905@voltaire.com> (Or Gerlitz's message of "Mon,
	25 Jun 2007 17:19:34 +0300")
References: <20070625130604.GH15343@mellanox.co.il>
	<467FCEF6.9090905@voltaire.com>
Message-ID: <adatzsvwy90.fsf@cisco.com>

 > So, if both sides use a shared-context, they would need to implement a
 > synchronization protocol (that is don't deliver established event to
 > the active before the passive accepted).

I'm missing something -- how does the sharing affect the need for
synchronization?

 > Also, what was your thinking on registering the QP/CQ memory? is the
 > plan to implement a verb for registering shared-memory as was in the
 > VAPI stack, or you want to register this memory as "just" virtual?

Given all this sharing we probably need a way to handle registering
shared memory more efficiently.  But actually QP/CQ buffers only need
to be registered once, since the key that the HCA uses to access the
buffer is set at creation time, and the other processes don't need
separate keys.

 - R.


From rdreier at cisco.com  Mon Jun 25 13:57:29 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 25 Jun 2007 13:57:29 -0700
Subject: [ofa-general] Fwd: [ANNOUNCE] GIT 1.5.2.2
In-Reply-To: <20070625043809.GA29772@mellanox.co.il> (Michael S. Tsirkin's
	message of "Mon, 25 Jun 2007 07:38:09 +0300")
References: <20070625043809.GA29772@mellanox.co.il>
Message-ID: <adaps3jwy4m.fsf@cisco.com>

 > I think git-gui updates make it worth while to upgrade.

Is someone actually running git-gui over the internet?  I don't
understand why git-gui on openfabrics.org would matter?

 - R.


From rdreier at cisco.com  Mon Jun 25 14:00:30 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 25 Jun 2007 14:00:30 -0700
Subject: [ofa-general] Re: [PATCH] for-2.6.23 ib/umad: add partition
	support
In-Reply-To: <20070624055931.GA26752@mellanox.co.il> (Michael S. Tsirkin's
	message of "Sun, 24 Jun 2007 08:59:32 +0300")
References: <000401c7af6b$1b32e430$ff0da8c0@amr.corp.intel.com>
	<adair9i8ihq.fsf@cisco.com> <467996C4.1060201@ichips.intel.com>
	<adabqf93vro.fsf@cisco.com> <20070622052700.GP4857@mellanox.co.il>
	<adaabus2cbi.fsf@cisco.com> <20070624055931.GA26752@mellanox.co.il>
Message-ID: <adalke7wxzl.fsf@cisco.com>

 > Makes sense. If you like, an ioctl can be replaced with a write:
 > all 4-byte writes currently return -EINVAL.
 > 
 > This has a small advantage that write gets passed the buffer length
 > parameter, so it's easier to debug (e.g. strace outputs write buffers).

Hmm, I think I still prefer an ioctl to switch modes.  It makes for
cleaner separation of control and data path.

 > To make the interface more future-proof, we can 
 > ask all new-ABI users to use pwrite with offset 0,
 > and validate the offset in kernel.
 > Is this a good idea?

No, I don't like that interface.  Especially the converse interface of
pread() at offset 0 seems very confusing.

 - R.


From swise at opengridcomputing.com  Mon Jun 25 14:15:09 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 25 Jun 2007 16:15:09 -0500
Subject: [ofa-general] development process post ofed-1.2 gold.
Message-ID: <4680305D.9030701@opengridcomputing.com>

Hey Tziporet,

Is there any process for fixing bugs post OFED-1.2 "gold"?

If I fix some bugs in the iw_cxgb3 driver, for example, should I post 
the patches and ask that they be pulled into the ofed_1_2 repository?

Or am I on my own?

Thanks,


Steve.


From canonrs at ornl.gov  Mon Jun 25 14:51:15 2007
From: canonrs at ornl.gov (Canon, Richard Shane)
Date: Mon, 25 Jun 2007 17:51:15 -0400
Subject: [ofa-general] low performance with multiple LUNs on a single port
	with ib_srp
Message-ID: <537C6C0940C6C143AA46A88946B8541708BB417C@ORNLEXCHANGE.ornl.gov>

 
Greetings,

Hopefully the subject says it all...

I've stumbled on a performance issue with the OFED ib_srp driver.  Here
is the configuration.  I am testing with a DDN 9550 and a single host
system.  The systems are connected by two SDR links.  On the host side
there is a dual port (DDR) card.  On the DDN side, both lines go into a
single singlet (even though it is a couplet).  The lines go into two
distinct cards on the DDN side (if you are familiar with the layout).
The testing used OFED 1.2.

Now for the tests...  If I run a single stream test I'm seeing good
results, with over 700 MB/s.  These tests are run using sg_dd with the
directio flag.  If I run two concurrent streams against two LUNs that
are each presented over a single port on the DDN (and therefore accessed
by a single port on the host side), the aggregate performance drops to
around 120 MB/s (60 MB/s per stream).

Just to confirm it isn't a problem on the DDN side, I repeated these
tests with the IBGD driver.  There I consistently saw about 600-650 MB/s
on the port regardless of the number of LUNs I tested with.

Any ideas on what the problem is?  Also, if this doesn't make sense, let
me know and I will try to clarify further.

Thanks,

--Shane Canon

--
R. Shane Canon
National Center for Computational Science
Oak Ridge National Laboratory
canonrs at ornl.gov

From chevchenkovic at gmail.com  Mon Jun 25 22:29:07 2007
From: chevchenkovic at gmail.com (Chevchenkovic Chevchenkovic)
Date: Mon, 25 Jun 2007 22:29:07 -0700
Subject: [ofa-general] Installation problem with mvapich2
Message-ID: <1c16cdf90706252229p2a6466a1l81d5411821252744@mail.gmail.com>

Hi,
 I am trying to install mvapich2 on my system, so I do the following:
1. untar  mvapich2-0.9.8.tar.gz
2. go to make.mvapich2.gen2 file and set the prefix as
   /root/chev/temp/mvapich2-0.9.8/

Then I run:
 ./make.mvapich2.gen2

 I get the following as output:
=========================================================
Configuring MVAPICH2...
Configuring MPICH2 version MVAPICH2-0.9.8 with
--prefix=/root/chev/temp/mvapich2-0.9.8/ --enable-g=dbg
--with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd
--disable-romio --without-mpe
sourcing /root/chev/temp/mvapich2-0.9.8/src/pm/mpd/setup_pm
checking for gcc... gcc
checking for C compiler default output file name... configure: error:
C compiler cannot create executables
See `config.log' for more details.
Configuring MPICH2 version MVAPICH2-0.9.8 with
--prefix=/root/chev/chev/mvapich2-0.9.8/ --enable-g=dbg
--with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd
--disable-romio --without-mpe
sourcing /root/chev/temp/mvapich2-0.9.8/src/pm/mpd/setup_pm
checking for gcc... gcc
checking for C compiler default output file name... configure: error:
C compiler cannot create executables
See `config.log' for more details.
Building MVAPICH2...
make: *** No targets specified and no makefile found.  Stop.
make: *** No targets specified and no makefile found.  Stop.
MVAPICH2 installation...
make: *** No rule to make target `install'.  Stop.
make: *** No rule to make target `install'.  Stop.
Congratulations on successfully building MVAPICH2. Please send your
feedback to mvapich-discuss at cse.ohio-state.edu.
================================================


What is going wrong?
Can someone please help me in this regard?
Awaiting a reply,

-Chev


From mst at dev.mellanox.co.il  Tue Jun 26 00:06:41 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 26 Jun 2007 10:06:41 +0300
Subject: [ofa-general] Re: [PATCH RFC] sharing userspace IB objects
In-Reply-To: <aday7i7wye1.fsf@cisco.com>
References: <20070625130604.GH15343@mellanox.co.il> <aday7i7wye1.fsf@cisco.com>
Message-ID: <20070626070641.GM15343@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH RFC] sharing userspace IB objects
> 
> Some initial reaction, in no particular order:
> 
>  - Having to allocate everything in memory that the library mmap()s
>    adds a lot of yucky stuff -- basically we need to implement our own
>    allocator for the shared memory offsets.

Right.

>    I guess we could wrap this
>    in libibverbs and only implement it once but still we're basically
>    reimplementing malloc().

Right.

>    Is there really a strong use case for making every type of object
>    shareable?  Can we handle the SRC stuff without going to this
>    extreme of complexity?

This is not directly related to SRC: this is an effort
to make it possible to share QPs, CQ etc across processes
in the same way as they can be currently shared across threads.
So assuming that we want multiple processes to post to
the same QP, how can we support this?

>  - Given that everything shared is in shared memory,

I think we should try and keep shared memory usage to a minimum.
For example, in mthca the mr object just needs a key: we could
keep it in non-shared memory, just pass the key around
and save on shared memory usage.

>    it seems we could
>    avoid all the marshalling/unmarshalling stuff, and just have the
>    shared objects have an ID along with an API to look up objects by
>    ID.  That way we could let applications use more than just unix
>    sockets -- eg pipe() + fork() would work too.

We need to share file descriptors too. Is there a way to pass these
around besides unix domain sockets?
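
For reference, the unix-domain-socket mechanism in question is
SCM_RIGHTS ancillary data; a child created with fork() simply inherits
its parent's open descriptors and needs no message at all.  Below is a
minimal sketch of the SCM_RIGHTS path.  The helper names send_fd() and
recv_fd() are made up for illustration and error handling is reduced
to a -1 return.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send descriptor `fd` over the unix-domain socket `sock`.  0 or -1. */
int send_fd(int sock, int fd)
{
	char byte = 0;	/* one byte of ordinary data travels with the fd */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;	/* forces correct alignment of buf */
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(fd >= 0);
	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from `sock`.  Returns the new fd, or -1. */
int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The receiver gets a new descriptor number that refers to the same open
file as the sender's descriptor.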

>  > +struct ibv_context *ibv_open_shared_device(struct ibv_device *device,
>  > +					   int fd, off_t offset);
> 
>  - This seems like too low-level an interface; I don't think there's
>    any way to enforce the fact that fd came from shm_open(), and I
>    don't see the use of offset at all.

Hmm, I accept offset is not too important.  About fd coming from shm_open - we
don't care, if the user wants to use a storage-backed file for this, let him.
And if you consider that case, maybe people want to use e.g.  mkstemp to open
these.  Even for shm_open, if you want a unique name, you'll have to implement
something complicated on top of shm_open.

So maybe add just fd to ibv_open_device, and value -1 would mean non-shared?
OK?

>    I think it would be more
>    sensible to extend the normal ibv_open_device() with a pathname,
>    and maybe a flag about whether to create or map an existing shared
>    context, and do all the shm stuff internally.  Then if someone
>    passes a NULL pathname, the context isn't shareable.

But are you sure we want to break API for all users just to add
a new capability for a minority that wants shared memory support?

-- 
MST


From ogerlitz at voltaire.com  Tue Jun 26 01:06:49 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Tue, 26 Jun 2007 11:06:49 +0300
Subject: [ofa-general] [PATCH RFC] sharing userspace IB objects
In-Reply-To: <adatzsvwy90.fsf@cisco.com>
References: <20070625130604.GH15343@mellanox.co.il><467FCEF6.9090905@voltaire.com>
	<adatzsvwy90.fsf@cisco.com>
Message-ID: <4680C919.7010908@voltaire.com>

Roland Dreier wrote:
> 
>  > So, if both sides use a shared-context, they would need to implement a
>  > synchronization protocol (that is don't deliver established event to
>  > the active before the passive accepted).
> 
> I'm missing something -- how does the sharing affect the need for
> synchronization?

If it's a non-shared context, the passive side creates the QP, then
allocates and posts RX buffers to it before accepting the connection
request, so synchronization is achieved by the IB CM.

Now, if you want process A to create a QP and accept the connection,
then hand it to process B which will allocate and post RX to this QP, we
are out of sync with the active side, unless process B first gets the QP
and posts RX, and only then process A accepts the conn req.

> Given all this sharing we probably need a way to handle registering
> shared memory more efficiently.  But actually QP/CQ buffers only need
> to be registered once, since the key that the HCA uses to access the
> buffer is set at creation time, and the other processes don't need
> separate keys.

OK, thanks for clarifying that.

Or.


From ogerlitz at voltaire.com  Tue Jun 26 01:30:18 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Tue, 26 Jun 2007 11:30:18 +0300
Subject: [ofa-general] Re: [PATCH RFC] sharing userspace IB objects
In-Reply-To: <20070626070641.GM15343@mellanox.co.il>
References: <20070625130604.GH15343@mellanox.co.il> <aday7i7wye1.fsf@cisco.com>
	<20070626070641.GM15343@mellanox.co.il>
Message-ID: <4680CE9A.8040306@voltaire.com>

Michael S. Tsirkin wrote:
>> Quoting Roland Dreier <rdreier at cisco.com>:
>> Subject: Re: [PATCH RFC] sharing userspace IB objects

>>    Is there really a strong use case for making every type of object
>>    shareable?  Can we handle the SRC stuff without going to this
>>    extreme of complexity?

> This is not directly related to SRC: this is an effort
> to make it possible to share QPs, CQ etc across processes
> in the same way as they can be currently shared across threads.
> So assuming that we want multiple processes to post to
> the same QP, how can we support this?

Indeed, let's zoom out a little and define the high level scope and
design here, so that people can comment.

For example the design should also cover sharing/passing the CM
(RDMA-CM) ID among processes, and state the limitations, e.g. on the
private data etc.

>>  - Given that everything shared is in shared memory,

> I think we should try and keep shared memory usage to minimum.
> For example, in mthca mr object just needs a key: we could
> keep it in non-shared memory, just pass the key around
> and save on shared memory usage.

What do you refer to by "it" here? Is it the lkey of the memory used for
the QP, or the lkey describing the rx/tx buffers?

In the latter case, looking at ib_umem_get, it uses current->mm etc.;
doesn't this mean that there should be some difference between shared
and non-shared memory?

Or.


From glebn at voltaire.com  Tue Jun 26 01:34:45 2007
From: glebn at voltaire.com (Gleb Natapov)
Date: Tue, 26 Jun 2007 11:34:45 +0300
Subject: [ofa-general] Re: [PATCH RFC] sharing userspace IB objects
In-Reply-To: <20070626070641.GM15343@mellanox.co.il>
References: <20070625130604.GH15343@mellanox.co.il> <aday7i7wye1.fsf@cisco.com>
	<20070626070641.GM15343@mellanox.co.il>
Message-ID: <20070626083445.GB1164@minantech.com>

On Tue, Jun 26, 2007 at 10:06:41AM +0300, Michael S. Tsirkin wrote:
> >    Is there really a strong use case for making every type of object
> >    shareable?  Can we handle the SRC stuff without going to this
> >    extreme of complexity?
> 
> This is not directly related to SRC: this is an effort
> to make it possible to share QPs, CQ etc across processes
> in the same way as they can be currently shared across threads.
> So assuming that we want multiple processes to post to
> the same QP, how can we support this?
Are you absolutely sure you even want to support this? What is the use
case? If multiple processes want to post to the same QP how will you
ensure that the right process will receive the right completion event? Or
will they be required to allocate send descriptors from shared memory too?
If you want them to receive from the same QP they had better allocate
receive descriptors/buffers from shared memory too.

--
			Gleb.


From mst at dev.mellanox.co.il  Tue Jun 26 02:31:47 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 26 Jun 2007 12:31:47 +0300
Subject: [ofa-general] Re: Re: [PATCH RFC] sharing userspace IB objects
In-Reply-To: <4680CE9A.8040306@voltaire.com>
References: <20070625130604.GH15343@mellanox.co.il> <aday7i7wye1.fsf@cisco.com>
	<20070626070641.GM15343@mellanox.co.il>
	<4680CE9A.8040306@voltaire.com>
Message-ID: <20070626093147.GN15343@mellanox.co.il>

> Quoting Or Gerlitz <ogerlitz at voltaire.com>:
> Subject: Re: Re: [PATCH RFC] sharing userspace IB objects
> 
> Michael S. Tsirkin wrote:
> >>Quoting Roland Dreier <rdreier at cisco.com>:
> >>Subject: Re: [PATCH RFC] sharing userspace IB objects
> 
> >>   Is there really a strong use case for making every type of object
> >>   shareable?  Can we handle the SRC stuff without going to this
> >>   extreme of complexity?
> 
> >This is not directly related to SRC: this is an effort
> >to make it possible to share QPs, CQ etc across processes
> >in the same way as they can be currently shared across threads.
> >So assuming that we want multiple processes to post to
> >the same QP, how can we support this?
> 
> Indeed, lets zoom out a little and define the high level scope and 
> design here, such that people can comment.

What I want to do is make it possible to share libibverbs objects between
processes, in the same way that it's possible to share them between threads.

> For example the design should treat also sharing/passing the CM 
> (RDMA-CM) ID among processes, and state the limitations, eg on the 
> private data etc.

This would have to be addressed in librdmacm. Let's finish libibverbs first.

> >> - Given that everything shared is in shared memory,
> 
> >I think we should try and keep shared memory usage to minimum.
> >For example, in mthca mr object just needs a key: we could
> >keep it in non-shared memory, just pass the key around
> >and save on shared memory usage.
> 
> what do you refer by "it" here?
> is it the lkey of the memory used for 
> the QP, or the lkey describing the rx/tx buffers?

Both, there's no real difference.

> On the latter case, looking on ib_umem_get, it uses current->mm etc,
> doesn't this mean that there should be some difference between shared to 
> non shared memory?

This is only used for registering the memory. Assuming it is registered
by some process, we can pass the key around between processes.

-- 
MST


From pnlai at galactic.com.hk  Tue Jun 26 02:48:22 2007
From: pnlai at galactic.com.hk (PN Lai)
Date: Tue, 26 Jun 2007 17:48:22 +0800
Subject: [ofa-general] SRP Failover
Message-ID: <000301c7b7d7$236b3a70$6a41af50$@com.hk>

Hi all,

I'm testing the SRP HA functions, but I have some questions.

I use 2 IB cables to connect the initiator and 1 IB cable to connect to
the storage.

I installed OFED-1.2 and enabled "SRP_LOAD=yes" and "SRPHA_ENABLE=yes"
in openib.conf.

After reboot, it discovers 2 targets, /dev/sdbX and /dev/sdcX.

However, when I check /var/log/srp_daemon.log, it shows:

..
26/05/07 17:42:57 : bad MAD status (110) from lid 257
26/05/07 17:43:30 : No response to inform info registration
26/05/07 17:43:30 : Fail to register to traps, maybe there is no opensm
running on fabric
..

But opensm is running on both machines. I don't know whether this is
normal, or whether it should discover only a single target.

Now, my question is: if I mount /dev/sdbX and write data to it, and
then remove one of the initiator cables, how will /dev/sdcX replace
/dev/sdbX so that I can continue to write the data?

Do I need to configure some extra files?

Thanks for any reply.

PN


From mst at dev.mellanox.co.il  Tue Jun 26 02:51:25 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 26 Jun 2007 12:51:25 +0300
Subject: [ofa-general] Re: Re: [PATCH RFC] sharing userspace IB objects
In-Reply-To: <20070626083445.GB1164@minantech.com>
References: <20070625130604.GH15343@mellanox.co.il> <aday7i7wye1.fsf@cisco.com>
	<20070626070641.GM15343@mellanox.co.il>
	<20070626083445.GB1164@minantech.com>
Message-ID: <20070626095125.GO15343@mellanox.co.il>

> Quoting Gleb Natapov <glebn at voltaire.com>:
> Subject: Re: Re: [PATCH RFC] sharing userspace IB objects
> 
> On Tue, Jun 26, 2007 at 10:06:41AM +0300, Michael S. Tsirkin wrote:
> > >    Is there really a strong use case for making every type of object
> > >    shareable?  Can we handle the SRC stuff without going to this
> > >    extreme of complexity?
> > 
> > This is not directly related to SRC: this is an effort
> > to make it possible to share QPs, CQ etc across processes
> > in the same way as they can be currently shared across threads.
> > So assuming that we want multiple processes to post to
> > the same QP, how can we support this?
> 
> Are you absolutely sure you even want to support this?

Take a look here :)
http://www.quotedb.com/quotes/1007

> What is the use case?

Use case? Scalability. Pls go over Dror's presentation given at Sonoma -
he calls this SSQ.

> If multiple processes want to post to the same QP how will you
> ensure that the right process will receive the right completion event?

Same as with threads - memory for CQEs and locks will be allocated
in shared memory to make it possible for multiple processes to poll
CQ simultaneously, and they get completions in FCFS order.
What to do with them is up to the user.
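
As a rough illustration of "CQEs and locks in shared memory, FCFS
order": the sketch below keeps a small ring of completions and a lock
inside one structure that could live in a shared mapping.  Everything
here is an assumption made for illustration: the demo_* names, the ring
layout, and the use of a C11 atomic_flag spinlock in place of whatever
lock libibverbs would actually use.  demo_cq_push() stands in for the
hardware writing a completion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_CQ_DEPTH 8	/* hypothetical ring size */

static_assert((DEMO_CQ_DEPTH & (DEMO_CQ_DEPTH - 1)) == 0,
	      "ring depth must be a power of two");

struct demo_cqe {
	uint64_t wr_id;
};

/* The whole structure is meant to sit in memory mapped by every process. */
struct demo_shared_cq {
	atomic_flag lock;
	unsigned head;	/* next entry to hand to a poller */
	unsigned tail;	/* next free slot for the producer */
	struct demo_cqe ring[DEMO_CQ_DEPTH];
};

void demo_cq_init(struct demo_shared_cq *cq)
{
	atomic_flag_clear(&cq->lock);
	cq->head = 0;
	cq->tail = 0;
}

static void demo_cq_lock(struct demo_shared_cq *cq)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&cq->lock,
						 memory_order_acquire))
		;
}

static void demo_cq_unlock(struct demo_shared_cq *cq)
{
	atomic_flag_clear_explicit(&cq->lock, memory_order_release);
}

/* Append a completion.  Returns 0, or -1 if the ring is full. */
int demo_cq_push(struct demo_shared_cq *cq, uint64_t wr_id)
{
	int ret = -1;

	demo_cq_lock(cq);
	if (cq->tail - cq->head < DEMO_CQ_DEPTH) {
		cq->ring[cq->tail % DEMO_CQ_DEPTH].wr_id = wr_id;
		cq->tail++;
		ret = 0;
	}
	demo_cq_unlock(cq);
	return ret;
}

/* Take the oldest completion.  Returns 1 and fills *out, or 0 if empty. */
int demo_cq_poll(struct demo_shared_cq *cq, struct demo_cqe *out)
{
	int ret = 0;

	demo_cq_lock(cq);
	if (cq->head != cq->tail) {
		*out = cq->ring[cq->head % DEMO_CQ_DEPTH];
		cq->head++;
		ret = 1;
	}
	demo_cq_unlock(cq);
	return ret;
}
```

Whichever process calls demo_cq_poll() first gets the oldest entry;
routing that completion to the process that posted the work request is
left to the application, as described above.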

> Or will they
> be required to allocate send descriptors from shared memory too?

Yes, send descriptors will have to be placed in shared memory.

> If you want them to receive from the same QP they had better allocate
> receive descriptors/buffers from shared memory too.

Yes, this will work, too.
With RDMA, you can have per-process receive buffers.
The SRC extension presented by Dror at Sonoma will make it possible
for SEND operations. I plan to open a separate thread to discuss SRC API.

-- 
MST


From mst at dev.mellanox.co.il  Tue Jun 26 03:20:45 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 26 Jun 2007 13:20:45 +0300
Subject: [ofa-general] [PATCH] management: uint -> unsigned replacement
Message-ID: <20070626102045.GS15343@mellanox.co.il>

Some management headers use the uint type, which (on my system) is described
as "old compatibility name for C type".  This type might not be defined, e.g.
if __STRICT_ANSI__ is set, so it is best to avoid its usage at least in headers.
Replace it by unsigned in all headers.

Signed-off-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>

---

Hal can you apply this please? As a separate question:
I didn't go over .c files (we don't build them with strict ansi now),
but maybe removing uint there is a good idea, too?
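
A small illustration of why the replacement is safe.  On glibc, as far
as can be told from <sys/types.h>, uint is a typedef for unsigned int
that is only provided when the "misc" feature set is enabled, which
strict ANSI mode (gcc -ansi) turns off.  The demo_* names below are
invented for this sketch and are not part of the management libraries.
The fragment sticks to C89 so that it should build with
gcc -ansi -pedantic.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Declared with the standard spelling, this type does not depend on
 * whether a system header happened to provide the `uint` typedef.
 */
typedef struct {
	unsigned id;
	unsigned mod;
} demo_attr_t;

/*
 * `unsigned` and `unsigned int` name the same type, so callers that
 * pass unsigned int values are unaffected by the rename.
 */
unsigned demo_attr_pack(const demo_attr_t *a)
{
	assert(a != NULL);
	return (a->id << 16) | (a->mod & 0xffffu);
}
```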

diff --git a/libibcommon/include/infiniband/common.h b/libibcommon/include/infiniband/common.h
index 4c90955..80bfe1b 100644
--- a/libibcommon/include/infiniband/common.h
+++ b/libibcommon/include/infiniband/common.h
@@ -131,7 +131,7 @@ int	sys_read_string(char *dir_name, char *file_name, char *str, int max_len);
 int	sys_read_guid(char *dir_name, char *file_name, uint64_t *net_guid);
 int	sys_read_gid(char *dir_name, char *file_name, uint8_t *gid);
 int	sys_read_uint64(char *dir_name, char *file_name, uint64_t *u);
-int	sys_read_uint(char *dir_name, char *file_name, uint *u);
+int	sys_read_uint(char *dir_name, char *file_name, unsigned *u);
 
 /* stack.c */
 void	stack_dump(void);
diff --git a/libibmad/include/infiniband/mad.h b/libibmad/include/infiniband/mad.h
index a349e0f..ae847c9 100644
--- a/libibmad/include/infiniband/mad.h
+++ b/libibmad/include/infiniband/mad.h
@@ -166,8 +166,8 @@ typedef struct {
 } ib_dr_path_t;
 
 typedef struct {
-	uint id;
-	uint mod;
+	unsigned id;
+	unsigned mod;
 } ib_attr_t;
 
 typedef struct {
@@ -180,7 +180,7 @@ typedef struct {
 	uint64_t mkey;
 	uint64_t trid;	/* used for out mad if nonzero, return real val */
 	uint64_t mask;	/* for sa mads */
-	uint recsz;	/* for sa mads (attribute offset) */
+	unsigned recsz;	/* for sa mads (attribute offset) */
 	int timeout;
 	uint32_t oui;	/* for vendor range 2 mads */
 } ib_rpc_t;
@@ -193,7 +193,7 @@ typedef struct portid {
 	uint32_t qp;
 	uint32_t qkey;
 	uint8_t sl;
-	uint pkey_idx;
+	unsigned pkey_idx;
 } ib_portid_t;
 
 typedef void (ib_mad_dump_fn)(char *buf, int bufsz, void *val, int valsz);
@@ -566,23 +566,23 @@ enum SA_SIZES_ENUM {
 };
 
 typedef struct ib_sa_call {
-	uint attrid;
-	uint mod;
+	unsigned attrid;
+	unsigned mod;
 	uint64_t mask;
-	uint method;
+	unsigned method;
 
 	uint64_t trid;	/* used for out mad if nonzero, return real val */
-	uint recsz;	/* return field */
+	unsigned recsz;	/* return field */
 	ib_rmpp_hdr_t rmpp;
 } ib_sa_call_t;
 
 typedef struct ib_vendor_call {
-	uint method;
-	uint mgmt_class;
-	uint attrid;
-	uint mod;
+	unsigned method;
+	unsigned mgmt_class;
+	unsigned attrid;
+	unsigned mod;
 	uint32_t oui;
-	uint timeout;
+	unsigned timeout;
 	ib_rmpp_hdr_t rmpp;
 } ib_vendor_call_t;
 
@@ -740,14 +740,14 @@ void *  mad_rpc_rmpp(void *ibmad_port, ib_rpc_t *rpc, ib_portid_t *dport,
 		     ib_rmpp_hdr_t *rmpp, void *data);
 
 /* smp.c */
-uint8_t * smp_query(void *buf, ib_portid_t *id, uint attrid, uint mod,
-		    uint timeout);
-uint8_t * smp_set(void *buf, ib_portid_t *id, uint attrid, uint mod,
-		  uint timeout);
+uint8_t * smp_query(void *buf, ib_portid_t *id, unsigned attrid, unsigned mod,
+		    unsigned timeout);
+uint8_t * smp_set(void *buf, ib_portid_t *id, unsigned attrid, unsigned mod,
+		  unsigned timeout);
 
 inline static uint8_t *
-safe_smp_query(void *rcvbuf, ib_portid_t *portid, uint attrid, uint mod,
-	       uint timeout)
+safe_smp_query(void *rcvbuf, ib_portid_t *portid, unsigned attrid, unsigned mod,
+	       unsigned timeout)
 {
 	uint8_t *p;
 
@@ -759,8 +759,8 @@ safe_smp_query(void *rcvbuf, ib_portid_t *portid, uint attrid, uint mod,
 }
 
 inline static uint8_t *
-safe_smp_set(void *rcvbuf, ib_portid_t *portid, uint attrid, uint mod,
-	     uint timeout)
+safe_smp_set(void *rcvbuf, ib_portid_t *portid, unsigned attrid, unsigned mod,
+	     unsigned timeout)
 {
 	uint8_t *p;
 
@@ -773,15 +773,15 @@ safe_smp_set(void *rcvbuf, ib_portid_t *portid, uint attrid, uint mod,
 
 /* sa.c */
 uint8_t * sa_call(void *rcvbuf, ib_portid_t *portid, ib_sa_call_t *sa,
-		  uint timeout);
+		  unsigned timeout);
 uint8_t * sa_rpc_call(void *ibmad_port, void *rcvbuf, ib_portid_t *portid,
-		      ib_sa_call_t *sa, uint timeout);
+		      ib_sa_call_t *sa, unsigned timeout);
 int	ib_path_query(ib_gid_t srcgid, ib_gid_t destgid, ib_portid_t *sm_id,
 		      void *buf);	/* returns lid */
 
 inline static uint8_t *
 safe_sa_call(void *rcvbuf, ib_portid_t *portid, ib_sa_call_t *sa,
-	     uint timeout)
+	     unsigned timeout)
 {
 	uint8_t *p;
 
@@ -802,19 +802,19 @@ int	ib_resolve_self(ib_portid_t *portid, int *portnum, ib_gid_t *gid);
 
 /* gs.c */
 uint8_t *perf_classportinfo_query(void *rcvbuf, ib_portid_t *dest, int port,
-				  uint timeout);
+				  unsigned timeout);
 uint8_t *port_performance_query(void *rcvbuf, ib_portid_t *dest, int port,
-				uint timeout);
+				unsigned timeout);
 uint8_t *port_performance_reset(void *rcvbuf, ib_portid_t *dest, int port,
-				uint mask, uint timeout);
+				unsigned mask, unsigned timeout);
 uint8_t *port_performance_ext_query(void *rcvbuf, ib_portid_t *dest, int port,
-				    uint timeout);
+				    unsigned timeout);
 uint8_t *port_performance_ext_reset(void *rcvbuf, ib_portid_t *dest, int port,
-				    uint mask, uint timeout);
+				    unsigned mask, unsigned timeout);
 uint8_t *port_samples_control_query(void *rcvbuf, ib_portid_t *dest, int port,
-				    uint timeout);
+				    unsigned timeout);
 uint8_t *port_samples_result_query(void *rcvbuf, ib_portid_t *dest, int port,
-				   uint timeout);
+				   unsigned timeout);
 
 /* dump.c */
 ib_mad_dump_fn
diff --git a/libibumad/include/infiniband/umad.h b/libibumad/include/infiniband/umad.h
index 9020649..6149c8c 100644
--- a/libibumad/include/infiniband/umad.h
+++ b/libibumad/include/infiniband/umad.h
@@ -120,13 +120,13 @@ typedef struct ib_user_mad {
 typedef struct umad_port {
 	char ca_name[UMAD_CA_NAME_LEN];
 	int portnum;
-	uint base_lid;
-	uint lmc;
-	uint sm_lid;
-	uint sm_sl;
-	uint state;
-	uint phys_state;
-	uint rate;
+	unsigned base_lid;
+	unsigned lmc;
+	unsigned sm_lid;
+	unsigned sm_sl;
+	unsigned state;
+	unsigned phys_state;
+	unsigned rate;
 	uint64_t capmask;
 	uint64_t gid_prefix;
 	uint64_t port_guid;
@@ -134,7 +134,7 @@ typedef struct umad_port {
 
 typedef struct umad_ca {
 	char ca_name[UMAD_CA_NAME_LEN];
-	uint node_type;
+	unsigned node_type;
 	int numports;
 	char fw_ver[20];
 	char ca_type[40];


-- 
MST


From tziporet at dev.mellanox.co.il  Tue Jun 26 04:00:24 2007
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Tue, 26 Jun 2007 14:00:24 +0300
Subject: [ofa-general] Re: development process post ofed-1.2 gold.
In-Reply-To: <4680305D.9030701@opengridcomputing.com>
References: <4680305D.9030701@opengridcomputing.com>
Message-ID: <4680F1C8.3020207@mellanox.co.il>

Steve Wise wrote:
> Hey Tziporet,
>
> Is there any process for fixing bugs post OFED-1.2 "gold"?
>
> If I fix some bugs in the iw_cxgb3 driver, for example, should I post 
> the patches and ask that they be pulled into the ofed_1_2 repository?
>
> Or am I on my own?
>
> Thanks,
>
>
> Steve.
>
My suggestion is that we keep the ofed_1_2 branch alive, so new fixes 
should be applied to that repository.
This way we will be able to do a stable release whenever we decide to.
Another question concerns the daily builds - I don't think we need 
them any more. We can do a weekly build, or run a build when needed 
(when new patches are submitted). What do other people think about this?

Besides this, I will open a support page for OFED 1.2 on the Wiki (as we 
have for OFED 1.1).
On this page we will document known bugs and point to the patches that 
fix them. People can use the ofed_patch.sh script (part of the docs RPM) 
to add or remove patches.

Tziporet


From glebn at voltaire.com  Tue Jun 26 04:13:42 2007
From: glebn at voltaire.com (Gleb Natapov)
Date: Tue, 26 Jun 2007 14:13:42 +0300
Subject: [ofa-general] Re: Re: [PATCH RFC] sharing userspace IB objects
In-Reply-To: <20070626095125.GO15343@mellanox.co.il>
References: <20070625130604.GH15343@mellanox.co.il> <aday7i7wye1.fsf@cisco.com>
	<20070626070641.GM15343@mellanox.co.il>
	<20070626083445.GB1164@minantech.com>
	<20070626095125.GO15343@mellanox.co.il>
Message-ID: <20070626111342.GC1164@minantech.com>

On Tue, Jun 26, 2007 at 12:51:25PM +0300, Michael S. Tsirkin wrote:
> > Quoting Gleb Natapov <glebn at voltaire.com>:
> > Subject: Re: Re: [PATCH RFC] sharing userspace IB objects
> > 
> > On Tue, Jun 26, 2007 at 10:06:41AM +0300, Michael S. Tsirkin wrote:
> > > >    Is there really a strong use case for making every type of object
> > > >    shareable?  Can we handle the SRC stuff without going to this
> > > >    extreme of complexity?
> > > 
> > > This is not directly related to SRC: this is an effort
> > > to make it possible to share QPs, CQ etc across processes
> > > in the same way as they can be currently shared across threads.
> > > So assuming that we want multiple processes to post to
> > > the same QP, how can we support this?
> > 
> > Are you absolutely sure you even want to support this?
> 
> Take a look here :)
> http://www.quotedb.com/quotes/1007
So there is still a chance you'll reconsider :)

> 
> > What is the user case?
> 
> Use case? Scalability. Pls go over Dror's presentation given at Sonoma -
> he calls this SSQ.
As far as I can tell he is talking about a HW-supported solution and not
a half-baked SW one.

> 
> > If multiple processes what to post to the same QP how will you
> > ensure that right process will receive right completion event?
> 
> Same as with threads - memory for CQEs and locks will be allocated
> in shared memory to make it possible for multiple processes to poll
> CQ simultaneously, and they get completions in FCFS order.
> What to do with them is up to the user.
Are you going to use this API? How? There is no point in discussing a user
API without specifying HOW the user will be using it. You have to ask what
the user wants and design your API accordingly, not the other way around.
So suppose I want to use the proposed API to implement a super scalable
MPI. I set up a shared QP/CQ/... and each rank starts to post into the QP
and receive completions from the CQ. Suppose rank A picks up a completion
that belongs to rank B: I will then need to set up an out-of-band channel
to pass this completion from A to B. This does not look good at all to me.

--
			Gleb.


From mst at dev.mellanox.co.il  Tue Jun 26 04:44:02 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 26 Jun 2007 14:44:02 +0300
Subject: [ofa-general] Re: Re: [PATCH RFC] sharing userspace IB objects
In-Reply-To: <20070626111342.GC1164@minantech.com>
References: <20070625130604.GH15343@mellanox.co.il> <aday7i7wye1.fsf@cisco.com>
	<20070626070641.GM15343@mellanox.co.il>
	<20070626083445.GB1164@minantech.com>
	<20070626095125.GO15343@mellanox.co.il>
	<20070626111342.GC1164@minantech.com>
Message-ID: <20070626114402.GT15343@mellanox.co.il>

> Quoting Gleb Natapov <glebn at voltaire.com>:
> Subject: Re: Re: [PATCH RFC] sharing userspace IB objects
> 
> On Tue, Jun 26, 2007 at 12:51:25PM +0300, Michael S. Tsirkin wrote:
> > > Quoting Gleb Natapov <glebn at voltaire.com>:
> > > Subject: Re: Re: [PATCH RFC] sharing userspace IB objects
> > > 
> > > On Tue, Jun 26, 2007 at 10:06:41AM +0300, Michael S. Tsirkin wrote:
> > > > >    Is there really a strong use case for making every type of object
> > > > >    shareable?  Can we handle the SRC stuff without going to this
> > > > >    extreme of complexity?
> > > > 
> > > > This is not directly related to SRC: this is an effort
> > > > to make it possible to share QPs, CQ etc across processes
> > > > in the same way as they can be currently shared across threads.
> > > > So assuming that we want multiple processes to post to
> > > > the same QP, how can we support this?
> > > 
> > > Are you absolutely sure you even want to support this?
> > 
> > Take a look here :)
> > http://www.quotedb.com/quotes/1007
> So there is still a chance you'll reconsider :)

Sure, if someone comes up with a better way to improve scalability
for single-threaded applications.

> > 
> > > What is the user case?
> > 
> > Use case? Scalability. Pls go over Dror's presentation given at Sonoma -
> > he calls this SSQ.
>
> As far as I can tell he is talking about HW supported solution and not
> half baked SW one.

No, sharing a send queue must be done in software.  I don't really see the reason
for sarcasm: do you see value in sharing resources between multiple threads?
Why not multiple processes? Some people just don't want to program
in a multithreaded environment.

> > 
> > > If multiple processes what to post to the same QP how will you
> > > ensure that right process will receive right completion event?
> > 
> > Same as with threads - memory for CQEs and locks will be allocated
> > in shared memory to make it possible for multiple processes to poll
> > CQ simultaneously, and they get completions in FCFS order.
> > What to do with them is up to the user.
>
> Are you going to use this API? How? There is no point in discussing user
> API without specifying HOW user will be using it. You have to ask what
> user want and design your API accordingly and not other way around.
> So suppose I want to use proposed API to implement super scalable MPI.

We'd come up with an MPI_Send implementation inside libibverbs :). Think
layered - I'd like to make the minimal possible API change that makes
scalability improvements possible.

> I setup shared QP/CQ/... and each rank start to post into the QP and
> receive completion from CQ and suppose rank A picked completion that
> belongs to rank B so I will need to setup out of band channel to pass
> this completion from A to B. This is not looks good at all to me.

This is not different from multiple threads sharing a CQ, really - and we do
support this already.  In the part of the message that you have cut out, I
showed some use cases that avoid this "side channel"
(which could be just shared memory btw).
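
A minimal sketch of the model described in this thread: CQEs and a lock
placed in shared memory, any process polling, completions handed out in
FCFS order. Everything here is illustrative and an assumption: struct
shared_cq, shared_cq_push() and shared_cq_poll() are hypothetical names,
not libibverbs API, the spinlock is just one possible lock, and the push
function merely stands in for hardware writing a CQE.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define SHARED_CQ_DEPTH 64	/* hypothetical ring size (power of two) */

struct shared_cqe {
	uint64_t wr_id;		/* work request id */
	int owner;		/* hypothetical: rank that posted the request */
};

struct shared_cq {
	atomic_flag lock;	/* spinlock usable across processes */
	unsigned head;		/* next entry to consume */
	unsigned tail;		/* next free entry */
	struct shared_cqe ring[SHARED_CQ_DEPTH];
};

/* Map the CQ so that children created by fork() see the same memory. */
static struct shared_cq *shared_cq_create(void)
{
	struct shared_cq *cq = mmap(NULL, sizeof *cq, PROT_READ | PROT_WRITE,
				    MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	if (cq == MAP_FAILED)
		return NULL;
	atomic_flag_clear(&cq->lock);
	cq->head = cq->tail = 0;
	return cq;
}

static void shared_cq_lock(struct shared_cq *cq)
{
	while (atomic_flag_test_and_set_explicit(&cq->lock,
						 memory_order_acquire))
		;		/* spin until the flag was clear */
}

static void shared_cq_unlock(struct shared_cq *cq)
{
	atomic_flag_clear_explicit(&cq->lock, memory_order_release);
}

/* Stand-in for hardware adding a CQE.  Returns 0, or -1 if the ring is full. */
static int shared_cq_push(struct shared_cq *cq, uint64_t wr_id, int owner)
{
	int ret = -1;

	shared_cq_lock(cq);
	if (cq->tail - cq->head < SHARED_CQ_DEPTH) {
		struct shared_cqe *e = &cq->ring[cq->tail % SHARED_CQ_DEPTH];

		e->wr_id = wr_id;
		e->owner = owner;
		cq->tail++;
		ret = 0;
	}
	shared_cq_unlock(cq);
	return ret;
}

/* Any process may call this.  Returns 1 and fills *out with the oldest
 * completion, or returns 0 if the CQ is empty. */
static int shared_cq_poll(struct shared_cq *cq, struct shared_cqe *out)
{
	int got = 0;

	assert(out != NULL);
	shared_cq_lock(cq);
	if (cq->head != cq->tail) {
		*out = cq->ring[cq->head % SHARED_CQ_DEPTH];
		cq->head++;
		got = 1;
	}
	shared_cq_unlock(cq);
	return got;
}
```

A child created with fork() inherits the MAP_SHARED mapping, so parent
and children would poll the same ring. Whoever polls first gets the
oldest entry regardless of the owner field, which is exactly the point
under dispute here.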

-- 
MST


From jackm at dev.mellanox.co.il  Tue Jun 26 04:52:11 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Tue, 26 Jun 2007 14:52:11 +0300
Subject: [ofa-general] Re: [PATCH] libmlx4: fix adjustments for minimum qp
	capabilities in mlx4_create_qp
In-Reply-To: <adalke9zcvg.fsf@cisco.com>
References: <200706191647.41336.jackm@dev.mellanox.co.il>
	<200706240900.16563.jackm@dev.mellanox.co.il>
	<adalke9zcvg.fsf@cisco.com>
Message-ID: <200706261452.12193.jackm@dev.mellanox.co.il>

On Sunday 24 June 2007 16:43, Roland Dreier wrote:
> 
> But the function hasn't looked like that for a few weeks now, since
> commit e7d06519.
> 
Oops, my mistake (missed that commit when cherry-picking).  I'm now using
your libmlx4 directly.

- Jack


From glebn at voltaire.com  Tue Jun 26 05:25:39 2007
From: glebn at voltaire.com (Gleb Natapov)
Date: Tue, 26 Jun 2007 15:25:39 +0300
Subject: [ofa-general] Re: Re: [PATCH RFC] sharing userspace IB objects
In-Reply-To: <20070626114402.GT15343@mellanox.co.il>
References: <20070625130604.GH15343@mellanox.co.il> <aday7i7wye1.fsf@cisco.com>
	<20070626070641.GM15343@mellanox.co.il>
	<20070626083445.GB1164@minantech.com>
	<20070626095125.GO15343@mellanox.co.il>
	<20070626111342.GC1164@minantech.com>
	<20070626114402.GT15343@mellanox.co.il>
Message-ID: <20070626122539.GF1164@minantech.com>

On Tue, Jun 26, 2007 at 02:44:02PM +0300, Michael S. Tsirkin wrote:
> > Quoting Gleb Natapov <glebn at voltaire.com>:
> > Subject: Re: Re: [PATCH RFC] sharing userspace IB objects
> > 
> > On Tue, Jun 26, 2007 at 12:51:25PM +0300, Michael S. Tsirkin wrote:
> > > > Quoting Gleb Natapov <glebn at voltaire.com>:
> > > > Subject: Re: Re: [PATCH RFC] sharing userspace IB objects
> > > > 
> > > > On Tue, Jun 26, 2007 at 10:06:41AM +0300, Michael S. Tsirkin wrote:
> > > > > >    Is there really a strong use case for making every type of object
> > > > > >    shareable?  Can we handle the SRC stuff without going to this
> > > > > >    extreme of complexity?
> > > > > 
> > > > > This is not directly related to SRC: this is an effort
> > > > > to make it possible to share QPs, CQ etc across processes
> > > > > in the same way as they can be currently shared across threads.
> > > > > So assuming that we want multiple processes to post to
> > > > > the same QP, how can we support this?
> > > > 
> > > > Are you absolutely sure you even want to support this?
> > > 
> > > Take a look here :)
> > > http://www.quotedb.com/quotes/1007
> > So there is still a chance you'll reconsider :)
> 
> Sure, if someone comes up with a better way to improve scalability
> for single-threaded applications.
What good is a solution that no one will use? No solution is better than
a bad one, because it will motivate people to look for a proper solution.

> 
> > > 
> > > > What is the user case?
> > > 
> > > Use case? Scalability. Pls go over Dror's presentation given at Sonoma -
> > > he calls this SSQ.
> >
> > As far as I can tell he is talking about HW supported solution and not
> > half baked SW one.
> 
> No, sharing a send queue must be done in software.  I don't really see the reason
> for sarcasm: do you see value in sharing resources between multiple threads?
> Why not multiple processes? Some people just don't want to program
> in multithreaded environment.
Yes, I see the value in sharing resources between threads and processes
if done right. This proposal is far from being right. There is no
sarcasm in my sentence either. You can't claim that what you propose is
as seamless as it should be.

I have no problem with sharing a send queue. What I want to be able to do
is attach a CQ from each process to a shared QP. When a send posted by
process A completes, the completion is posted into A's CQ. HW should be
able to multiplex this IMO.

> 
> > > 
> > > > If multiple processes what to post to the same QP how will you
> > > > ensure that right process will receive right completion event?
> > > 
> > > Same as with threads - memory for CQEs and locks will be allocated
> > > in shared memory to make it possible for multiple processes to poll
> > > CQ simultaneously, and they get completions in FCFS order.
> > > What to do with them is up to the user.
> >
> > Are you going to use this API? How? There is no point in discussing user
> > API without specifying HOW user will be using it. You have to ask what
> > user want and design your API accordingly and not other way around.
> > So suppose I want to use proposed API to implement super scalable MPI.
> 
> We'd come up with MPI_Send implementation inside libibverbs:). Think layered - I'd
> like to make a minimal possible API change to make scalability improvements
> possible.
They are not really possible with the proposed API (beyond academic papers,
that is). You are welcome to implement MPI_Send inside libibverbs. After
all, this is what Myricom did.

> 
> > I setup shared QP/CQ/... and each rank start to post into the QP and
> > receive completion from CQ and suppose rank A picked completion that
> > belongs to rank B so I will need to setup out of band channel to pass
> > this completion from A to B. This is not looks good at all to me.
> 
> This is not different from multiple threads sharing a CQ, really - and we do
This is very different from multiple threads sharing a CQ. In a
multithreaded scenario I can design my program in such a way that each
thread is able to handle any completion. We'll have to pass
completions between processes in the scenario you propose.

> support this already.  In the part of the message that you have cut out, I
> showed some use cases that avoid this "side channel"
What? RDMA? What about the completion of an RDMA operation? You'll have to
pass it around. I agree that the RDMA situation is much better than the
send/receive one, but there are no RDMAs without a send/recv after them.

> (which could be just shared memory btw).
> 
And you introduce another scalability problem here. On a big SMP node
you will have to create a channel between each pair of processes to pass
completions, and will have to poll each one of them besides polling the CQ.
There goes your latency. And I am not saying this is not possible, I am
saying it is so bad that it is not worth doing.

--
			Gleb.


From mst at dev.mellanox.co.il  Tue Jun 26 05:58:02 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 26 Jun 2007 15:58:02 +0300
Subject: [ofa-general] Re: Re: [PATCH RFC] sharing userspace IB objects
In-Reply-To: <20070626122539.GF1164@minantech.com>
References: <20070625130604.GH15343@mellanox.co.il> <aday7i7wye1.fsf@cisco.com>
	<20070626070641.GM15343@mellanox.co.il>
	<20070626083445.GB1164@minantech.com>
	<20070626095125.GO15343@mellanox.co.il>
	<20070626111342.GC1164@minantech.com>
	<20070626114402.GT15343@mellanox.co.il>
	<20070626122539.GF1164@minantech.com>
Message-ID: <20070626125802.GU15343@mellanox.co.il>

> > No, sharing a send queue must be done in software.  I don't really see the reason
> > for sarcasm: do you see value in sharing resources between multiple threads?
> > Why not multiple processes? Some people just don't want to program
> > in multithreaded environment.
>
> Yes I see the value in sharing resources between threads and processes
> if done right. This proposition is far from being right.

Ahem, *what* are you talking about? Sharing resources between threads was
supported in libibverbs 1.0, *right from the start*. This is still the case
with 1.1, and this API matches verbs quite closely, which means that it can
work on pretty much any hardware.

If you want to propose some enhancements, go ahead (and open a new thread
for this). All *I* want to do is support sharing resources in a
single-threaded environment.

> There is not sarcasm in my sentence either. You can't claim that what you
> propose is as seamless as it should be.

I think it's as seamless as it *can* be.

> I have no problem with sharing send queue. What I want to be able to do
> is to attach CQ from each process to a shared QP. When send posted by
> process A completes the completion is posted into A's CQ. HW should be
> able to multiplex this IMO. 

Well, since there is no hardware that does this, why bother discussing this?

> > > > > If multiple processes what to post to the same QP how will you
> > > > > ensure that right process will receive right completion event?
> > > > 
> > > > Same as with threads - memory for CQEs and locks will be allocated
> > > > in shared memory to make it possible for multiple processes to poll
> > > > CQ simultaneously, and they get completions in FCFS order.
> > > > What to do with them is up to the user.
> > >
> > > Are you going to use this API? How? There is no point in discussing user
> > > API without specifying HOW user will be using it. You have to ask what
> > > user want and design your API accordingly and not other way around.
> > > So suppose I want to use proposed API to implement super scalable MPI.
> > 
> > We'd come up with MPI_Send implementation inside libibverbs:). Think layered - I'd
> > like to make a minimal possible API change to make scalability improvements
> > possible.
> 
> They are not really possible with proposed API (beyond academic papers that is).

I'm talking to MPI guys here, too, so I don't think there's real danger
that the final API will be useless for them.

> You are
> welcome to implement MPI_Send inside libibverbs. After all this is what Myricom did.

I think keeping a general verbs layer is a better approach for now.

> > 
> > > I setup shared QP/CQ/... and each rank start to post into the QP and
> > > receive completion from CQ and suppose rank A picked completion that
> > > belongs to rank B so I will need to setup out of band channel to pass
> > > this completion from A to B. This is not looks good at all to me.
> > 
> > This is not different from multiple threads sharing a CQ, really - and we do
> This is very different from  multiple threads sharing a CQ. In
> multi threaded  scenario I can design my program in a way that each
> thread will be able to handle completion. We'll have to pass 
> completion between processes in the scenario you propose.
> 
> > support this already.  In the part of the message that you have cut out, I
> > showed some use cases that avoid this "side channel"
>
> What? RDMA?

RDMA and SRC.

> What about a completion of RDMA operation? You'll have to
> pass it around.

Since all it does is free up the buffers, it's quite possible
that processing of send completions can be done by any process.
This really depends on how the application wants to do this:
again, you seem to ignore the fact that the issue is the same for
multithreaded programs, and they seem to cope fine.

> I agree that RDMA situation is much better then
> send/receive one, but there is no RDMAs without send/recv after it.

Not really - polling on data has been used in MPI for ages now.
With SRC you can have separate completions on the receive side.

> > (which could be just shared memory btw).
>
> And you introduce another scalability problem here. On a big SMP node
> will have to create channel between each pair of processes to pass
> completions and will have to poll each one of them besides polling CQ.
> Here goes you latency. And I am not saying this is not possible, I am
> saying it is so bad that it is not worth doing.

No, you got that wrong: there need not be any real "channels" with shared
memory: just a single data structure shared by all processes would do.
But again, you are getting into MPI design, which is the wrong layer to
discuss here.
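
One possible reading of "a single data structure shared by all
processes", sketched under loud assumptions: send completions only free
buffers, so whichever process polls a completion just credits the owning
rank in one shared table, and each rank checks only its own slot instead
of polling a channel per peer. The names completion_board, board_credit()
and board_collect() are hypothetical and do not come from any patch in
this thread.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_RANKS 16		/* hypothetical number of local processes */

/* Intended to live in memory shared by all local ranks. */
struct completion_board {
	atomic_uint done[MAX_RANKS];	/* completions credited to each rank */
};

/* Called by whichever process polled the CQ: credit the owning rank. */
static void board_credit(struct completion_board *b, unsigned owner)
{
	assert(owner < MAX_RANKS);
	atomic_fetch_add_explicit(&b->done[owner], 1, memory_order_release);
}

/* Called by a rank to collect (and reset) the completions credited to it. */
static unsigned board_collect(struct completion_board *b, unsigned self)
{
	assert(self < MAX_RANKS);
	return atomic_exchange_explicit(&b->done[self], 0,
					memory_order_acquire);
}
```

With this layout a rank reads one slot, however many peers there are.
Whether that is enough for real send/receive traffic, where the payload
matters and not just a count, is the open question in this exchange.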

-- 
MST


From halr at voltaire.com  Tue Jun 26 06:04:10 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 26 Jun 2007 09:04:10 -0400
Subject: [ofa-general] Re: [PATCH] management: uint -> unsigned replacement
In-Reply-To: <20070626102045.GS15343@mellanox.co.il>
References: <20070626102045.GS15343@mellanox.co.il>
Message-ID: <1182862966.10379.425353.camel@hal.voltaire.com>

On Tue, 2007-06-26 at 06:20, Michael S. Tsirkin wrote:
> Some management headers use uint type which (on my system)

What's your system ?

>  is described as "old
> compatibility name for C type".  This type might not defined e.g. if
> __STRICT_ANSI__ is set,

Is strict ANSI a requirement ?

>  so it is best to avoid its usage at least in headers.
> Replace by unsigned in all headers.
> 
> Signed-off-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>
> 
> ---
> 
> Hal can you apply this please? As a separate question:
> I didn't go over .c files (we don't build them with strict ansi now),
> but maybe removing uint there is a good idea, too?

Yes but it will take more than this to make them strict ANSI.

Is this an OFED 1.2 follow-on or just for master?

-- Hal

> diff --git a/libibcommon/include/infiniband/common.h b/libibcommon/include/infiniband/common.h
> index 4c90955..80bfe1b 100644
> --- a/libibcommon/include/infiniband/common.h
> +++ b/libibcommon/include/infiniband/common.h
> @@ -131,7 +131,7 @@ int	sys_read_string(char *dir_name, char *file_name, char *str, int max_len);
>  int	sys_read_guid(char *dir_name, char *file_name, uint64_t *net_guid);
>  int	sys_read_gid(char *dir_name, char *file_name, uint8_t *gid);
>  int	sys_read_uint64(char *dir_name, char *file_name, uint64_t *u);
> -int	sys_read_uint(char *dir_name, char *file_name, uint *u);
> +int	sys_read_uint(char *dir_name, char *file_name, unsigned *u);
>  
>  /* stack.c */
>  void	stack_dump(void);
> diff --git a/libibmad/include/infiniband/mad.h b/libibmad/include/infiniband/mad.h
> index a349e0f..ae847c9 100644
> --- a/libibmad/include/infiniband/mad.h
> +++ b/libibmad/include/infiniband/mad.h
> @@ -166,8 +166,8 @@ typedef struct {
>  } ib_dr_path_t;
>  
>  typedef struct {
> -	uint id;
> -	uint mod;
> +	unsigned id;
> +	unsigned mod;
>  } ib_attr_t;
>  
>  typedef struct {
> @@ -180,7 +180,7 @@ typedef struct {
>  	uint64_t mkey;
>  	uint64_t trid;	/* used for out mad if nonzero, return real val */
>  	uint64_t mask;	/* for sa mads */
> -	uint recsz;	/* for sa mads (attribute offset) */
> +	unsigned recsz;	/* for sa mads (attribute offset) */
>  	int timeout;
>  	uint32_t oui;	/* for vendor range 2 mads */
>  } ib_rpc_t;
> @@ -193,7 +193,7 @@ typedef struct portid {
>  	uint32_t qp;
>  	uint32_t qkey;
>  	uint8_t sl;
> -	uint pkey_idx;
> +	unsigned pkey_idx;
>  } ib_portid_t;
>  
>  typedef void (ib_mad_dump_fn)(char *buf, int bufsz, void *val, int valsz);
> @@ -566,23 +566,23 @@ enum SA_SIZES_ENUM {
>  };
>  
>  typedef struct ib_sa_call {
> -	uint attrid;
> -	uint mod;
> +	unsigned attrid;
> +	unsigned mod;
>  	uint64_t mask;
> -	uint method;
> +	unsigned method;
>  
>  	uint64_t trid;	/* used for out mad if nonzero, return real val */
> -	uint recsz;	/* return field */
> +	unsigned recsz;	/* return field */
>  	ib_rmpp_hdr_t rmpp;
>  } ib_sa_call_t;
>  
>  typedef struct ib_vendor_call {
> -	uint method;
> -	uint mgmt_class;
> -	uint attrid;
> -	uint mod;
> +	unsigned method;
> +	unsigned mgmt_class;
> +	unsigned attrid;
> +	unsigned mod;
>  	uint32_t oui;
> -	uint timeout;
> +	unsigned timeout;
>  	ib_rmpp_hdr_t rmpp;
>  } ib_vendor_call_t;
>  
> @@ -740,14 +740,14 @@ void *  mad_rpc_rmpp(void *ibmad_port, ib_rpc_t *rpc, ib_portid_t *dport,
>  		     ib_rmpp_hdr_t *rmpp, void *data);
>  
>  /* smp.c */
> -uint8_t * smp_query(void *buf, ib_portid_t *id, uint attrid, uint mod,
> -		    uint timeout);
> -uint8_t * smp_set(void *buf, ib_portid_t *id, uint attrid, uint mod,
> -		  uint timeout);
> +uint8_t * smp_query(void *buf, ib_portid_t *id, unsigned attrid, unsigned mod,
> +		    unsigned timeout);
> +uint8_t * smp_set(void *buf, ib_portid_t *id, unsigned attrid, unsigned mod,
> +		  unsigned timeout);
>  
>  inline static uint8_t *
> -safe_smp_query(void *rcvbuf, ib_portid_t *portid, uint attrid, uint mod,
> -	       uint timeout)
> +safe_smp_query(void *rcvbuf, ib_portid_t *portid, unsigned attrid, unsigned mod,
> +	       unsigned timeout)
>  {
>  	uint8_t *p;
>  
> @@ -759,8 +759,8 @@ safe_smp_query(void *rcvbuf, ib_portid_t *portid, uint attrid, uint mod,
>  }
>  
>  inline static uint8_t *
> -safe_smp_set(void *rcvbuf, ib_portid_t *portid, uint attrid, uint mod,
> -	     uint timeout)
> +safe_smp_set(void *rcvbuf, ib_portid_t *portid, unsigned attrid, unsigned mod,
> +	     unsigned timeout)
>  {
>  	uint8_t *p;
>  
> @@ -773,15 +773,15 @@ safe_smp_set(void *rcvbuf, ib_portid_t *portid, uint attrid, uint mod,
>  
>  /* sa.c */
>  uint8_t * sa_call(void *rcvbuf, ib_portid_t *portid, ib_sa_call_t *sa,
> -		  uint timeout);
> +		  unsigned timeout);
>  uint8_t * sa_rpc_call(void *ibmad_port, void *rcvbuf, ib_portid_t *portid,
> -		      ib_sa_call_t *sa, uint timeout);
> +		      ib_sa_call_t *sa, unsigned timeout);
>  int	ib_path_query(ib_gid_t srcgid, ib_gid_t destgid, ib_portid_t *sm_id,
>  		      void *buf);	/* returns lid */
>  
>  inline static uint8_t *
>  safe_sa_call(void *rcvbuf, ib_portid_t *portid, ib_sa_call_t *sa,
> -	     uint timeout)
> +	     unsigned timeout)
>  {
>  	uint8_t *p;
>  
> @@ -802,19 +802,19 @@ int	ib_resolve_self(ib_portid_t *portid, int *portnum, ib_gid_t *gid);
>  
>  /* gs.c */
>  uint8_t *perf_classportinfo_query(void *rcvbuf, ib_portid_t *dest, int port,
> -				  uint timeout);
> +				  unsigned timeout);
>  uint8_t *port_performance_query(void *rcvbuf, ib_portid_t *dest, int port,
> -				uint timeout);
> +				unsigned timeout);
>  uint8_t *port_performance_reset(void *rcvbuf, ib_portid_t *dest, int port,
> -				uint mask, uint timeout);
> +				unsigned mask, unsigned timeout);
>  uint8_t *port_performance_ext_query(void *rcvbuf, ib_portid_t *dest, int port,
> -				    uint timeout);
> +				    unsigned timeout);
>  uint8_t *port_performance_ext_reset(void *rcvbuf, ib_portid_t *dest, int port,
> -				    uint mask, uint timeout);
> +				    unsigned mask, unsigned timeout);
>  uint8_t *port_samples_control_query(void *rcvbuf, ib_portid_t *dest, int port,
> -				    uint timeout);
> +				    unsigned timeout);
>  uint8_t *port_samples_result_query(void *rcvbuf, ib_portid_t *dest, int port,
> -				   uint timeout);
> +				   unsigned timeout);
>  
>  /* dump.c */
>  ib_mad_dump_fn
> diff --git a/libibumad/include/infiniband/umad.h b/libibumad/include/infiniband/umad.h
> index 9020649..6149c8c 100644
> --- a/libibumad/include/infiniband/umad.h
> +++ b/libibumad/include/infiniband/umad.h
> @@ -120,13 +120,13 @@ typedef struct ib_user_mad {
>  typedef struct umad_port {
>  	char ca_name[UMAD_CA_NAME_LEN];
>  	int portnum;
> -	uint base_lid;
> -	uint lmc;
> -	uint sm_lid;
> -	uint sm_sl;
> -	uint state;
> -	uint phys_state;
> -	uint rate;
> +	unsigned base_lid;
> +	unsigned lmc;
> +	unsigned sm_lid;
> +	unsigned sm_sl;
> +	unsigned state;
> +	unsigned phys_state;
> +	unsigned rate;
>  	uint64_t capmask;
>  	uint64_t gid_prefix;
>  	uint64_t port_guid;
> @@ -134,7 +134,7 @@ typedef struct umad_port {
>  
>  typedef struct umad_ca {
>  	char ca_name[UMAD_CA_NAME_LEN];
> -	uint node_type;
> +	unsigned node_type;
>  	int numports;
>  	char fw_ver[20];
>  	char ca_type[40];
> 


From mst at dev.mellanox.co.il  Tue Jun 26 06:24:57 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 26 Jun 2007 16:24:57 +0300
Subject: [ofa-general] Re: [PATCH] management: uint -> unsigned replacement
In-Reply-To: <1182862966.10379.425353.camel@hal.voltaire.com>
References: <20070626102045.GS15343@mellanox.co.il>
	<1182862966.10379.425353.camel@hal.voltaire.com>
Message-ID: <20070626132457.GA29602@mellanox.co.il>

> Quoting Hal Rosenstock <halr at voltaire.com>:
> Subject: Re: [PATCH] management: uint -> unsigned replacement
> 
> On Tue, 2007-06-26 at 06:20, Michael S. Tsirkin wrote:
> > Some management headers use uint type which (on my system)
> 
> What's your system ?

SLES10.

> >  is described as "old
> > compatibility name for C type".  This type might not defined e.g. if
> > __STRICT_ANSI__ is set,
> 
> Is strict ANSI a requirement ?

Not sure. The app in question does
#define _XOPEN_SOURCE 600
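
A small sketch of the situation being described. It assumes a glibc-style
<sys/types.h>, where (as far as I know) the `uint` typedef is only
declared when the BSD/misc feature set is enabled, which defining
_XOPEN_SOURCE on its own does not do. The struct is a hypothetical
cut-down stand-in for ib_attr_t after the patch, not the real header.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600	/* same feature-test macro as the app in question */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for a patched header struct: plain `unsigned`
 * is a core C type and needs no typedef from any system header. */
typedef struct {
	unsigned id;
	unsigned mod;
} example_attr_t;

/* Before the patch the fields were declared as `uint id; uint mod;`.
 * On the assumed systems that would not compile here, because `uint`
 * is not visible under this feature-test macro. */

static unsigned example_attr_sum(const example_attr_t *a)
{
	assert(a != NULL);
	return a->id + a->mod;
}
```

Since a library header cannot control which feature-test macros an
application defines before including it, using only core C types and
<stdint.h> types in headers sidesteps the problem.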

> >  so it is best to avoid its usage at least in headers.
> > Replace by unsigned in all headers.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>
> > 
> > ---
> > 
> > Hal can you apply this please? As a separate question:
> > I didn't go over .c files (we don't build them with strict ansi now),
> > but maybe removing uint there is a good idea, too?
> 
> Yes but it will take more than this to make them strict ANSI.
> 
> Is this as an OFED 1.2 follow on or just for master ?

You decide.

-- 
MST


From glebn at voltaire.com  Tue Jun 26 06:33:17 2007
From: glebn at voltaire.com (Gleb Natapov)
Date: Tue, 26 Jun 2007 16:33:17 +0300
Subject: [ofa-general] Re: Re: [PATCH RFC] sharing userspace IB objects
In-Reply-To: <20070626125802.GU15343@mellanox.co.il>
References: <20070625130604.GH15343@mellanox.co.il> <aday7i7wye1.fsf@cisco.com>
	<20070626070641.GM15343@mellanox.co.il>
	<20070626083445.GB1164@minantech.com>
	<20070626095125.GO15343@mellanox.co.il>
	<20070626111342.GC1164@minantech.com>
	<20070626114402.GT15343@mellanox.co.il>
	<20070626122539.GF1164@minantech.com>
	<20070626125802.GU15343@mellanox.co.il>
Message-ID: <20070626133317.GH1164@minantech.com>

On Tue, Jun 26, 2007 at 03:58:02PM +0300, Michael S. Tsirkin wrote:
> > > No, sharing a send queue must be done in software.  I don't really see the reason
> > > for sarcasm: do you see value in sharing resources between multiple threads?
> > > Why not multiple processes? Some people just don't want to program
> > > in multithreaded environment.
> >
> > Yes I see the value in sharing resources between threads and processes
> > if done right. This proposition is far from being right.
> 
> Ahem, *what* are you talking about? Sharing resources between threads was supported in
> libibverbs 1.0, *right from the start*. This is still the case with 1.1, and this API
> matches verbs quite closely which means that it can work pretty much on any
> hardware.
Why you think that I have a problem with multithreaded applications is
beyond my understanding. I have a problem with you thinking that having a
random process pick up a completion in FCFS order is a good idea. It has
limited use, for specially designed applications. MPI is not one of them.

> 
> You want to propose some enhancements, go ahead (and open a new thread for this).
> All *I* want to do is support sharing resources in singlethreaded environment.
> 
You asked for an RFC? Don't do it next time if you don't want to hear any
comments.

> > There is not sarcasm in my sentence either. You can't claim that what you
> > propose is as seamless as it should be.
> 
> I think it's as seamless as it *can* be.
If it can't be better, it is not worth implementing. This is my opinion.
I can't stop you from doing it :)

> 
> > I have no problem with sharing send queue. What I want to be able to do
> > is to attach CQ from each process to a shared QP. When send posted by
> > process A completes the completion is posted into A's CQ. HW should be
> > able to multiplex this IMO. 
> 
> Well, since there is no hardware that does this, why bother discussing this?
Because Mellanox is a hardware company, so make improvements in the right
place and don't add cruft to the library just to claim that you are super
scalable. If it can't be implemented in HW, can you please explain why?

> 
> > > > > > If multiple processes what to post to the same QP how will you
> > > > > > ensure that right process will receive right completion event?
> > > > > 
> > > > > Same as with threads - memory for CQEs and locks will be allocated
> > > > > in shared memory to make it possible for multiple processes to poll
> > > > > CQ simultaneously, and they get completions in FCFS order.
> > > > > What to do with them is up to the user.
> > > >
> > > > Are you going to use this API? How? There is no point in discussing user
> > > > API without specifying HOW user will be using it. You have to ask what
> > > > user want and design your API accordingly and not other way around.
> > > > So suppose I want to use proposed API to implement super scalable MPI.
> > > 
> > > We'd come up with MPI_Send implementation inside libibverbs:). Think layered - I'd
> > > like to make a minimal possible API change to make scalability improvements
> > > possible.
> > 
> > They are not really possible with proposed API (beyond academic papers that is).
> 
> I'm talking to MPI guys here, too, so I don't think there's real danger
> that the final API will be useless for them.
So let them talk and specify here how they are going to use it, and we will
have a good use case for your design.

> 
> > You are
> > welcome to implement MPI_Send inside libibverbs. After all this is what Myricom did.
> 
> I think keeping a general verbs layer is a better approach for now.
Then don't propose something you are not going to implement.

> 
> > > 
> > > > I setup shared QP/CQ/... and each rank start to post into the QP and
> > > > receive completion from CQ and suppose rank A picked completion that
> > > > belongs to rank B so I will need to setup out of band channel to pass
> > > > this completion from A to B. This does not look good at all to me.
> > > 
> > > This is not different from multiple threads sharing a CQ, really - and we do
> > This is very different from  multiple threads sharing a CQ. In
> > multi threaded  scenario I can design my program in a way that each
> > thread will be able to handle completion. We'll have to pass 
> > completion between processes in the scenario you propose.
> > 
> > > support this already.  In the part of the message that you have cut out, I
> > > showed some use cases that avoid this "side channel"
> >
> > What? RDMA?
> 
> RDMA and SRC.
> 
> > What about a completion of RDMA operation? You'll have to
> > pass it around.
> 
> Since all it does it free up the buffers, it's quite possible
> that processing of send completions can be done by any process.
No, it can't in the case of MPI. MPI also progresses user requests on the
event. Yes, you can design a program where it will be possible, but not MPI.

> This really depends on how the application wants to do this:
> again, you seem to ignore the fact that the issue is the same for
> multithreaded programs, and they seem to cope fine.
No, you seem to ignore the fact that a multithreaded program is something
_completely_ different. In a multithreaded program _all_ state is shared
between threads. In a multiprocess scenario only the state you place into
shared memory is shared. This difference is very important.

> 
> > I agree that RDMA situation is much better then
> > send/receive one, but there is no RDMAs without send/recv after it.
> 
> Not really - polling on data has been used in MPI for ages now.
You are greatly misinformed. Polling on data is used only for a limited
number of peers, for sending small messages; it works only on Mellanox HCAs
on _some_ archs and is greatly non-scalable in memory consumption and
polling time. Go ask your MPI team.

> With SRC you can have separate completions on the receive side.
> 
> > > (which could be just shared memory btw).
> >
> > And you introduce another scalability problem here. On a big SMP node
> > will have to create channel between each pair of processes to pass
> > completions and will have to poll each one of them besides polling CQ.
> > There goes your latency. And I am not saying this is not possible, I am
> > saying it is so bad that it is not worth doing.
> 
> No, you got that wrong: there need not be any real "channels" with shared
> memory: just a single data structure shared by all processes would do.
> But again, you are getting into MPI design, which is the wrong layer to discuss here.
> 
I am talking about the only application this is meant to be used by (in
the short term anyway). So if the design is bad for MPI, it is bad. About
"channels": you either create one between each pair of ranks or you use
locking. Both solutions kill latency.

--
			Gleb.


From mst at dev.mellanox.co.il  Tue Jun 26 07:02:39 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 26 Jun 2007 17:02:39 +0300
Subject: [ofa-general] Re: Re: [PATCH RFC] sharing userspace IB objects
In-Reply-To: <20070626133317.GH1164@minantech.com>
References: <20070625130604.GH15343@mellanox.co.il> <aday7i7wye1.fsf@cisco.com>
	<20070626070641.GM15343@mellanox.co.il>
	<20070626083445.GB1164@minantech.com>
	<20070626095125.GO15343@mellanox.co.il>
	<20070626111342.GC1164@minantech.com>
	<20070626114402.GT15343@mellanox.co.il>
	<20070626122539.GF1164@minantech.com>
	<20070626125802.GU15343@mellanox.co.il>
	<20070626133317.GH1164@minantech.com>
Message-ID: <20070626140239.GB29602@mellanox.co.il>

> Quoting Gleb Natapov <glebn at voltaire.com>:
> Subject: Re: Re: [PATCH RFC] sharing userspace IB objects
> 
> On Tue, Jun 26, 2007 at 03:58:02PM +0300, Michael S. Tsirkin wrote:
> > > > No, sharing a send queue must be done in software.  I don't really see the reason
> > > > for sarcasm: do you see value in sharing resources between multiple threads?
> > > > Why not multiple processes? Some people just don't want to program
> > > > in multithreaded environment.
> > >
> > > Yes I see the value in sharing resources between threads and processes
> > > if done right. This proposition is far from being right.
> > 
> > Ahem, *what* are you talking about? Sharing resources between threads was supported in
> > libibverbs 1.0, *right from the start*. This is still the case with 1.1, and this API
> > matches verbs quite closely which means that it can work pretty much on any
> > hardware.
> 
> Why do you think that I have a problem with multithreaded application is
> beyond my understanding. I have a problem with you thinking that peaking a
> completion by random process in FCFS order is a good idea.

Should that have been "picking"?  I keep telling you. With multithreaded
applications *that's what currently happens*. If multiple threads poll a CQ,
which one gets which completion is currently unspecified. Are you
worried about this? If not, why are you worried when multiple
processes do this?

Look here, hardware features do *not* just materialize when you build an API for
them.  What good would a pretty API that no hardware supports be?  It's the
other way around: I'm trying to extend our API to improve scalability with
existing hardware.

-- 
MST


From mhanafi at csc.com  Tue Jun 26 07:12:27 2007
From: mhanafi at csc.com (Mahmoud Hanafi)
Date: Tue, 26 Jun 2007 10:12:27 -0400
Subject: [ofa-general] low performance with multiple LUNs on a single
	port	with ib_srp
In-Reply-To: <537C6C0940C6C143AA46A88946B8541708BB417C@ORNLEXCHANGE.ornl.gov>
Message-ID: <OFF403D794.319B982C-ON85257306.004DAF52-85257306.004E0E68@csc.com>

Here are some performance results that I was able to achieve running 
across several Luns.

Config setting
1 host port to 1 ddn port.
OFED1.2rc6

[root at io1 IB]# cat /etc/modprobe.conf
alias scsi_hostadapter qla2xxx
alias scsi_hostadapter1 megaraid_sas
alias scsi_hostadapter2 qla2400
alias usb-controller ehci-hcd
alias usb-controller1 uhci-hcd
alias ib0 ib_ipoib
alias ib1 ib_ipoib
alias net-pf-27 ib_sdp
alias lustre llite
options lnet networks=o2ib
alias eth1 bnx2
alias eth0 bnx2
options ib_srp srp_sg_tablesize=256

[root at io1 IB]# cat /etc/srp_daemon.conf
a max_sect=8192,max_cmd_per_lun=3


                Write (MB/sec) 
                Number of LUNS 
"Rec Length
 (KB)"          1       2       3       4       5       6       7
 16              23      37      41      47      51      55      57 
 32              44      72      82      94      102     109     114 
 64              79      136     163     187     201     215     226 
 128             131     247     310     352     380     405     426 
 256             194     363     477     549     616     673     698 
 512             299     558     553     670     717     725     727 
 1,024           434     591     718     725     725     725     726 
 2,048           465     608     687     723     725     726     726 
 4,096           523     695     722     726     727     727     726 
 8,192           537     702     726     727     728     726     727 

                        Read (MB/sec)  
                        Number of LUNS 
"Rec Length
 (KB)"          1       2       3       4       5       6       7
 16              26      41      45      56      60      63      62 
 32              48      78      97      107     117     122     124 
 64              81      140     172     196     215     227     237 
 128             126     207     269     314     347     373     391 
 256             174     271     389     482     500     537     546 
 512             255     375     418     478     528     556     562 
 1,024           330     430     505     554     564     564     564 
 2,048           326     445     527     553     561     563     564 
 4,096           357     513     556     562     564     564     565 
 8,192           360     520     558     564     565     565     565 


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
This is a PRIVATE message. If you are not the intended recipient, please 
delete without copying and kindly advise us by e-mail of the mistake in 
delivery. NOTE: Regardless of content, this e-mail shall not operate to 
bind CSC to any order or other contract unless pursuant to explicit 
written agreement or government initiative expressly permitting the use of 
e-mail for such purpose.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


"Canon, Richard Shane" <canonrs at ornl.gov> 
Sent by: general-bounces at lists.openfabrics.org
06/25/2007 05:51 PM

To
Vu Pham <vuhuong at mellanox.com>
cc
general at lists.openfabrics.org
Subject
[ofa-general] low performance with multiple LUNs on a single port with 
ib_srp


Greetings,
 
Hopefully the subject says it all…
 
I’ve stumbled on a performance issue with the OFED ib_srp driver.  Here is 
the configuration.  I am testing with a DDN 9550 and a single host system. 
 The systems are connected by two SDR links.  On the host side there is a 
dual port (DDR) card.  On the DDN side, both lines go into a single 
singlet (even though it is a couplet).  The lines go into two distinct 
cards on the DDN side (if you are familiar with the layout).  The testing 
used OFED 1.2.
 
Now for the tests…  If I run a single stream test I’m seeing good result 
with over 700 MB/s.  These tests are run using sg_dd with the directio 
flag.  If I run two concurrent streams against two LUNs that are each 
presented over a single port on the DDN (and therefore accessed by a 
single port on the host side), the aggregate performance drop to around 
120 MB/s (60 MB/s per stream).
 
Just to confirm it isn’t a problem on the DDN side, I repeated these tests 
with the IBGD driver.  There I consistently saw about 600-650 MB/s on the 
port regardless of the number of LUNs I tested with.
 
Any ideas on what the problem is?  Also, if this doesn’t make sense, let 
me know and I will try to clarify further.
 
Thanks,
 
--Shane Canon
 
 
--
R. Shane Canon
National Center for Computational Science
Oak Ridge National Laboratory
canonrs at ornl.gov
 _______________________________________________
general mailing list
general at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit 
http://openib.org/mailman/listinfo/openib-general


From glebn at voltaire.com  Tue Jun 26 07:13:49 2007
From: glebn at voltaire.com (Gleb Natapov)
Date: Tue, 26 Jun 2007 17:13:49 +0300
Subject: [ofa-general] Re: Re: [PATCH RFC] sharing userspace IB objects
In-Reply-To: <20070626140239.GB29602@mellanox.co.il>
References: <aday7i7wye1.fsf@cisco.com> <20070626070641.GM15343@mellanox.co.il>
	<20070626083445.GB1164@minantech.com>
	<20070626095125.GO15343@mellanox.co.il>
	<20070626111342.GC1164@minantech.com>
	<20070626114402.GT15343@mellanox.co.il>
	<20070626122539.GF1164@minantech.com>
	<20070626125802.GU15343@mellanox.co.il>
	<20070626133317.GH1164@minantech.com>
	<20070626140239.GB29602@mellanox.co.il>
Message-ID: <20070626141349.GJ1164@minantech.com>

On Tue, Jun 26, 2007 at 05:02:39PM +0300, Michael S. Tsirkin wrote:
> > Quoting Gleb Natapov <glebn at voltaire.com>:
> > Subject: Re: Re: [PATCH RFC] sharing userspace IB objects
> > 
> > On Tue, Jun 26, 2007 at 03:58:02PM +0300, Michael S. Tsirkin wrote:
> > > > > No, sharing a send queue must be done in software.  I don't really see the reason
> > > > > for sarcasm: do you see value in sharing resources between multiple threads?
> > > > > Why not multiple processes? Some people just don't want to program
> > > > > in multithreaded environment.
> > > >
> > > > Yes I see the value in sharing resources between threads and processes
> > > > if done right. This proposition is far from being right.
> > > 
> > > Ahem, *what* are you talking about? Sharing resources between threads was supported in
> > > libibverbs 1.0, *right from the start*. This is still the case with 1.1, and this API
> > > matches verbs quite closely which means that it can work pretty much on any
> > > hardware.
> > 
> > Why do you think that I have a problem with multithreaded application is
> > beyond my understanding. I have a problem with you thinking that peaking a
> > completion by random process in FCFS order is a good idea.
> 
> Should that have been "picking"?  I keep telling you. With multithreaded
Yes "picking". Sorry :)

> applications *that's what currently happens*. If multiple threads poll a CQ,
> which one gets which completion is currently unspecified. Are you
> worried about this? If not, why are you worried when multiple
> processes do this?
You've missed my sentence about the difference between a multithreaded
application and what you propose. The difference is HUGE (I can't write
bigger letters, sorry about that). I can design a multithreaded MPI so
that each thread will be capable of progressing any MPI send/recv request
(and then I don't care which thread gets which completion). I can't do that
in the multiprocess scenario.
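
To make the disagreement concrete, here is a small simulation, not
libibverbs code. Completions are tagged with the rank that posted the work
request, and pollers take them from a shared queue in FCFS order. All names
(struct completion, count_forwarded) are hypothetical. With threads, any
poller can progress any request because the state is shared; with separate
processes, each completion drained by a non-owner would have to be handed
to its owner somehow, and the function below counts those cases.

```c
#include <stddef.h>

/* Hypothetical completion record: 'owner' is the rank that posted the request. */
struct completion {
    int owner;
};

/*
 * Simulate pollers draining a shared CQ in FCFS order.
 * poller[i] is the rank that happens to poll entry i.
 * Returns how many completions were picked up by a rank other than the
 * owner, i.e. how many would need forwarding in a multi-process design.
 */
size_t count_forwarded(const struct completion *cq, const int *poller, size_t n)
{
    size_t forwarded = 0;

    for (size_t i = 0; i < n; i++) {
        if (cq[i].owner != poller[i])
            forwarded++;
    }
    return forwarded;
}
```

If every entry is polled by its owner the count is zero; any other
interleaving gives a non-zero count, and that is the traffic that would
need a side channel or a locked shared structure.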

> 
> Look here, hardware features do *not* just materialize when you build an API for
> them.  What good would a pretty API that no hardware supports be?  It's the
> other way around: I'm trying to extend our API to improve scalability with
> existing hardware.
> 
Then this API will stick forever. And the HW implementation will have a
different API anyway. And that is what I am trying to point out.
I don't think Mellanox implemented the SRQ API before it was available in
HW. If Mellanox thinks this is such a great idea (and it is), why not put
the implementation where it belongs (in HW, that is)?

--
			Gleb.


From tziporet at mellanox.co.il  Tue Jun 26 07:27:21 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Tue, 26 Jun 2007 17:27:21 +0300
Subject: [ofa-general] Toward next OFED release (1.3)
Message-ID: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com>

Hi All,

Next Monday we will have the first meeting to close the OFED 1.3 features
and schedule.
In preparation, I am sending the list of features we already reviewed in
Sonoma, and other features I see in progress in the general list
discussions.
I know this is a long mail :-( but I ask each of the
maintainers/customers to review this list and send comments and other
requests.

There are some ULPs that I marked with "?"; their owners should review and
reply with their plans.

Thanks,
Tziporet


Main New Features
==============
Base kernel: 2.6.23 (we will start with 2.6.22 but will move to 2.6.23)
Install: 
*	Minimize integration effort into OS distribution
*	Break the packages RPMs (work with Novell and Redhat)

Package: 
*	Sources arrangement for the end user (for the labs)
*	Reduce compilation warnings

QoS:
*	OSM
*	CM & CMA
*	ULPs: SDP, SRP, IPoIB, RDS?

Core: 
*	Updated SA cache
*	User space events registration
*	Preparations for IB routers

libibverbs:
*	New verbs: 
*	Scalable Reliable Connected Transport (with Mellanox ConnectX)
*	Shared Send Queue
*	Reliable Multicast ?

Management:
*	Multiple partitions
*	OpenSM
*	More routing performance improvements
*	Even more speedups
*	Better packaging/installation
*	"Native" daemon mode
*	Performance management
*	Quality of Service manager: Based on IBTA annex
*	More diagnostics - Hal please update

ULPs:
*	IPoIB: NAPI; CM in GA; Bonding in GA
*	NFS over RDMA integration
*	RDS: RDMA API (using FMRs); GA quality with Oracle 11
*	SDP: Keepalive; Asynch IO (Zero Copy)
*	SRP: HA in GA
*	VNIC: ? Qlogic - please update
*	iSER: ? Voltaire - please update
*	uDAPL - ? Arlin please update

iWARP: (Steve please update if needed)
*	iwarp-specific verbs
*	iwarp-specific async events
*	API for MPA options (CRC/Markers)
*	API for streaming mode IO (needed for compliant iSER)
*	Possibly other ULPs (RDS, SDP, iSER)

MPIs:
Integrate the new MPI releases that are on time for OFED 1.3
*	Jeff - please update about Open MPI
*	DK: Please update regarding MVAPICH and MVAPICH2

OFED 1.3 System Matrix
*	CPU Arch: X86, x86_64, PPC64, ia64
*	kernel.org: kernel 2.6.23
*	Novell: SLES 10; SLES 10 SP1
*	Redhat: RHEL 4 (up4 and up5); RHEL 5 (can we drop RHEL4up4, since
up6 will probably be out by the time this release is out?)
*	Free distros (Fedora, SuSE Pro, Ubuntu) - basic testing only


Tziporet Koren
Software Director
Mellanox Technologies
mailto: tziporet at mellanox.co.il
Tel +972-4-9097200, ext 380


From mst at dev.mellanox.co.il  Tue Jun 26 07:26:43 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 26 Jun 2007 17:26:43 +0300
Subject: [ofa-general] bug 667
Message-ID: <20070626142643.GC29602@mellanox.co.il>

Sean, could you look at bug 667 please?
rping seems to be crashing after connect error.

-- 
MST


From mst at dev.mellanox.co.il  Tue Jun 26 07:37:36 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 26 Jun 2007 17:37:36 +0300
Subject: [ofa-general] [Bug 662]
In-Reply-To: <20070626142643.GC29602@mellanox.co.il>
References: <20070626142643.GC29602@mellanox.co.il>
Message-ID: <20070626143735.GD29602@mellanox.co.il>

> Quoting Michael S. Tsirkin <mst at dev.mellanox.co.il>:
> Subject: bug 667
> 
> Sean, could you look at bug 667 please?
> rping seems to be crashing after connect error.

Here's a backtrace from the core dump.

# rping -c -d -a 11.4.3.174
ipaddr (11.4.3.174)
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
created cm_id 0x505f10
cma_event type 1 cma_id 0x505f10 (parent)
cma event 1, error -110
waiting for addr/route resolution state 1
Segmentation fault (core dumped)
# gdb `which rping`
GNU gdb 6.4
Copyright 2005 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-suse-linux"...Using host libthread_db library "/lib64/libthread_db.so.1".

(gdb) core core.29968
Core was generated by `rping -c -d -a 11.4.3.174'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/local/ofed/lib64/librdmacm.so.1...done.
Loaded symbols for /usr/local/ofed/lib64/librdmacm.so.1
Reading symbols from /usr/local/ofed/lib64/libibverbs.so.1...done.
Loaded symbols for /usr/local/ofed/lib64/libibverbs.so.1
Reading symbols from /lib64/libpthread.so.0...done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/local/ofed/lib64/libcxgb3-rdmav2.so...done.
Loaded symbols for /usr/local/ofed/lib64/libcxgb3-rdmav2.so
Reading symbols from /usr/local/ofed/lib64/libmthca-rdmav2.so...done.
Loaded symbols for /usr/local/ofed/lib64/libmthca-rdmav2.so
#0  __ibv_alloc_pd (context=0x0) at src/verbs.c:143
143             pd = context->ops.alloc_pd(context);
(gdb) where
#0  __ibv_alloc_pd (context=0x0) at src/verbs.c:143
#1  0x00000000004015e6 in rping_setup_qp (cb=0x505010, cm_id=0x505f10)
    at examples/rping.c:514
#2  0x000000000040270b in main (argc=5, argv=0x7fffe0117238) at examples/rping.c:936
(gdb) frame 1
#1  0x00000000004015e6 in rping_setup_qp (cb=0x505010, cm_id=0x505f10)
    at examples/rping.c:514
514             cb->pd = ibv_alloc_pd(cm_id->verbs);
(gdb) p cm_id->verbs
$1 = (struct ibv_context *) 0x0
(gdb) p (struct cma_id_private *)cm_id
$2 = (struct cma_id_private *) 0x505f10
(gdb) p *$2
$3 = {id = {verbs = 0x0, channel = 0x505ef0, context = 0x505010, qp = 0x0, route = {
      addr = {src_addr = {sa_family = 0, sa_data = '\0' <repeats 13 times>},
        src_pad = '\0' <repeats 111 times>, dst_addr = {sa_family = 2,
          sa_data = "\000\000\v\004\003\256\000\000\000\000\000\000\000"},
        dst_pad = '\0' <repeats 111 times>, addr = {ibaddr = {sgid = {
              raw = '\0' <repeats 15 times>, global = {subnet_prefix = 0,
                interface_id = 0}}, dgid = {raw = '\0' <repeats 15 times>, global = {
                subnet_prefix = 0, interface_id = 0}}, pkey = 0}}}, path_rec = 0x0,
      num_paths = 0}, ps = RDMA_PS_TCP, port_num = 0 '\0'}, cma_dev = 0x0,
  events_completed = 0, connect_error = 0, cond = {__data = {__lock = 0, __futex = 0,
      __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0,
      __broadcast_seq = 0}, __size = '\0' <repeats 47 times>, __align = 0}, mut = {
    __data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0,
      __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
    __size = '\0' <repeats 39 times>, __align = 0}, handle = 0, mc_list = 0x0}
(gdb) where
#0  __ibv_alloc_pd (context=0x0) at src/verbs.c:143
#1  0x00000000004015e6 in rping_setup_qp (cb=0x505010, cm_id=0x505f10)
    at examples/rping.c:514
#2  0x000000000040270b in main (argc=5, argv=0x7fffe0117238) at examples/rping.c:936
(gdb)


-- 
MST


From canonrs at ornl.gov  Tue Jun 26 07:55:29 2007
From: canonrs at ornl.gov (Canon, Richard Shane)
Date: Tue, 26 Jun 2007 10:55:29 -0400
Subject: [ofa-general] low performance with multiple LUNs on a single
	port with ib_srp
In-Reply-To: <OFF403D794.319B982C-ON85257306.004DAF52-85257306.004E0E68@csc.com>
References: <537C6C0940C6C143AA46A88946B8541708BB417C@ORNLEXCHANGE.ornl.gov>
	<OFF403D794.319B982C-ON85257306.004DAF52-85257306.004E0E68@csc.com>
Message-ID: <537C6C0940C6C143AA46A88946B8541708BB4452@ORNLEXCHANGE.ornl.gov>

 
Mahmoud,

 
Thanks for the hint.  I tried that out and it definitely helped.  The
key parameter is the max_cmd_per_lun.  I think at 16 (which is what I
was using) it was overflowing something in the stack.  I tried both 3
and 5.  With 5 I was able to get over 700 MB/s for one up to four LUNs
on a single port.  I was able to get 750 MB/s when using over two LUNs.
So that looks much better.

 
Thanks,

 
--Shane

 
________________________________

From: Mahmoud Hanafi [mailto:mhanafi at csc.com] 
Sent: Tuesday, June 26, 2007 10:12 AM
To: Canon, Richard Shane
Cc: general at lists.openfabrics.org; Vu Pham
Subject: Re: [ofa-general] low performance with multiple LUNs on a
single port with ib_srp

 
Here are some performance results that I was able to achieve running
across several Luns. 

Config setting 
1 host port to 1 ddn port. 
OFED1.2rc6 

[root at io1 IB]# cat /etc/modprobe.conf 
alias scsi_hostadapter qla2xxx 
alias scsi_hostadapter1 megaraid_sas 
alias scsi_hostadapter2 qla2400 
alias usb-controller ehci-hcd 
alias usb-controller1 uhci-hcd 
alias ib0 ib_ipoib 
alias ib1 ib_ipoib 
alias net-pf-27 ib_sdp 
alias lustre llite 
options lnet networks=o2ib 
alias eth1 bnx2 
alias eth0 bnx2 
options ib_srp srp_sg_tablesize=256

[root at io1 IB]# cat /etc/srp_daemon.conf 
a max_sect=8192,max_cmd_per_lun=3 


                Write (MB/sec)
                Number of LUNS
"Rec Length
 (KB)"          1       2       3       4       5       6       7
 16              23      37      41      47      51      55      57
 32              44      72      82      94      102     109     114
 64              79      136     163     187     201     215     226
 128             131     247     310     352     380     405     426
 256             194     363     477     549     616     673     698
 512             299     558     553     670     717     725     727
 1,024           434     591     718     725     725     725     726
 2,048           465     608     687     723     725     726     726
 4,096           523     695     722     726     727     727     726
 8,192           537     702     726     727     728     726     727

                        Read (MB/sec)
                        Number of LUNS
"Rec Length
 (KB)"          1       2       3       4       5       6       7
 16              26      41      45      56      60      63      62
 32              48      78      97      107     117     122     124
 64              81      140     172     196     215     227     237
 128             126     207     269     314     347     373     391
 256             174     271     389     482     500     537     546
 512             255     375     418     478     528     556     562
 1,024           330     430     505     554     564     564     564
 2,048           326     445     527     553     561     563     564
 4,096           357     513     556     562     564     564     565
 8,192           360     520     558     564     565     565     565




"Canon, Richard Shane" <canonrs at ornl.gov> 
Sent by: general-bounces at lists.openfabrics.org 

06/25/2007 05:51 PM 

To

Vu Pham <vuhuong at mellanox.com> 

cc

general at lists.openfabrics.org 

Subject

[ofa-general] low performance with multiple LUNs on a single port
with ib_srp

 
Greetings, 
  
Hopefully the subject says it all... 
  
I've stumbled on a performance issue with the OFED ib_srp driver.  Here
is the configuration.  I am testing with a DDN 9550 and a single host
system.  The systems are connected by two SDR links.  On the host side
there is a dual port (DDR) card.  On the DDN side, both lines go into a
single singlet (even though it is a couplet).  The lines go into two
distinct cards on the DDN side (if you are familiar with the layout).
The testing used OFED 1.2. 
  
Now for the tests...  If I run a single stream test I'm seeing good
result with over 700 MB/s.  These tests are run using sg_dd with the
directio flag.  If I run two concurrent streams against two LUNs that
are each presented over a single port on the DDN (and therefore accessed
by a single port on the host side), the aggregate performance drop to
around 120 MB/s (60 MB/s per stream). 
  
Just to confirm it isn't a problem on the DDN side, I repeated these
tests with the IBGD driver.  There I consistently saw about 600-650 MB/s
on the port regardless of the number of LUNs I tested with. 
  
Any ideas on what the problem is?  Also, if this doesn't make sense, let
me know and I will try to clarify further. 
  
Thanks, 
  
--Shane Canon 
  
  
-- 
R. Shane Canon 
National Center for Computational Science 
Oak Ridge National Laboratory 
canonrs at ornl.gov 
 _______________________________________________
general mailing list
general at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general 


From swise at opengridcomputing.com  Tue Jun 26 08:02:30 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 26 Jun 2007 10:02:30 -0500
Subject: [ofa-general] [Bug 662]
In-Reply-To: <20070626143735.GD29602@mellanox.co.il>
References: <20070626142643.GC29602@mellanox.co.il>
	<20070626143735.GD29602@mellanox.co.il>
Message-ID: <46812A86.9000505@opengridcomputing.com>

I think the bug is in rping_bind_client().  If address resolution fails via
an ADDR_ERROR event, then rping_bind_client() wakes up and mistakenly
returns the variable 'ret', which is zero.  It should return non-zero in
this case.
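
A minimal sketch of the failure mode described above, not the actual rping
source: the state names, the struct and bind_client_status() are
hypothetical stand-ins. The point is only that the status returned after
the wait has to be derived from the state the event handler left behind,
rather than from a stale 'ret' of 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical connection states, loosely modelled on the discussion above. */
enum conn_state { IDLE, ADDR_RESOLVED, ROUTE_RESOLVED, RESOLVE_ERROR };

/* Hypothetical control block: 'state' is what an event handler would set. */
struct control_block {
    enum conn_state state;
};

/*
 * Called after the wait for address/route resolution has finished.
 * 'ret' is the result of the earlier resolve call and is 0 when that call
 * succeeded, so returning it unconditionally hides a later ADDR_ERROR.
 */
int bind_client_status(const struct control_block *cb, int ret)
{
    assert(cb != NULL);

    if (ret != 0)
        return ret;     /* the resolve call itself failed */
    if (cb->state != ROUTE_RESOLVED)
        return -1;      /* resolution failed asynchronously */
    return 0;
}
```

With a non-zero return the caller can bail out before the QP setup step,
which is where the backtrace shows the NULL cm_id->verbs being
dereferenced.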

Steve.


Michael S. Tsirkin wrote:
>> Quoting Michael S. Tsirkin <mst at dev.mellanox.co.il>:
>> Subject: bug 667
>>
>> Sean, could you look at bug 667 please?
>> rping seems to be crashing after connect error.
> 
> Here's a backtrace from the core dump.
> 
> # rping -c -d -a 11.4.3.174
> ipaddr (11.4.3.174)
> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>     This will severely limit memory registrations.
> created cm_id 0x505f10
> cma_event type 1 cma_id 0x505f10 (parent)
> cma event 1, error -110
> waiting for addr/route resolution state 1
> Segmentation fault (core dumped)
> # gdb `which rping`
> GNU gdb 6.4
> Copyright 2005 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "x86_64-suse-linux"...Using host libthread_db library "/lib64/libthread_db.so.1".
> 
> (gdb) core core.29968
> Core was generated by `rping -c -d -a 11.4.3.174'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /usr/local/ofed/lib64/librdmacm.so.1...done.
> Loaded symbols for /usr/local/ofed/lib64/librdmacm.so.1
> Reading symbols from /usr/local/ofed/lib64/libibverbs.so.1...done.
> Loaded symbols for /usr/local/ofed/lib64/libibverbs.so.1
> Reading symbols from /lib64/libpthread.so.0...done.
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /lib64/libdl.so.2...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/libc.so.6...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /usr/local/ofed/lib64/libcxgb3-rdmav2.so...done.
> Loaded symbols for /usr/local/ofed/lib64/libcxgb3-rdmav2.so
> Reading symbols from /usr/local/ofed/lib64/libmthca-rdmav2.so...done.
> Loaded symbols for /usr/local/ofed/lib64/libmthca-rdmav2.so
> #0  __ibv_alloc_pd (context=0x0) at src/verbs.c:143
> 143             pd = context->ops.alloc_pd(context);
> (gdb) where
> #0  __ibv_alloc_pd (context=0x0) at src/verbs.c:143
> #1  0x00000000004015e6 in rping_setup_qp (cb=0x505010, cm_id=0x505f10)
>     at examples/rping.c:514
> #2  0x000000000040270b in main (argc=5, argv=0x7fffe0117238) at examples/rping.c:936
> (gdb) frame 1
> #1  0x00000000004015e6 in rping_setup_qp (cb=0x505010, cm_id=0x505f10)
>     at examples/rping.c:514
> 514             cb->pd = ibv_alloc_pd(cm_id->verbs);
> (gdb) p cm_id->verbs
> $1 = (struct ibv_context *) 0x0
> (gdb) p (struct cma_id_private *)cm_id
> $2 = (struct cma_id_private *) 0x505f10
> (gdb) p *$2
> $3 = {id = {verbs = 0x0, channel = 0x505ef0, context = 0x505010, qp = 0x0, route = {
>       addr = {src_addr = {sa_family = 0, sa_data = '\0' <repeats 13 times>},
>         src_pad = '\0' <repeats 111 times>, dst_addr = {sa_family = 2,
>           sa_data = "\000\000\v\004\003\256\000\000\000\000\000\000\000"},
>         dst_pad = '\0' <repeats 111 times>, addr = {ibaddr = {sgid = {
>               raw = '\0' <repeats 15 times>, global = {subnet_prefix = 0,
>                 interface_id = 0}}, dgid = {raw = '\0' <repeats 15 times>, global = {
>                 subnet_prefix = 0, interface_id = 0}}, pkey = 0}}}, path_rec = 0x0,
>       num_paths = 0}, ps = RDMA_PS_TCP, port_num = 0 '\0'}, cma_dev = 0x0,
>   events_completed = 0, connect_error = 0, cond = {__data = {__lock = 0, __futex = 0,
>       __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0,
>       __broadcast_seq = 0}, __size = '\0' <repeats 47 times>, __align = 0}, mut = {
>     __data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0,
>       __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
>     __size = '\0' <repeats 39 times>, __align = 0}, handle = 0, mc_list = 0x0}
> (gdb) where
> #0  __ibv_alloc_pd (context=0x0) at src/verbs.c:143
> #1  0x00000000004015e6 in rping_setup_qp (cb=0x505010, cm_id=0x505f10)
>     at examples/rping.c:514
> #2  0x000000000040270b in main (argc=5, argv=0x7fffe0117238) at examples/rping.c:936
> (gdb)
> 
> 


From sweitzen at cisco.com  Tue Jun 26 08:53:31 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Tue, 26 Jun 2007 08:53:31 -0700
Subject: [ofa-general] Re: development process post ofed-1.2 gold.
In-Reply-To: <4680F1C8.3020207@mellanox.co.il>
References: <4680305D.9030701@opengridcomputing.com>
	<4680F1C8.3020207@mellanox.co.il>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA303C166C4@xmb-sjc-216.amer.cisco.com>

> My suggestion is that we keep the ofed_1_2 branch alive, thus 
> new fixes 
> should be applied to the repository.
> In this way we will be able to do a stable release when we decide.
> Another question is regarding the daily build - I don't think we need 
> them any more. We can do a weekly build, or run build in case of need 
> (new patches submitted). What other people think about this?

Weekly and on-demand builds sound OK to me.

Scott


From swise at opengridcomputing.com  Tue Jun 26 08:53:43 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 26 Jun 2007 10:53:43 -0500
Subject: [ofa-general] Re: librdmacm code confusion wrt iWarp
In-Reply-To: <000101c740b7$4abbc140$ff0da8c0@amr.corp.intel.com>
References: <000101c740b7$4abbc140$ff0da8c0@amr.corp.intel.com>
Message-ID: <46813687.2060801@opengridcomputing.com>

Sean Hefty wrote:
> Steve,
> 
> I'm looking at rdma_create_qp() in librdmacm.  There's a section of code in
> there:
> 
> if (id->ps == RDMA_PS_UDP)
> 	ret = ucma_init_ud_qp(id_priv, qp);
> else
> 	ret = ucma_init_ib_qp(id_priv, qp);
> 
> Both of these calls transition the QP to INIT, so that the user can post
> receives before trying to establish a connection.  iWarp is handled the same as
> IB, which confuses me, since it is treated differently in the kernel.  I'm
> assuming that the librdmacm works for you over iWarp, but I'd like to understand
> this better.
> 

The actual work for setting init-state qp attributes and moving the qp 
to INIT state is done in the kernel CMA modules.  Thus librdmacm doesn't 
need to do anything specific for iwarp in user mode.  It calls into the 
kernel and the ucma module ends up calling the kernel 
rdma_init_qp_attr() which does the switch on the transport type.

The design goal when Tom added iwarp into librdmacm was minimal impact 
to the existing code.  So there is very little code in librdmacm that 
switches on the transport type...

Steve.
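
The split described above can be pictured with a toy model in plain C.
All names here are hypothetical and nothing below is kernel or
librdmacm code; it only shows the shape of the design, in which the
user-space entry point has no transport check and the single switch
lives on the "kernel" side.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical transport tags; the kernel has its own enum for this. */
enum sketch_transport { SKETCH_TRANSPORT_IB, SKETCH_TRANSPORT_IWARP };

/* Stand-in for the kernel-side dispatch: reports which path was taken. */
const char *sketch_kernel_init_qp_attr(enum sketch_transport t)
{
	switch (t) {
	case SKETCH_TRANSPORT_IB:
		return "ib";
	case SKETCH_TRANSPORT_IWARP:
		return "iwarp";
	}
	return "unknown";
}

/* Stand-in for the user-space call: it forwards and never inspects the transport. */
const char *sketch_user_init_qp(enum sketch_transport device_transport)
{
	return sketch_kernel_init_qp_attr(device_transport);
}
```

Keeping the switch in one place is what lets the user-space library
stay nearly free of transport-specific code.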


From swise at opengridcomputing.com  Tue Jun 26 08:55:54 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 26 Jun 2007 10:55:54 -0500
Subject: [ofa-general] Re: development process post ofed-1.2 gold.
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA303C166C4@xmb-sjc-216.amer.cisco.com>
References: <4680305D.9030701@opengridcomputing.com>
	<4680F1C8.3020207@mellanox.co.il>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA303C166C4@xmb-sjc-216.amer.cisco.com>
Message-ID: <4681370A.5050306@opengridcomputing.com>

Scott Weitzenkamp (sweitzen) wrote:
>> My suggestion is that we keep the ofed_1_2 branch alive, thus 
>> new fixes 
>> should be applied to the repository.
>> In this way we will be able to do a stable release when we decide.
>> Another question is regarding the daily build - I don't think we need 
>> them any more. We can do a weekly build, or run build in case of need 
>> (new patches submitted). What other people think about this?
> 
> Weekly and on-demand builds sound OK to me.
> 
> Scott

ditto


From sweitzen at cisco.com  Tue Jun 26 08:58:16 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Tue, 26 Jun 2007 08:58:16 -0700
Subject: [ofa-general] SRP Failover
In-Reply-To: <000301c7b7d7$236b3a70$6a41af50$@com.hk>
References: <000301c7b7d7$236b3a70$6a41af50$@com.hk>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA303C166C9@xmb-sjc-216.amer.cisco.com>

You need to configure Device Mapper Multipath or some other multipathing
software to get HA.  What OS are you running?
 
Steps for RHEL are:
 
1) Edit /etc/multipath.conf and comment out devnode_blacklist (RHEL4) or
blacklist (RHEL5) entry.
2) Run "chkconfig multipathd on".
3) Reboot.
4) After reboot, /dev/mapper should be populated with multipath block
device entries.
5) You can run "multipath -l" to view the multipath status.
 
Steps for SLES10 are similar:
 
1) Run "chkconfig boot.multipath on".
2) Run "chkconfig multipathd on".
3) Reboot.
4) After reboot, /dev/mapper should be populated with multipath block
device entries.
5) You can run "multipath -l" to view the multipath status.
 
You use the /dev/mapper block devices, not /dev/sd* block devices.
 
Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

________________________________

	From: general-bounces at lists.openfabrics.org
[mailto:general-bounces at lists.openfabrics.org] On Behalf Of PN Lai
	Sent: Tuesday, June 26, 2007 2:48 AM
	To: general at lists.openfabrics.org
	Subject: [ofa-general] SRP Failover
	
	
	Hi all,

	 
	I'm testing the SRP HA functions, but I have some questions.

	I use 2 IB cables to connect the initiator and 1 IB cable to
connect to the storage.

	 
	I installed OFED-1.2 and enabled "SRP_LOAD=yes" and
"SRPHA_ENABLE=yes" in openib.conf.

	After reboot, it discovers 2 targets /dev/sdbX and /dev/sdcX. 

	 
	However, I check the /var/log/srp_daemon.log, it shows:

	....

	26/05/07 17:42:57 : bad MAD status (110) from lid 257

	26/05/07 17:43:30 : No response to inform info registration

	26/05/07 17:43:30 : Fail to register to traps, maybe there is no
opensm running on fabric

	....

	 
	But the opensm is running in both machines. I don't know whether
it is normal, or should it only discover a single target?

	 
	Now, my question is: if I mount /dev/sdbX and write data
to it, and then remove 1 of the initiator cables, how will /dev/sdcX
replace /dev/sdbX so that I can continue to write the data?

	 
	Do I need to configure some extra files?

	 
	Thanks for reply.

	 
	PN


From rowland at cse.ohio-state.edu  Tue Jun 26 10:06:12 2007
From: rowland at cse.ohio-state.edu (Shaun Rowland)
Date: Tue, 26 Jun 2007 13:06:12 -0400
Subject: [ofa-general] Installation problem with mvapich2
In-Reply-To: <1c16cdf90706252229p2a6466a1l81d5411821252744@mail.gmail.com>
References: <1c16cdf90706252229p2a6466a1l81d5411821252744@mail.gmail.com>
Message-ID: <46814784.1040809@cse.ohio-state.edu>

Chevchenkovic Chevchenkovic wrote:
> Hi,
> I am trying to install mvapich2 on my system. So i do the following:
> 1. untar  mvapich2-0.9.8.tar.gz
> 2. go to make.mvapich2.gen2 file and set the prefix as
>   /root/chev/temp/mvapich2-0.9.8/
> 
> Then we execute the instruction as :
> ./make.mvapich2.gen2
> 
> I get the following as output:
> =========================================================
> Configuring MVAPICH2...
> Configuring MPICH2 version MVAPICH2-0.9.8 with
> --prefix=/root/chev/temp/mvapich2-0.9.8/ --enable-g=dbg
> --with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd
> --disable-romio --without-mpe
> sourcing /root/chev/temp/mvapich2-0.9.8/src/pm/mpd/setup_pm
> checking for gcc... gcc
> checking for C compiler default output file name... configure: error:
> C compiler cannot create executables
> See `config.log' for more details.
> Configuring MPICH2 version MVAPICH2-0.9.8 with
> --prefix=/root/chev/chev/mvapich2-0.9.8/ --enable-g=dbg
> --with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd
> --disable-romio --without-mpe
> sourcing /root/chev/temp/mvapich2-0.9.8/src/pm/mpd/setup_pm
> checking for gcc... gcc
> checking for C compiler default output file name... configure: error:
> C compiler cannot create executables
> See `config.log' for more details.
> Building MVAPICH2...
> make: *** No targets specified and no makefile found.  Stop.
> make: *** No targets specified and no makefile found.  Stop.
> MVAPICH2 installation...
> make: *** No rule to make target `install'.  Stop.
> make: *** No rule to make target `install'.  Stop.
> Congratulations on successfully building MVAPICH2. Please send your
> feedback to mvapich-discuss at cse.ohio-state.edu.
> ================================================
> 
> 
> What is going wrong?
> Can someone please help me in this regards?
> Awaiting some reply,

Can you look in the config.log file that is generated? It should tell
you why the C compiler cannot create executables when run by configure.
It is impossible to tell from the above output alone.
-- 
Shaun Rowland	rowland at cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/


From robert.j.woodruff at intel.com  Tue Jun 26 10:12:03 2007
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Tue, 26 Jun 2007 10:12:03 -0700
Subject: [ofa-general] bug 667
In-Reply-To: <20070626142643.GC29602@mellanox.co.il>
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691C0294D4E9@orsmsx418.amr.corp.intel.com>

FYI - Sean is out on vacation, he will be back Thursday. 

-----Original Message-----
From: general-bounces at lists.openfabrics.org
[mailto:general-bounces at lists.openfabrics.org] On Behalf Of Michael S.
Tsirkin
Sent: Tuesday, June 26, 2007 7:27 AM
To: Hefty, Sean; general at lists.openfabrics.org
Subject: [ofa-general] bug 667

Sean, could you look at bug 667 please?
rping seems to be crashing after connect error.

-- 
MST
_______________________________________________
general mailing list
general at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general


From glebn at voltaire.com  Tue Jun 26 10:21:30 2007
From: glebn at voltaire.com (Gleb Natapov)
Date: Tue, 26 Jun 2007 20:21:30 +0300
Subject: [ofa-general] Toward next OFED release (1.3)
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com>
References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com>
Message-ID: <20070626172130.GB26637@minantech.com>

On Tue, Jun 26, 2007 at 05:27:21PM +0300, Tziporet Koren wrote:
> libibverbs:
> *	New verbs: 
> *	Scalable Reliable Connected Transport (with Mellanox ConnectX)
> *	Shared Send Queue
> *	Reliable Multicast ?
> 
What about allowing coherent memory for CQs to be allocated inside the
kernel, to fix the issue with Altix machines?

--
			Gleb.


From ardavis at ichips.intel.com  Tue Jun 26 10:36:38 2007
From: ardavis at ichips.intel.com (Arlin Davis)
Date: Tue, 26 Jun 2007 10:36:38 -0700
Subject: [ofa-general] Toward next OFED release (1.3)
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com>
References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com>
Message-ID: <46814EA6.1010300@ichips.intel.com>

Tziporet Koren wrote:

> ULPs:
>
>     * IPoIB: NAPI; CM in GA; Bonding in GA
>     * NFS over RDMA integration
>     * RDS: RDMA API (using FMRs); GA quality with Oracle 11
>     * SDP: Keepalive; Asynch IO (Zero Copy)
>     * SRP: HA in GA
>     * VNIC: ? Qlogic - please update
>     * iSER: ? Voltaire - please update
>

uDAPL - DAT 2.0 support with IB extensions for immediate data, atomics; 
Add extensions for new verbs (SRCT,SSQ,RM)

>


From rdreier at cisco.com  Tue Jun 26 10:47:16 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 26 Jun 2007 10:47:16 -0700
Subject: [ofa-general] Toward next OFED release (1.3)
In-Reply-To: <20070626172130.GB26637@minantech.com> (Gleb Natapov's message of
	"Tue, 26 Jun 2007 20:21:30 +0300")
References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com>
	<20070626172130.GB26637@minantech.com>
Message-ID: <ada8xa6wqu3.fsf@cisco.com>

 > What about allowing to allocate coherent memory for CQ inside the kernel
 > to fix issue with Altix machines?

Sorry... I've been remiss in posting about this.  I would actually
prefer to see an extension to the dma_map_sg() interface (a new flag
perhaps?) that would set the right magic bit in the DMA address on
altix.  The refactoring of ib_umem_get() to be called by low-level
drivers makes this a fairly clean approach, and it avoids the problems
with using dma_alloc_coherent() to allocate userspace buffers (for
example, dma_alloc_coherent() uses up kernel virtual addresses, which
may be scarce on 32 bit architectures).

 - R.


From sashak at voltaire.com  Tue Jun 26 10:55:54 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Tue, 26 Jun 2007 20:55:54 +0300
Subject: [ofa-general] Fwd: [ANNOUNCE] GIT 1.5.2.2
In-Reply-To: <20070625043809.GA29772@mellanox.co.il>
References: <20070625043809.GA29772@mellanox.co.il>
Message-ID: <20070626175554.GL25653@sashak.voltaire.com>

On 07:38 Mon 25 Jun     , Michael S. Tsirkin wrote:
> FYI
> I think git-gui updates make it worth while to upgrade.
> Sasha?

I guess nobody uses git-gui on the server side, do they?

Sasha


From akepner at sgi.com  Tue Jun 26 10:53:43 2007
From: akepner at sgi.com (akepner at sgi.com)
Date: Tue, 26 Jun 2007 10:53:43 -0700
Subject: [ofa-general] Toward next OFED release (1.3)
In-Reply-To: <ada8xa6wqu3.fsf@cisco.com>
References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com>
	<20070626172130.GB26637@minantech.com> <ada8xa6wqu3.fsf@cisco.com>
Message-ID: <20070626175343.GB5951@sgi.com>

On Tue, Jun 26, 2007 at 10:47:16AM -0700, Roland Dreier wrote:
>  > What about allowing to allocate coherent memory for CQ inside the kernel
>  > to fix issue with Altix machines?
> 
> Sorry... I've been remiss in posting about this.  I would actually
> prefer to see an extension to the dma_map_sg() interface (a new flag
> perhaps?) that would set the right magic bit in the DMA address on
> altix.  The refactoring of ib_umem_get() to be called by low-level
> drivers makes this a fairly clean approach, and it avoids the problems
> with using dma_alloc_coherent() to allocate userspace buffers (for
> example, dma_alloc_coherent() uses up kernel virtual addresses, which
> may be scarce on 32 bit architectures).
> 

Check. 

Generating a patch for OFED 1.3 is on my to do list. 

-- 
Arthur


From sashak at voltaire.com  Tue Jun 26 11:01:57 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Tue, 26 Jun 2007 21:01:57 +0300
Subject: [ofa-general] Re: [PATCH] management: uint -> unsigned replacement
In-Reply-To: <20070626132457.GA29602@mellanox.co.il>
References: <20070626102045.GS15343@mellanox.co.il>
	<1182862966.10379.425353.camel@hal.voltaire.com>
	<20070626132457.GA29602@mellanox.co.il>
Message-ID: <20070626180157.GM25653@sashak.voltaire.com>

On 16:24 Tue 26 Jun     , Michael S. Tsirkin wrote:
> > Quoting Hal Rosenstock <halr at voltaire.com>:
> > Subject: Re: [PATCH] management: uint -> unsigned replacement
> > 
> > On Tue, 2007-06-26 at 06:20, Michael S. Tsirkin wrote:
> > > Some management headers use uint type which (on my system)
> > 
> > What's your system ?
> 
> SLES10.
> 
> > >  is described as "old
> > > compatibility name for C type".  This type might not be defined e.g. if
> > > __STRICT_ANSI__ is set,
> > 
> > Is strict ANSI a requirement ?

Even if not, what is the reason to use uint there instead of just unsigned?
I don't know of one. I like this patch.

Sasha

> 
> Not sure. The app in question does
> #define _XOPEN_SOURCE 600
> 
> > >  so it is best to avoid its usage at least in headers.
> > > Replace by unsigned in all headers.
> > > 
> > > Signed-off-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>
> > > 
> > > ---
> > > 
> > > Hal can you apply this please? As a separate question:
> > > I didn't go over .c files (we don't build them with strict ansi now),
> > > but maybe removing uint there is a good idea, too?
> > 
> > Yes but it will take more than this to make them strict ANSI.
> > 
> > Is this as an OFED 1.2 follow on or just for master ?
> 
> You decide.
> 
> -- 
> MST
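
As a sketch of why the replacement is safe, the fragment below is a
header-style declaration that builds under the feature-test macro
mentioned in the thread.  The struct and function names are
hypothetical.  The comment about glibc reflects my understanding that
its <sys/types.h> only provides the 'uint' compatibility typedef when
the "misc" feature set is enabled, which a bare _XOPEN_SOURCE 600 does
not do.

```c
#define _XOPEN_SOURCE 600	/* as in the application mentioned in the thread */
#include <sys/types.h>
#include <assert.h>

/*
 * Hypothetical header fragment.  Declaring these fields as 'uint' would
 * depend on a compatibility typedef that may be absent in this
 * configuration; plain 'unsigned' is always available.
 */
struct sketch_port_info {
	unsigned port_num;
	unsigned lid;
};

/* Trivial user of the struct so the fragment is a complete translation unit. */
unsigned sketch_next_port(const struct sketch_port_info *p)
{
	return p->port_num + 1;
}
```

Since 'unsigned' is a keyword rather than a typedef, the header no
longer depends on which feature-test macros the including application
defines.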


From rdreier at cisco.com  Tue Jun 26 11:01:47 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 26 Jun 2007 11:01:47 -0700
Subject: [ofa-general] Toward next OFED release (1.3)
In-Reply-To: <20070626175343.GB5951@sgi.com> (akepner@sgi.com's message of
	"Tue, 26 Jun 2007 10:53:43 -0700")
References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com>
	<20070626172130.GB26637@minantech.com> <ada8xa6wqu3.fsf@cisco.com>
	<20070626175343.GB5951@sgi.com>
Message-ID: <ada4pkuwq5w.fsf@cisco.com>

 > > Sorry... I've been remiss in posting about this.  I would actually
 > > prefer to see an extension to the dma_map_sg() interface (a new flag
 > > perhaps?) that would set the right magic bit in the DMA address on
 > > altix.  The refactoring of ib_umem_get() to be called by low-level
 > > drivers makes this a fairly clean approach, and it avoids the problems
 > > with using dma_alloc_coherent() to allocate userspace buffers (for
 > > example, dma_alloc_coherent() uses up kernel virtual addresses, which
 > > may be scarce on 32 bit architectures).
 > > 
 > 
 > Check. 
 > 
 > Generating a patch for OFED 1.3 is on my to do list. 

That's great, but please let's not think about it as a patch "for OFED
1.3."  I think this sort of change to the user/kernel interface really
needs to go upstream before it goes into OFED, so just work on getting
the changes into the kernel and libmthca, and then we can worry about
getting them into an OFED release.

 - R.


From glebn at voltaire.com  Tue Jun 26 11:05:51 2007
From: glebn at voltaire.com (Gleb Natapov)
Date: Tue, 26 Jun 2007 21:05:51 +0300
Subject: [ofa-general] Toward next OFED release (1.3)
In-Reply-To: <ada8xa6wqu3.fsf@cisco.com>
References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com>
	<20070626172130.GB26637@minantech.com> <ada8xa6wqu3.fsf@cisco.com>
Message-ID: <20070626180551.GG26637@minantech.com>

On Tue, Jun 26, 2007 at 10:47:16AM -0700, Roland Dreier wrote:
> 
>  > What about allowing to allocate coherent memory for CQ inside the kernel
>  > to fix issue with Altix machines?
> 
> Sorry... I've been remiss in posting about this.  I would actually
> prefer to see an extension to the dma_map_sg() interface (a new flag
> perhaps?) that would set the right magic bit in the DMA address on
> altix.  The refactoring of ib_umem_get() to be called by low-level
> drivers makes this a fairly clean approach, and it avoids the problems
> with using dma_alloc_coherent() to allocate userspace buffers (for
> example, dma_alloc_coherent() uses up kernel virtual addresses, which
> may be scarce on 32 bit architectures).
> 
While this makes sense, it would be hard to push into the kernel proper,
wouldn't it? Are you going to do that?

--
			Gleb.


From rdreier at cisco.com  Tue Jun 26 11:33:04 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 26 Jun 2007 11:33:04 -0700
Subject: [ofa-general] Toward next OFED release (1.3)
In-Reply-To: <20070626180551.GG26637@minantech.com> (Gleb Natapov's message of
	"Tue, 26 Jun 2007 21:05:51 +0300")
References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com>
	<20070626172130.GB26637@minantech.com> <ada8xa6wqu3.fsf@cisco.com>
	<20070626180551.GG26637@minantech.com>
Message-ID: <adazm2mva5b.fsf@cisco.com>

 > > Sorry... I've been remiss in posting about this.  I would actually
 > > prefer to see an extension to the dma_map_sg() interface (a new flag
 > > perhaps?) that would set the right magic bit in the DMA address on
 > > altix.  The refactoring of ib_umem_get() to be called by low-level
 > > drivers makes this a fairly clean approach, and it avoids the problems
 > > with using dma_alloc_coherent() to allocate userspace buffers (for
 > > example, dma_alloc_coherent() uses up kernel virtual addresses, which
 > > may be scarce on 32 bit architectures).

 > While this make sense it would be hard to push into the kernel proper.
 > Or no? Are you going to do that?

I don't think I'm willing to merge a fix that uses dma_alloc_coherent()
inside the kernel so this alternate fix is probably easier to merge.
Yes, it does mean an extension to the DMA mapping API but I think
getting that right will be useful in terms of making sure what we're
doing really makes sense.

 - R.


From madhu.lakshmanan at qlogic.com  Tue Jun 26 12:15:54 2007
From: madhu.lakshmanan at qlogic.com (Lakshmanan, Madhu)
Date: Tue, 26 Jun 2007 14:15:54 -0500
Subject: [ofa-general] Toward next OFED release (1.3)
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com>
References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com>
Message-ID: <4FB1BCCAE6CAED44A1DC005B1DE06119291460@EPEXCH2.qlogic.org>

> From: general-bounces at lists.openfabrics.org
> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Tziporet Koren
> Subject: [ofa-general] Toward next OFED release (1.3)
> 
> Hi All,
> 
> On next Monday we will have the first meeting to close OFED 1.3
> features and schedule.
> As a preparation I send here the list of features we already reviewed
> in Sonoma, and other features I see in progress on the general list
> discussions.
> 
> I know this is a long mail :-( but I ask each of the
> maintainers/customers to review this list and send comments and other
> requests.
> 
> There are some ULPs that I placed "?" and the owner should review and
> reply with the plans.
> 
> Thanks,
> Tziporet
> 
> 
> Main New Features
> ==============
> Base kernel: 2.6.23 (we will start with 2.6.22 but will move to
> 2.6.23)
> Install:
> 
> *	Minimize integration effort into OS distribution
> *	Break the packages RPMs (work with Novell and Redhat)
> 
> 
> Package:
> 
> *	Sources arrangement for the end user (for the labs)
> *	Reduce compilation warnings
> 
> 
> QoS:
> 
> *	OSM
> *	CM & CMA
> *	ULPs: SDP, SRP, IPoIB, RDS?
> 
> 
> Core:
> 
> *	Updated SA cache
> *	User space events registration
> *	Preparations for IB routers
> 
> 
> libibverbs:
> 
> *	New verbs:
> 
> 	*	Scalable Reliable Connected Transport (with Mellanox ConnectX)
> 	*	Shared Send Queue
> 	*	Reliable Multicast ?
> 
> 
> Management:
> 
> *	Multiple partitions
> *	OpenSM
> 
> 	*	More routing performance improvements
> 	*	Even more speedups
> 	*	Better packaging/installation
> 	*	"Native" daemon mode
> 	*	Performance management
> 	*	Quality of Service manager: Based on IBTA annex
> 
> *	More diagnostics - Hal please update
> 
> 
> ULPs:
> 
> *	IPoIB: NAPI; CM in GA; Bonding in GA
> *	NFS over RDMA integration
> *	RDS: RDMA API (using FMRs); GA quality with Oracle 11
> *	SDP: Keepalive; Asynch IO (Zero Copy)
> *	SRP: HA in GA
> *	VNIC: ? Qlogic - please update

VNIC: 
    - GA quality. Not a technology preview version anymore.
    - Added support for QLogic EVIC (10 Gbps Infiniband-to-Ethernet
gateway) - in GA
    - mlx4 and ipath support - in GA

Thanks,

Madhu Lakshmanan
QLogic Corporation


From tziporet at dev.mellanox.co.il  Tue Jun 26 12:19:48 2007
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Tue, 26 Jun 2007 22:19:48 +0300
Subject: [ewg] Re: [ofa-general] Toward next OFED release (1.3)
In-Reply-To: <ada4pkuwq5w.fsf@cisco.com>
References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com>	<20070626172130.GB26637@minantech.com>
	<ada8xa6wqu3.fsf@cisco.com>	<20070626175343.GB5951@sgi.com>
	<ada4pkuwq5w.fsf@cisco.com>
Message-ID: <468166D4.20204@mellanox.co.il>

Roland Dreier wrote:
> That's great, but please let's not think about it as a patch "for OFED
> 1.3."  I think this sort of change to the user/kernel interface really
> needs to go upstream before it goes into OFED, so just work on getting
> the changes into the kernel and libmthca, and then we can worry about
> getting them into an OFED release.
>   
This comment is aligned with the OFED development methodology.
Regarding all kernel modules that are part of Linux: we first push the 
change to the kernel and then base OFED on that code.
We take kernel patches for bug fixes and for portions that are targeted 
for kernel inclusion.
OFED is not meant to be a bypass for the Linux kernel development process.

Regarding user space libraries - OFED is based on the sources from git 
of each package and any change should be coordinated with the library owner:
libibverbs - Roland
libumad - Hal
librdmacm & libcm - Sean
uDAPL - Arlin

Tziporet


From halr at voltaire.com  Tue Jun 26 12:18:55 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 26 Jun 2007 15:18:55 -0400
Subject: [ofa-general] Re: [PATCH] management: uint -> unsigned replacement
In-Reply-To: <20070626180157.GM25653@sashak.voltaire.com>
References: <20070626102045.GS15343@mellanox.co.il>
	<1182862966.10379.425353.camel@hal.voltaire.com>
	<20070626132457.GA29602@mellanox.co.il>
	<20070626180157.GM25653@sashak.voltaire.com>
Message-ID: <1182885534.28870.527.camel@hal.voltaire.com>

On Tue, 2007-06-26 at 14:01, Sasha Khapyorsky wrote:
> On 16:24 Tue 26 Jun     , Michael S. Tsirkin wrote:
> > > Quoting Hal Rosenstock <halr at voltaire.com>:
> > > Subject: Re: [PATCH] management: uint -> unsigned replacement
> > > 
> > > On Tue, 2007-06-26 at 06:20, Michael S. Tsirkin wrote:
> > > > Some management headers use uint type which (on my system)
> > > 
> > > What's your system ?
> > 
> > SLES10.
> > 
> > > >  is described as "old
> > > > > compatibility name for C type".  This type might not be defined e.g. if
> > > > __STRICT_ANSI__ is set,
> > > 
> > > Is strict ANSI a requirement ?
> 
> Even if not,

I was just trying to determine how much further we needed to go down
this path.

-- Hal

>  what is a reason to use uint there instead of just unsigned?
> I don't know. I like this patch.
> 
> Sasha
> 
> > 
> > Not sure. The app in question does
> > #define _XOPEN_SOURCE 600
> > 
> > > >  so it is best to avoid its usage at least in headers.
> > > > Replace by unsigned in all headers.
> > > > 
> > > > Signed-off-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>
> > > > 
> > > > ---
> > > > 
> > > > Hal can you apply this please? As a separate question:
> > > > I didn't go over .c files (we don't build them with strict ansi now),
> > > > but maybe removing uint there is a good idea, too?
> > > 
> > > Yes but it will take more than this to make them strict ANSI.
> > > 
> > > Is this as an OFED 1.2 follow on or just for master ?
> > 
> > You decide.
> > 
> > -- 
> > MST


From rdreier at cisco.com  Tue Jun 26 12:34:10 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 26 Jun 2007 12:34:10 -0700
Subject: [ewg] Re: [ofa-general] Toward next OFED release (1.3)
In-Reply-To: <468166D4.20204@mellanox.co.il> (Tziporet Koren's message of "Tue,
	26 Jun 2007 22:19:48 +0300")
References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com>
	<20070626172130.GB26637@minantech.com> <ada8xa6wqu3.fsf@cisco.com>
	<20070626175343.GB5951@sgi.com> <ada4pkuwq5w.fsf@cisco.com>
	<468166D4.20204@mellanox.co.il>
Message-ID: <adar6nyv7bh.fsf@cisco.com>

 > This comment  is aligned with OFED development methodology.
 > Regarding all kernel modules that are part of Linux: we first push the
 > change to the kernel and base OFED on this code.
 > We take kernel patches for bug fixes and portions that are targeted
 > for the kernel inclusion.
 > OFED does not come to be a bypass for the Linux kernel development process.

Right, I think we agree on things here.  I just want to emphasize that
the best and easiest way to get things into OFED is to get them into
upstream sources.  And I hope OFED maintainers will start to push back
on patch submissions to OFED that have not at least been submitted for
upstream inclusion.

 - R.


From rdreier at cisco.com  Tue Jun 26 12:34:46 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 26 Jun 2007 12:34:46 -0700
Subject: [ewg] RE: [ofa-general] Toward next OFED release (1.3)
In-Reply-To: <4FB1BCCAE6CAED44A1DC005B1DE06119291460@EPEXCH2.qlogic.org>
	(Madhu Lakshmanan's message of "Tue, 26 Jun 2007 14:15:54 -0500")
References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com>
	<4FB1BCCAE6CAED44A1DC005B1DE06119291460@EPEXCH2.qlogic.org>
Message-ID: <adamyymv7ah.fsf@cisco.com>

 > VNIC: 
 >     - GA quality. Not a technology preview version anymore.
 >     - Added support for QLogic EVIC (10 Gbps Infiniband-to-Ethernet
 > gateway) - in GA

I hope there will be some attempt to get these drivers merged upstream too.

 - R.


From mst at dev.mellanox.co.il  Tue Jun 26 12:35:12 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 26 Jun 2007 22:35:12 +0300
Subject: [ofa-general] Re: Re: [PATCH RFC] sharing userspace IB objects
In-Reply-To: <20070626141349.GJ1164@minantech.com>
References: <20070626070641.GM15343@mellanox.co.il>
	<20070626083445.GB1164@minantech.com>
	<20070626095125.GO15343@mellanox.co.il>
	<20070626111342.GC1164@minantech.com>
	<20070626114402.GT15343@mellanox.co.il>
	<20070626122539.GF1164@minantech.com>
	<20070626125802.GU15343@mellanox.co.il>
	<20070626133317.GH1164@minantech.com>
	<20070626140239.GB29602@mellanox.co.il>
	<20070626141349.GJ1164@minantech.com>
Message-ID: <20070626193512.GC6426@mellanox.co.il>

> Quoting Gleb Natapov <glebn at voltaire.com>:
> Subject: Re: Re: [PATCH RFC] sharing userspace IB objects
> 
> On Tue, Jun 26, 2007 at 05:02:39PM +0300, Michael S. Tsirkin wrote:
> > > Quoting Gleb Natapov <glebn at voltaire.com>:
> > > Subject: Re: Re: [PATCH RFC] sharing userspace IB objects
> > > 
> > > On Tue, Jun 26, 2007 at 03:58:02PM +0300, Michael S. Tsirkin wrote:
> > > > > > No, sharing a send queue must be done in software.  I don't really see the reason
> > > > > > for sarcasm: do you see value in sharing resources between multiple threads?
> > > > > > Why not multiple processes? Some people just don't want to program
> > > > > > in multithreaded environment.
> > > > >
> > > > > Yes I see the value in sharing resources between threads and processes
> > > > > if done right. This proposition is far from being right.
> > > > 
> > > > Ahem, *what* are you talking about? Sharing resources between threads was supported in
> > > > libibverbs 1.0, *right from the start*. This is still the case with 1.1, and this API
> > > > matches verbs quite closely which means that it can work pretty much on any
> > > > hardware.
> > > 
> > > Why do you think that I have a problem with multithreaded application is
> > > beyond my understanding. I have a problem with you thinking that peaking a
> > > completion by random process in FCFS order is a good idea.
> > 
> > Should that have been "picking"?  I keep telling you. With multithreaded
> Yes "picking". Sorry :)
> 
> > applications *that's what currently happens*. If multiple threads poll a CQ,
> > which one gets which completion is currently unspecified. Are you
> > worried about this? If not, why are you worried when multiple
> > processes do this?
> You've missed my sentence about difference between multithreaded
> application and what you propose. The difference is HUGE (I can't write
> bigger letters sorry about that). I can design a multithreaded MPI so
> that each thread will be capable to progress MPI send/recv request (and then
> I don't care what thread gets which completion. I can't do it with multiprocess
> scenario.

Well, with shared memory, the difference between thread and process is not that huge.
And with the proposed API, you will be able to do just that.

-- 
MST


From madhu.lakshmanan at qlogic.com  Tue Jun 26 12:46:51 2007
From: madhu.lakshmanan at qlogic.com (Lakshmanan, Madhu)
Date: Tue, 26 Jun 2007 14:46:51 -0500
Subject: [ewg] RE: [ofa-general] Toward next OFED release (1.3)
In-Reply-To: <adamyymv7ah.fsf@cisco.com>
References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com><4FB1BCCAE6CAED44A1DC005B1DE06119291460@EPEXCH2.qlogic.org>
	<adamyymv7ah.fsf@cisco.com>
Message-ID: <4FB1BCCAE6CAED44A1DC005B1DE0611929146A@EPEXCH2.qlogic.org>

> From: Roland Dreier [mailto:rdreier at cisco.com]
> Subject: Re: [ewg] RE: [ofa-general] Toward next OFED release (1.3)
> 
>  > VNIC:
>  >     - GA quality. Not a technology preview version anymore.
>  >     - Added support for QLogic EVIC (10 Gbps Infiniband-to-Ethernet
>  > gateway) - in GA
> 
> I hope there will be some attempt to get these drivers merged
> upstream too.
> 
>  - R.

Agreed in principle. We hope to address that issue soon.

Madhu


From sweitzen at cisco.com  Tue Jun 26 12:49:16 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Tue, 26 Jun 2007 12:49:16 -0700
Subject: [ewg] RE: [ofa-general] Toward next OFED release (1.3)
In-Reply-To: <adamyymv7ah.fsf@cisco.com>
References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com><4FB1BCCAE6CAED44A1DC005B1DE06119291460@EPEXCH2.qlogic.org>
	<adamyymv7ah.fsf@cisco.com>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA303C16878@xmb-sjc-216.amer.cisco.com>


> I hope there will be some attempt to get these drivers merged 
> upstream too.

How about SDP, are we ready to try to merge it upstream?

Scott


From glebn at voltaire.com  Tue Jun 26 12:54:01 2007
From: glebn at voltaire.com (Gleb Natapov)
Date: Tue, 26 Jun 2007 22:54:01 +0300
Subject: [ofa-general] Re: Re: [PATCH RFC] sharing userspace IB objects
In-Reply-To: <20070626193512.GC6426@mellanox.co.il>
References: <20070626083445.GB1164@minantech.com>
	<20070626095125.GO15343@mellanox.co.il>
	<20070626111342.GC1164@minantech.com>
	<20070626114402.GT15343@mellanox.co.il>
	<20070626122539.GF1164@minantech.com>
	<20070626125802.GU15343@mellanox.co.il>
	<20070626133317.GH1164@minantech.com>
	<20070626140239.GB29602@mellanox.co.il>
	<20070626141349.GJ1164@minantech.com>
	<20070626193512.GC6426@mellanox.co.il>
Message-ID: <20070626195401.GH26637@minantech.com>

On Tue, Jun 26, 2007 at 10:35:12PM +0300, Michael S. Tsirkin wrote:
> > > applications *that's what currently happens*. If multiple threads poll a CQ,
> > > which one gets which completion is currently unspecified. Are you
> > > worried about this? If not, why are you worried when multiple
> > > processes do this?
> > You've missed my sentence about difference between multithreaded
> > application and what you propose. The difference is HUGE (I can't write
> > bigger letters sorry about that). I can design a multithreaded MPI so
> > that each thread will be capable to progress MPI send/recv request (and then
> > I don't care what thread gets which completion. I can't do it with multiprocess
> > scenario.
> 
> Well, with shared memory, the difference between thread and process is not that huge.
> And with the proposed API, you will be able to do just that.
> 
By your logic the kernel could send a signal to any process no matter which
process actually caused it. After all, this is what it does with threads.
You are thinking about a synthetic benchmark that just sends random data to
a peer and frees it on completion. A real program has much more state
associated with each operation and its corresponding completion. And received
data has to be actually processed by the process it was sent to, and not
just by any process. Unless you stop repeating your mantra that
threads are just like processes with a shared memory segment, we will not be
able to address the shortcomings of your proposal.

--
			Gleb.


From halr at voltaire.com  Tue Jun 26 13:21:53 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 26 Jun 2007 16:21:53 -0400
Subject: [ofa-general] Re: [PATCH] management: uint -> unsigned replacement
In-Reply-To: <20070626102045.GS15343@mellanox.co.il>
References: <20070626102045.GS15343@mellanox.co.il>
Message-ID: <1182889307.28870.4809.camel@hal.voltaire.com>

On Tue, 2007-06-26 at 06:20, Michael S. Tsirkin wrote:
> Some management headers use uint type which (on my system) is described as "old
> compatibility name for C type".  This type might not defined e.g. if
> __STRICT_ANSI__ is set, so it is best to avoid its usage at least in headers.
> Replace by unsigned in all headers.
> 
> Signed-off-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>

Thanks. Applied (to master only so far, but since a goal of OFED 1.2 is
to support SLES 10, it does seem that it should be provided there as well.
That will be forthcoming.)

Also, I am working on updating the management library sources similarly
although I don't see an imperative to move those changes to OFED 1.2.

-- Hal
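
For anyone following along, here is a minimal sketch of the kind of
declaration the patch is about. It assumes glibc, where the legacy uint
typedef in <sys/types.h> is only exposed with the default (misc) feature
set, so a header compiled with gcc -ansi may not see it. The function
example_port_mask is hypothetical and is not part of any management library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical public-header declaration. Spelling the type as
 * `unsigned` instead of `uint` keeps it valid under __STRICT_ANSI__. */
unsigned example_port_mask(unsigned first_port, unsigned last_port);

/* Build a bit mask with one bit set per port in [first_port, last_port].
 * Ports that do not fit in the width of unsigned are ignored. */
unsigned example_port_mask(unsigned first_port, unsigned last_port)
{
    unsigned mask = 0;
    unsigned p;

    assert(first_port <= last_port);

    for (p = first_port; p <= last_port && p < sizeof(mask) * 8; p++)
        mask |= 1u << p;

    return mask;
}
```

Because only the builtin type name is used, the declaration does not depend
on which feature-test macros the including program has set.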


From rdreier at cisco.com  Tue Jun 26 14:15:33 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 26 Jun 2007 14:15:33 -0700
Subject: [ofa-general] Re: [PATCH RFC] sharing userspace IB objects
In-Reply-To: <20070626070641.GM15343@mellanox.co.il> (Michael S. Tsirkin's
	message of "Tue, 26 Jun 2007 10:06:41 +0300")
References: <20070625130604.GH15343@mellanox.co.il>
	<aday7i7wye1.fsf@cisco.com> <20070626070641.GM15343@mellanox.co.il>
Message-ID: <adahcouv2mi.fsf@cisco.com>

 > This is not directly related to SRC: this is an effort
 > to make it possible to share QPs, CQ etc across processes
 > in the same way as they can be currently shared across threads.
 > So assuming that we want multiple processes to post to
 > the same QP, how can we support this?

This looks like a lot of work for an unknown gain.  Who is going to
really use this?  ie is it worth the trouble?

 > >  - Given that everything shared is in shared memory,
 > 
 > I think we should try and keep shared memory usage to minimum.
 > For example, in mthca mr object just needs a key: we could
 > keep it in non-shared memory, just pass the key around
 > and save on shared memory usage.

This comment made me realize there are a few more problems here.  What
happens if I do ibv_reg_mr() in one process, pass the MR to another
process, and then do ibv_dereg_mr() in the second process?  What about
if someone registers a region in shared memory -- are there any
fork/copy-on-write issues with that?  I think there are probably bugs
in the locked_vm accounting in the kernel right now -- it doesn't take
into account the possibility of passing context fds from one process
to another.

In general what do you think the rules for destroying objects should
be?  What if process A creates a QP, passes it to process B, and then
process A dies?  Should the QP still be usable?  Should process B be
able to destroy it?  What if process A is still alive -- should
process B be able to destroy the QP?

 > We need to share file descriptors too. Is there a way to pass these
 > around besides unix domain sockets?

I guess we need this to be able to re-mmap doorbell pages etc, right?
I wonder if there's a better way around that... maybe extending the
kernel interface so that unrelated processes can share a context, eg
by putting contexts in a filesystem or something like that.

 > But are you sure we want to break API for all users just to add
 > a new capability for a minority that wants shared memory support?

Yes, you're right... better to be backward compatible and have a new
API for shared stuff.

 - R.


From rdreier at cisco.com  Tue Jun 26 15:11:28 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 26 Jun 2007 15:11:28 -0700
Subject: [ofa-general] Re: [PATCH 01/28] IB/ipath: include <linux/vmalloc.h>
	to fix ppc64 build
In-Reply-To: <20070619234035.3794.7544.stgit@bauxite.internal.keyresearch.com>
	(Arthur Jones's message of "Tue, 19 Jun 2007 16:40:35 -0700")
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
	<20070619234035.3794.7544.stgit@bauxite.internal.keyresearch.com>
Message-ID: <adavedatlgv.fsf@cisco.com>

Thanks, I applied all of these patches except 15/28 (waiting for a
revised version with comments for the barriers) and {26,27}/28 (see
separate replies).

Also it would be great to get a MAINTAINERS update soon...

 - R.


From rdreier at cisco.com  Tue Jun 26 15:13:11 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 26 Jun 2007 15:13:11 -0700
Subject: [ofa-general] Re: [PATCH 26/28] IB/ipath - print warning if LID not
	acquired and link ACTIVE within one minute
In-Reply-To: <20070619234303.3794.75856.stgit@bauxite.internal.keyresearch.com>
	(Arthur Jones's message of "Tue, 19 Jun 2007 16:43:04 -0700")
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
	<20070619234303.3794.75856.stgit@bauxite.internal.keyresearch.com>
Message-ID: <adar6nytle0.fsf@cisco.com>

This has come up before -- the feeling was that this checking
shouldn't be in a low-level driver.  Either warning for no LID makes
sense for any IB device and therefore should be in the IB midlayer, or
it doesn't make sense and ipath shouldn't do it.

 - R.


From rdreier at cisco.com  Tue Jun 26 15:13:44 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 26 Jun 2007 15:13:44 -0700
Subject: [ofa-general] Re: [PATCH 27/28] IB/ipath - when we check for LID
	availability, check for lack of interrupts too.
In-Reply-To: <20070619234309.3794.784.stgit@bauxite.internal.keyresearch.com>
	(Arthur Jones's message of "Tue, 19 Jun 2007 16:43:10 -0700")
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
	<20070619234309.3794.784.stgit@bauxite.internal.keyresearch.com>
Message-ID: <adamyymtld3.fsf@cisco.com>

I didn't apply this either because it depends on 26/28 and I held off
on that one.  I think checking for interrupts in a low-level driver
*is* sane though...


From arthur.jones at qlogic.com  Tue Jun 26 15:16:57 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 26 Jun 2007 15:16:57 -0700
Subject: [ofa-general] Re: [PATCH 01/28] IB/ipath: include <linux/vmalloc.h>
	to fix ppc64 build
In-Reply-To: <adavedatlgv.fsf@cisco.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
	<20070619234035.3794.7544.stgit@bauxite.internal.keyresearch.com>
	<adavedatlgv.fsf@cisco.com>
Message-ID: <20070626221657.GO29798@bauxite.pathscale.com>

hi roland, ...

On Tue, Jun 26, 2007 at 03:11:28PM -0700, Roland Dreier wrote:
> Thanks, I applied all of these patches except 15/28 (waiting for a
> revised version with comments for the barriers) and {26,27}/28 (see
> separate replies).

thanks...

> Also it would be great to get a MAINTAINERS update soon...

ok, i have the patch in my tree (along w/ a couple
others), i was holding onto them until i got a chance
to test them.  shall i send off the MAINTAINERS patch
separately?  i expect to be able to get to testing by
the end of this week...

arthur


From arthur.jones at qlogic.com  Tue Jun 26 15:25:56 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 26 Jun 2007 15:25:56 -0700
Subject: [ofa-general] Re: [PATCH 26/28] IB/ipath - print warning if LID not
	acquired and link ACTIVE within one minute
In-Reply-To: <adar6nytle0.fsf@cisco.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
	<20070619234303.3794.75856.stgit@bauxite.internal.keyresearch.com>
	<adar6nytle0.fsf@cisco.com>
Message-ID: <20070626222556.GP29798@bauxite.pathscale.com>

sorry to have missed the fun.

does this mean that there's a patch pending
to remove the gazillion link down messages
in drivers/net?

anyway, do we want it in the IB midlayer?  i'd
definitely like it somewhere, user space is a bit
cumbersome for such a simple check...

arthur

On Tue, Jun 26, 2007 at 03:13:11PM -0700, Roland Dreier wrote:
> This has come up before -- the feeling was that this checking
> shouldn't be in a low-level driver.  Either warning for no LID makes
> sense for any IB device and therefore should be in the IB midlayer, or
> it doesn't make sense and ipath shouldn't do it.
> 
>  - R.


From rdreier at cisco.com  Tue Jun 26 15:26:04 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 26 Jun 2007 15:26:04 -0700
Subject: [ofa-general] [PATCH/RFC] IB/mthca: Remove MSI support
Message-ID: <adaejjytksj.fsf@cisco.com>

Is there any point in having MSI support in mthca, given that the
hardware also does MSI-X, which is much more useful?  Is anyone using
MSI instead of MSI-X, and if so why?

What do people think about applying this for 2.6.23?

diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h
index 9bae3cc..1002482 100644
--- a/drivers/infiniband/hw/mthca/mthca_dev.h
+++ b/drivers/infiniband/hw/mthca/mthca_dev.h
@@ -60,7 +60,6 @@
 enum {
 	MTHCA_FLAG_DDR_HIDDEN = 1 << 1,
 	MTHCA_FLAG_SRQ        = 1 << 2,
-	MTHCA_FLAG_MSI        = 1 << 3,
 	MTHCA_FLAG_MSI_X      = 1 << 4,
 	MTHCA_FLAG_NO_LAM     = 1 << 5,
 	MTHCA_FLAG_FMR        = 1 << 6,
diff --git a/drivers/infiniband/hw/mthca/mthca_eq.c b/drivers/infiniband/hw/mthca/mthca_eq.c
index 8ec9fa1..a6ae4d9 100644
--- a/drivers/infiniband/hw/mthca/mthca_eq.c
+++ b/drivers/infiniband/hw/mthca/mthca_eq.c
@@ -842,8 +842,7 @@ int mthca_init_eq_table(struct mthca_dev *dev)
 	if (err)
 		goto err_out_free;
 
-	if (dev->mthca_flags & MTHCA_FLAG_MSI ||
-	    dev->mthca_flags & MTHCA_FLAG_MSI_X) {
+	if (dev->mthca_flags & MTHCA_FLAG_MSI_X) {
 		dev->eq_table.clr_mask = 0;
 	} else {
 		dev->eq_table.clr_mask =
@@ -854,8 +853,7 @@ int mthca_init_eq_table(struct mthca_dev *dev)
 
 	dev->eq_table.arm_mask = 0;
 
-	intr = (dev->mthca_flags & MTHCA_FLAG_MSI) ?
-		128 : dev->eq_table.inta_pin;
+	intr = dev->eq_table.inta_pin;
 
 	err = mthca_create_eq(dev, dev->limits.num_cqs + MTHCA_NUM_SPARE_EQE,
 			      (dev->mthca_flags & MTHCA_FLAG_MSI_X) ? 128 : intr,
diff --git a/drivers/infiniband/hw/mthca/mthca_main.c b/drivers/infiniband/hw/mthca/mthca_main.c
index aa563e6..f5abdbf 100644
--- a/drivers/infiniband/hw/mthca/mthca_main.c
+++ b/drivers/infiniband/hw/mthca/mthca_main.c
@@ -67,7 +67,7 @@ MODULE_PARM_DESC(msi_x, "attempt to use MSI-X if nonzero");
 
 static int msi = 0;
 module_param(msi, int, 0444);
-MODULE_PARM_DESC(msi, "attempt to use MSI if nonzero");
+MODULE_PARM_DESC(msi, "(MSI support has been removed; ignored)");
 
 #else /* CONFIG_PCI_MSI */
 
@@ -837,7 +837,7 @@ static int mthca_setup_hca(struct mthca_dev *dev)
 			  dev->mthca_flags & MTHCA_FLAG_MSI_X ?
 			  dev->eq_table.eq[MTHCA_EQ_CMD].msi_x_vector :
 			  dev->pdev->irq);
-		if (dev->mthca_flags & (MTHCA_FLAG_MSI | MTHCA_FLAG_MSI_X))
+		if (dev->mthca_flags & MTHCA_FLAG_MSI_X)
 			mthca_err(dev, "Try again with MSI/MSI-X disabled.\n");
 		else
 			mthca_err(dev, "BIOS or ACPI interrupt routing problem?\n");
@@ -1117,9 +1117,8 @@ static int __mthca_init_one(struct pci_dev *pdev, int hca_type)
 
 	if (msi_x && !mthca_enable_msi_x(mdev))
 		mdev->mthca_flags |= MTHCA_FLAG_MSI_X;
-	if (msi && !(mdev->mthca_flags & MTHCA_FLAG_MSI_X) &&
-	    !pci_enable_msi(pdev))
-		mdev->mthca_flags |= MTHCA_FLAG_MSI;
+	if (msi)
+		mthca_warn(mdev, "MSI support has been removed; msi flag is ignored.\n");
 
 	if (mthca_cmd_init(mdev)) {
 		mthca_err(mdev, "Failed to init command interface, aborting.\n");
@@ -1188,8 +1187,6 @@ err_cmd:
 err_free_dev:
 	if (mdev->mthca_flags & MTHCA_FLAG_MSI_X)
 		pci_disable_msix(pdev);
-	if (mdev->mthca_flags & MTHCA_FLAG_MSI)
-		pci_disable_msi(pdev);
 
 	ib_dealloc_device(&mdev->ib_dev);
 
@@ -1236,8 +1233,6 @@ static void __mthca_remove_one(struct pci_dev *pdev)
 
 		if (mdev->mthca_flags & MTHCA_FLAG_MSI_X)
 			pci_disable_msix(pdev);
-		if (mdev->mthca_flags & MTHCA_FLAG_MSI)
-			pci_disable_msi(pdev);
 
 		ib_dealloc_device(&mdev->ib_dev);
 		mthca_release_regions(pdev, mdev->mthca_flags &


From rdreier at cisco.com  Tue Jun 26 15:27:56 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 26 Jun 2007 15:27:56 -0700
Subject: [ofa-general] Re: [PATCH 01/28] IB/ipath: include <linux/vmalloc.h>
	to fix ppc64 build
In-Reply-To: <20070626221657.GO29798@bauxite.pathscale.com> (Arthur Jones's
	message of "Tue, 26 Jun 2007 15:16:57 -0700")
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
	<20070619234035.3794.7544.stgit@bauxite.internal.keyresearch.com>
	<adavedatlgv.fsf@cisco.com>
	<20070626221657.GO29798@bauxite.pathscale.com>
Message-ID: <ada645atkpf.fsf@cisco.com>

 > ok, i have the patch in my tree (along w/ a couple
 > others), i was holding onto them until i got a chance
 > to test them.  shall i send off the MAINTAINERS patch
 > separately?  i expect to be able to get to testing by
 > the end of this week...

whatever is fine, as long as the MAINTAINERS update is in the pipeline
it's not particularly urgent.


From arthur.jones at qlogic.com  Tue Jun 26 15:29:21 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Tue, 26 Jun 2007 15:29:21 -0700
Subject: [ofa-general] Re: [PATCH 27/28] IB/ipath - when we check for LID
	availability, check for lack of interrupts too.
In-Reply-To: <adamyymtld3.fsf@cisco.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
	<20070619234309.3794.784.stgit@bauxite.internal.keyresearch.com>
	<adamyymtld3.fsf@cisco.com>
Message-ID: <20070626222921.GQ29798@bauxite.pathscale.com>

hi roland, ...

On Tue, Jun 26, 2007 at 03:13:44PM -0700, Roland Dreier wrote:
> I didn't apply this either because it depends on 26/28 and I held off
> on that one.  I think checking for interrupts in a low-level driver
> *is* sane though...

yeah, me too, if you _really_ don't want the ipath
LID check, i can respin this one to combine them and
lv out the LID check.  but then the LID check is only
gonna be a few lines, it'll seem even sillier to lv it
out...

arthur


From elsen_david at hotmail.com  Tue Jun 26 17:04:11 2007
From: elsen_david at hotmail.com (david elsen)
Date: Tue, 26 Jun 2007 17:04:11 -0700
Subject: [ofa-general] Open Fabrics iWARP Driver for Chesio T3 card
Message-ID: <BAY118-F277AE0172E19EE5FA0BECA9F0A0@phx.gbl>

Can someone please let me know:

1. What is the latest Open Fabrics Driver for the Chelsio T3 cards?

2. Is there any documentation on the Open Fabrics website on how to install 
the iWARP driver for the T3 card?

3. Is there any documentation describing how to set the iWARP and Network 
interface for the T3 cards?

David



From ogerlitz at voltaire.com  Tue Jun 26 22:01:33 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Wed, 27 Jun 2007 08:01:33 +0300
Subject: [ewg] Re: [ofa-general] Toward next OFED release (1.3)
In-Reply-To: <adar6nyv7bh.fsf@cisco.com>
References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com>	<20070626172130.GB26637@minantech.com>
	<ada8xa6wqu3.fsf@cisco.com>	<20070626175343.GB5951@sgi.com>
	<ada4pkuwq5w.fsf@cisco.com>	<468166D4.20204@mellanox.co.il>
	<adar6nyv7bh.fsf@cisco.com>
Message-ID: <4681EF2D.3010002@voltaire.com>

Roland Dreier wrote:
>  > This comment  is aligned with OFED development methodology.
>  > Regarding all kernel modules that are part of Linux: we first push the
>  > change to the kernel and base OFED on this code.
>  > We take kernel patches for bug fixes and portions that are targeted
>  > for the kernel inclusion.
>  > OFED does not come to be a bypass for the Linux kernel development process.
> 
> Right, I think we agree on things here.  I just want to emphasize that
> the best and easiest way to get things into OFED is to get them into
> upstream sources.  And I hope OFED maintainers will start to push back
> on patch submissions to OFED that have not at least been submitted for
> upstream inclusion.

Note that not only do OFED 1.1 and 1.2 include kernel drivers which 
are not upstream, some of them (e.g. SDP, RDS) never passed any review 
cycle on the relevant mailing lists (openib, netdev, lkml). Now, for OFED 
1.3 there's a suggestion to add rNFS, which was also never reviewed.

So "we agree on things here", but it does not happen. Do people have 
suggestions on how to move forward?

Or.


From jgunthorpe at obsidianresearch.com  Tue Jun 26 22:33:27 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Tue, 26 Jun 2007 23:33:27 -0600
Subject: [ofa-general] Re: [PATCH] management: uint -> unsigned replacement
In-Reply-To: <1182885534.28870.527.camel@hal.voltaire.com>
References: <20070626102045.GS15343@mellanox.co.il>
	<1182862966.10379.425353.camel@hal.voltaire.com>
	<20070626132457.GA29602@mellanox.co.il>
	<20070626180157.GM25653@sashak.voltaire.com>
	<1182885534.28870.527.camel@hal.voltaire.com>
Message-ID: <20070627053327.GH10225@obsidianresearch.com>

On Tue, Jun 26, 2007 at 03:18:55PM -0400, Hal Rosenstock wrote:

> > > > > compatibility name for C type".  This type might not defined e.g. if
> > > > > __STRICT_ANSI__ is set,
> > > > 
> > > > Is strict ANSI a requirement ?
> > 
> > Even if not,
> 
> I was just trying to determine how much further we needed to go down
> this path.

As a general rule, if you can compile each of your public header files
with:

echo '#include "foo.h"' > t.c
gcc -Wall -ansi -c t.c

You are doing OK. What is in your private .c files isn't that
important (and I'd advocate using -std=gnu99, but I never compile with
VC++ :P).

'gcc -ansi -D_POSIX_SOURCE' as a minimum is also pretty good.

Jason


From jgunthorpe at obsidianresearch.com  Tue Jun 26 22:40:06 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Tue, 26 Jun 2007 23:40:06 -0600
Subject: [ofa-general] Re: [PATCH 26/28] IB/ipath - print warning if LID
	not acquired and link ACTIVE within one minute
In-Reply-To: <20070626222556.GP29798@bauxite.pathscale.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
	<20070619234303.3794.75856.stgit@bauxite.internal.keyresearch.com>
	<adar6nytle0.fsf@cisco.com>
	<20070626222556.GP29798@bauxite.pathscale.com>
Message-ID: <20070627054006.GI10225@obsidianresearch.com>

On Tue, Jun 26, 2007 at 03:25:56PM -0700, Arthur Jones wrote:

> does this mean that there's a patch pending
> to remove the gazillion link down messages
> in drivers/net?

These days a lot of the ethernet drivers use one of the mii phy general
codes that cause those messages to be printed.

The ethernet drivers are a bit of a bad example because there are a lot
of variations of the code to monitor the phy state machines, so for
consistency with the general mii stuff they have to print the message
on their own. :|

Jason


From tziporet at mellanox.co.il  Tue Jun 26 23:25:00 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Wed, 27 Jun 2007 09:25:00 +0300
Subject: [ewg] RE: [ofa-general] Toward next OFED release (1.3)
References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com><4FB1BCCAE6CAED44A1DC005B1DE06119291460@EPEXCH2.qlogic.org><adamyymv7ah.fsf@cisco.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA303C16878@xmb-sjc-216.amer.cisco.com>
Message-ID: <6C2C79E72C305246B504CBA17B5500C9015637A1@mtlexch01.mtl.com>

I think we should try

Tziporet 

-----Original Message-----
From: ewg-bounces at lists.openfabrics.org
[mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of Scott
Weitzenkamp (sweitzen)
Sent: Tuesday, June 26, 2007 10:49 PM
To: Roland Dreier (rdreier); Lakshmanan, Madhu
Cc: ewg at lists.openfabrics.org; general at lists.openfabrics.org
Subject: RE: [ewg] RE: [ofa-general] Toward next OFED release (1.3)


> I hope there will be some attempt to get these drivers merged 
> upstream too.

How about SDP, are we ready to try to merge it upstream?

Scott


From dotanb at dev.mellanox.co.il  Wed Jun 27 01:29:14 2007
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Wed, 27 Jun 2007 11:29:14 +0300
Subject: [ofa-general] The low level driver of mlx4 kmalloc 0 bytes in QP
	creation
Message-ID: <46821FDA.5030900@dev.mellanox.co.il>

Hi Roland.

If one creates a QP with 0 WRs in the RQ at the kernel level, the mlx4 
low-level driver will kmalloc 0 bytes (for the WR IDs of the RQ).
(For example, IPoIB CM creates such a QP.)

Is this an error?

thanks
Dotan


From vlad at lists.openfabrics.org  Wed Jun 27 02:42:14 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Wed, 27 Jun 2007 02:42:14 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_c_kernel 20070627-0200 daily build status
Message-ID: <20070627094215.51612E608C0@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:   --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-mlx4-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.15
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.20
Passed on powerpc with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on powerpc with linux-2.6.17
Passed on ia64 with linux-2.6.13
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.13
Passed on ia64 with linux-2.6.12
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.15
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.16
Passed on powerpc with linux-2.6.13
Passed on x86_64 with linux-2.6.17
Passed on ia64 with linux-2.6.18
Passed on powerpc with linux-2.6.12
Passed on ia64 with linux-2.6.19
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on x86_64 with linux-2.6.14
Passed on ppc64 with linux-2.6.14
Passed on ia64 with linux-2.6.14
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.12
Passed on ppc64 with linux-2.6.18
Passed on powerpc with linux-2.6.15
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.16
Passed on ia64 with linux-2.6.17
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.17
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on ppc64 with linux-2.6.13
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on ia64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on ppc64 with linux-2.6.18-8.el5
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.9-34.ELsmp
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.18-1.2798.fc6

Failed:


From halr at voltaire.com  Wed Jun 27 04:01:09 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 27 Jun 2007 07:01:09 -0400
Subject: [ofa-general] Re: [PATCH] management: uint -> unsigned replacement
In-Reply-To: <20070627053327.GH10225@obsidianresearch.com>
References: <20070626102045.GS15343@mellanox.co.il>
	<1182862966.10379.425353.camel@hal.voltaire.com>
	<20070626132457.GA29602@mellanox.co.il>
	<20070626180157.GM25653@sashak.voltaire.com>
	<1182885534.28870.527.camel@hal.voltaire.com>
	<20070627053327.GH10225@obsidianresearch.com>
Message-ID: <1182942065.28870.65696.camel@hal.voltaire.com>

On Wed, 2007-06-27 at 01:33, Jason Gunthorpe wrote:
> On Tue, Jun 26, 2007 at 03:18:55PM -0400, Hal Rosenstock wrote:
> 
> > > > > > compatibility name for C type".  This type might not defined e.g. if
> > > > > > __STRICT_ANSI__ is set,
> > > > > 
> > > > > Is strict ANSI a requirement ?
> > > 
> > > Even if not,
> > 
> > I was just trying to determine how much further we needed to go down
> > this path.
> 
> As a general rule if you can compile each of your public headers files
> with:
> 
> echo '#include "foo.h"' > t.c
> gcc -Wall -ansi t.c
> 
> You are doing OK. What is in your private .c files isn't that
> important 

That's what I wasn't sure about. Thanks.

> (and I'd advocate using -std=gnu99, but I never compile with
> VC++ :P).
> 
> 'gcc -ansi -D_POSIX_SOURCE_' as a minimum is also pretty good.

How about:

gcc -Wall -D_XOPEN_SOURCE=600

-- Hal

> Jason


From Mark.Seger at hp.com  Wed Jun 27 06:17:36 2007
From: Mark.Seger at hp.com (Mark Seger)
Date: Wed, 27 Jun 2007 09:17:36 -0400
Subject: [ofa-general] IB performance stats (revisited)
Message-ID: <46826370.4090602@hp.com>

I had posted something about this some time last year but now actually 
have some data to present.
My problem statement with IB is there is no efficient way to get 
time-oriented performance numbers for all types of IB traffic.   As far 
as I know nothing is available for all types of traffic, such as MPI.  
This is further complicated because IB counters do not wrap and as a 
result when the counters are integers, they end up latching in <30 
seconds when under load.  The only way I am aware to do what I want to 
do is by running perfquery AND then clearing the counters after each 
request which by definition prevents anyone else from accessing the 
counters including multiple instances of my program.
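
A rough feel for the latching time can be had from a back-of-the-envelope
calculation. The sketch below is illustrative only and rests on two
assumptions that are not stated in this message: that the data counters
count in units of 4 octets, and that the link moves roughly 1 GB/s of
payload (about what a 4X SDR link can carry). The function name
seconds_to_saturate is hypothetical.

```python
def seconds_to_saturate(counter_bits, bytes_per_second, bytes_per_count=4):
    """Time until a non-wrapping counter reaches its maximum value.

    bytes_per_count=4 is an assumption: the counter is taken to advance
    once for every 4 octets of data.
    """
    counts_per_second = bytes_per_second / bytes_per_count
    return (2 ** counter_bits - 1) / counts_per_second


# Assumed load: about 1e9 bytes/s of payload on the link.
t32 = seconds_to_saturate(32, 1e9)   # a little over 17 seconds
t64 = seconds_to_saturate(64, 1e9)   # many centuries
print("32-bit: %.1f s, 64-bit: %.0f years" % (t32, t64 / (365 * 86400)))
```

Under those assumptions a 32-bit counter saturates in well under 30
seconds, while a 64-bit counter effectively never does.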

To give people a better idea of what I'm talking about, below is an 
extract from a utility I've written called 'collectl' which has been in 
use on HP systems for about 4 years and which we've now Open Sourced at 
http://sourceforge.net/projects/collectl [shameless plug].  In the 
following sample I've requested cpu, network and IB stats (there are 
actually a whole lot of other things you can examine and you can learn 
more at http://collectl.sourceforge.net/index.html).  Anyhow, what 
you're seeing below is a sample taken every second.  At first there is 
no IB traffic.  Then I start a 'netperf' and you can see the IB stats 
jump.  A few seconds later I do a 'ping -f -s50000' to the ib interface 
and you can now see an increase in the network traffic.

#         <--------CPU--------><-----------Network----------><----------InfiniBand---------->
#Time     cpu sys inter  ctxsw netKBi pkt-in  netKBo pkt-out   KBin  pktIn  KBOut pktOut Errs
08:48:19    0   0  1046    137      0      4       0       2      0      0      0      0    0
08:48:20    2   2 18659    170      0     10       0       5    925  10767  80478  41636    0
08:48:21   14  14 92368   1882      0      9       1      10   3403  39599 463892 235588    0
08:48:22   14  14 92167   2243      0      8       0       4   3186  37081 471246 238743    0
08:48:23   12  12 92131   2382      0      3       0       2   4456  37323 470766 238488    0
08:48:24   13  13 91708   2691      7    106      12     104   7300  38542 466580 236450    0
08:48:25   14  14 91675   2763     11    175      20     175   7434  38417 463952 235146    0
08:48:26   13  13 91712   2716     11    174      20     175   7486  38464 465195 235767    0
08:48:27   14  14 91755   2742     11    171      19     171   7502  38656 465079 235720    0
08:48:28   13  13 90131   2126     12    178      20     179   8257  44080 424930 217067    0
08:48:29   13  13 89974   2389     13    191      22     191   7801  37094 457082 231523    0

here's another display option where you can see just the ipoib traffic 
along with other network stats

# NETWORK STATISTICS (/sec)
#         Num    Name  InPck  InErr OutPck OutErr   Mult   ICmp   OCmp    IKB    OKB
09:04:51    0     lo:      0      0      0      0      0      0      0      0      0
09:04:51    1   eth0:     23      0      4      0      0      0      0      1      0
09:04:51    2   eth1:      0      0      0      0      0      0      0      0      0
09:04:51    3    ib0:    900      0    900      0      0      0      0   1775   1779
09:04:51    4   sit0:      0      0      0      0      0      0      0      0      0
09:04:52    0     lo:      0      0      0      0      0      0      0      0      0
09:04:52    1   eth0:    127      0    126      0      0      0      0      8     15
09:04:52    2   eth1:      0      0      0      0      0      0      0      0      0
09:04:52    3    ib0:   2275      0   2275      0      0      0      0   4488   4497
09:04:52    4   sit0:      0      0      0      0      0      0      0      0      0

While this is a relatively light-weight operation (collectl uses <0.1% 
of the cpu), I still do have to call perfquery every second and that 
does generate a little overhead.  Furthermore, since I'm continuously 
resetting the counters, multiple instances of my tool or any other tool 
that relies on these counters won't work correctly!

One solution that had been implemented in the Voltaire stack worked 
quite well and that was a loadable module that read/cleared the HCA 
counters, but exported them as wrapping counters in /proc.  That way 
utilities could access the counters in /proc without stepping on each 
other's toes.  While still not the best solution, as long as the counters 
don't wrap in the HCA, read/clear is the only way to do what it is I'm 
trying to do, unless of course someone has a better solution.  I also 
realize with 64 bit counters this becomes a non-issue but I'm trying to 
solve the more general case.

comments?  flames?  8-)

-mark
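
The read/clear-then-export idea described above can be sketched in a few
lines. This is only an illustration of the technique in Python, not the
Voltaire module: the class and function names are hypothetical, and the
hardware counter is assumed to be a 32-bit value that sticks at its
maximum. One owner reads and clears the hardware counter and folds each
sample into a software counter that wraps; every other tool only reads
the software counter and takes modular differences.

```python
MAX32 = 0xFFFFFFFF  # assumed value at which a 32-bit hardware counter sticks


class WrappingCounter:
    """Software counter fed by read-and-clear samples; wraps at 2**bits."""

    def __init__(self, bits=64):
        self.mask = (1 << bits) - 1
        self.value = 0
        self.saturated_samples = 0

    def add_sample(self, hw_value):
        # hw_value is what the hardware counter held just before the clear.
        if hw_value >= MAX32:
            # The hardware counter had latched, so some traffic went
            # uncounted; record that this sample is only a lower bound.
            self.saturated_samples += 1
        self.value = (self.value + hw_value) & self.mask
        return self.value


def delta(prev, curr, bits=64):
    """Difference between two readings of a wrapping counter."""
    return (curr - prev) & ((1 << bits) - 1)
```

A reader that sampled the exported value at two points in time would
compute delta(first, second) and divide by the interval to get a rate,
without ever touching the hardware counter itself.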


From halr at voltaire.com  Wed Jun 27 06:32:51 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 27 Jun 2007 09:32:51 -0400
Subject: [ofa-general] IB performance stats (revisited)
In-Reply-To: <46826370.4090602@hp.com>
References: <46826370.4090602@hp.com>
Message-ID: <1182951169.28870.75880.camel@hal.voltaire.com>

On Wed, 2007-06-27 at 09:17, Mark Seger wrote:
> I had posted something about this some time last year but now actually 
> have some data to present.
> My problem statement with IB is there is no efficient way to get 
> time-oriented performance numbers for all types of IB traffic.   As far 
> as I know nothing is available for all types of traffic, such as MPI. 

Not sure what you mean here. Are you looking for MPI counters ?
 
> This is further complicated because IB counters do not wrap and as a 
> result when the counters are integers, they end up latching in <30 
> seconds when under load.

This is mostly a problem for the data counters. This is what the
extended counters are for.

> The only way I am aware to do what I want to 
> do is by running perfquery AND then clearing the counters after each 
> request which by definition prevents anyone else from accessing the 
> counters including multiple instances of my program.

Yes, it is _bad_ if there are essentially multiple performance managers
resetting the counters.

There's now an experimental performance manager which has been discussed
on the list. The performance data collected can be accessed.

> To give people a better idea of what I'm talking about, below is an 
> extract from a utility I've written called 'collectl' which has been in 
> use on HP systems for about 4 years and which we've now Open Sourced at 
> http://sourceforge.net/projects/collectl [shameless plug].  In the 
> following sample I've requested cpu, network and IB stats (there are 
> actually a whole lot of other things you can examine and you can learn 
> more at http://collectl.sourceforge.net/index.html).

So you are looking for packets/bytes in/out only.

> Anyhow, what 
> you're seeing below is a sample taken every second.  At first there is 
> no IB traffic.  Then I start a 'netperf' and you can see the IB stats 
> jump.  A few seconds later I do a 'ping -f -s50000' to the ib interface 
> and you can now see an increase in the network traffic.
> 
> #         <--------CPU--------><-----------Network----------><----------InfiniBand---------->
> #Time     cpu sys inter  ctxsw netKBi pkt-in  netKBo pkt-out   KBin  pktIn  KBOut pktOut Errs
> 08:48:19    0   0  1046    137      0      4       0       2      0      0      0      0    0
> 08:48:20    2   2 18659    170      0     10       0       5    925  10767  80478  41636    0
> 08:48:21   14  14 92368   1882      0      9       1      10   3403  39599 463892 235588    0
> 08:48:22   14  14 92167   2243      0      8       0       4   3186  37081 471246 238743    0
> 08:48:23   12  12 92131   2382      0      3       0       2   4456  37323 470766 238488    0
> 08:48:24   13  13 91708   2691      7    106      12     104   7300  38542 466580 236450    0
> 08:48:25   14  14 91675   2763     11    175      20     175   7434  38417 463952 235146    0
> 08:48:26   13  13 91712   2716     11    174      20     175   7486  38464 465195 235767    0
> 08:48:27   14  14 91755   2742     11    171      19     171   7502  38656 465079 235720    0
> 08:48:28   13  13 90131   2126     12    178      20     179   8257  44080 424930 217067    0
> 08:48:29   13  13 89974   2389     13    191      22     191   7801  37094 457082 231523    0
> 
> here's another display option where you can see just the ipoib traffic 
> along with other network stats
> 
> # NETWORK STATISTICS (/sec)
> #         Num    Name  InPck  InErr OutPck OutErr   Mult   ICmp   OCmp    IKB    OKB
> 09:04:51    0     lo:      0      0      0      0      0      0      0      0      0
> 09:04:51    1   eth0:     23      0      4      0      0      0      0      1      0
> 09:04:51    2   eth1:      0      0      0      0      0      0      0      0      0
> 09:04:51    3    ib0:    900      0    900      0      0      0      0   1775   1779
> 09:04:51    4   sit0:      0      0      0      0      0      0      0      0      0
> 09:04:52    0     lo:      0      0      0      0      0      0      0      0      0
> 09:04:52    1   eth0:    127      0    126      0      0      0      0      8     15
> 09:04:52    2   eth1:      0      0      0      0      0      0      0      0      0
> 09:04:52    3    ib0:   2275      0   2275      0      0      0      0   4488   4497
> 09:04:52    4   sit0:      0      0      0      0      0      0      0      0      0
> 
> While this is a relatively light-weight operation (collectl uses <0.1% 
> of the cpu), I still do have to call perfquery every second and that 
> does generate a little overhead.  Furthermore, since I'm continuously 
> resetting the counters multiple instances of my tool or any other tool 
> that relies on these counters won't work correctly!
> 
> One solution that had been implemented in the Voltaire stack worked 
> quite well and that was a loadable module that read/cleared the HCA 
> counters, but exported them as wrapping counters in /proc.  That way 
> utilities could access the counters in /proc without stepping on each 
> other's toes.

Once in /proc, how are they all collected up ? Via IPoIB or out of band
ethernet ?

> While still not the best solution, as long as the counters 
> don't wrap in the HCA, read/clear is the only way to do what it is I'm 
> trying to do, unless of course someone has a better solution.

Doesn't this have the same problem as doing it the PMA way ? Doesn't it
impact other performance managers ?

> I also 
> realize with 64 bit counters this becomes a non-issue but I'm trying to 
> solve the more general case.

More devices are supporting these and it should be easier to do so with
IBA 1.2.1

-- Hal

> comments?  flames?  8-)
> 
> -mark
> 
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


From Mark.Seger at hp.com  Wed Jun 27 07:10:00 2007
From: Mark.Seger at hp.com (Mark Seger)
Date: Wed, 27 Jun 2007 10:10:00 -0400
Subject: [ofa-general] IB performance stats (revisited)
In-Reply-To: <1182951169.28870.75880.camel@hal.voltaire.com>
References: <46826370.4090602@hp.com>
	<1182951169.28870.75880.camel@hal.voltaire.com>
Message-ID: <46826FB8.10904@hp.com>

btw - I've cc'd Ed on this so be sure to include him in your replies.

Hal Rosenstock wrote:
> On Wed, 2007-06-27 at 09:17, Mark Seger wrote:
>   
>> I had posted something about this some time last year but now actually 
>> have some data to present.
>> My problem statement with IB is there is no efficient way to get 
>> time-oriented performance numbers for all types of IB traffic.   As far 
>> as I know nothing is available for all types of traffic, such as MPI. 
>>     
>
> Not sure what you mean here. Are you looking for MPI counters ?
>   
sorry for not being clearer.  I'm looking for total aggregate I/O.
>> This is further complicated because IB counters do not wrap and as a 
>> result when the counters are integers, they end up latching in <30 
>> seconds when under load.
>>     
>
> This is mostly a problem for the data counters. This is what the
> extended counters are for
>   
but it's the data counters I'm interested in.
>> The only way I am aware to do what I want to 
>> do is by running perfquery AND then clearing the counters after each 
>> request which by definition prevents anyone else from accessing the 
>> counters including multiple instances of my program.
>>     
>
> Yes, it is _bad_ if there are essentially multiple performance managers
> resetting the counters.
>   
I realize it's bad but since the counters don't wrap I have no alternative.
> There's now an experimental performance manager which has been discussed
> on the list. The performance data collected can be accessed.
>   
alas, since I use this tool on commercial systems, I can't run it 
against experimental code.  perhaps when the experimental becomes real I 
can.  I'll try to find the notes in the archives.
>> To give people a better idea of what I'm talking about, below is an 
>> extract from a utility I've written called 'collectl' which has been in 
>> use on HP systems for about 4 years and which we've now Open Sourced at 
>> http://sourceforge.net/projects/collectl [shameless plug].  In the 
>> following sample I've requested cpu, network and IB stats (there are 
>> actually a whole lot of other things you can examine and you can learn 
>> more at http://collectl.sourceforge.net/index.html).
>>     
>
> So you are looking for packets/bytes in/out only.
>   
That's a good start.  Since I'm using perfquery I'm also reporting 
aggregate error counts, as you can see in my program output 
below.  The theory is these should rarely be set and if they are, their 
total should be sufficient to highlight a problem without taking up a lot 
of screen real estate.
>> Anyhow, what 
>> you're seeing below is a sample taken every second.  At first there is 
>> no IB traffic.  Then I start a 'netperf' and you can see the IB stats 
>> jump.  A few seconds later I do a 'ping -f -s50000' to the ib interface 
>> and you can now see an increase in the network traffic.
>>
>> #         <--------CPU--------><-----------Network----------><----------InfiniBand---------->
>> #Time     cpu sys inter  ctxsw netKBi pkt-in  netKBo pkt-out   KBin  pktIn  KBOut pktOut Errs
>> 08:48:19    0   0  1046    137      0      4       0       2      0      0      0      0    0
>> 08:48:20    2   2 18659    170      0     10       0       5    925  10767  80478  41636    0
>> 08:48:21   14  14 92368   1882      0      9       1      10   3403  39599 463892 235588    0
>> 08:48:22   14  14 92167   2243      0      8       0       4   3186  37081 471246 238743    0
>> 08:48:23   12  12 92131   2382      0      3       0       2   4456  37323 470766 238488    0
>> 08:48:24   13  13 91708   2691      7    106      12     104   7300  38542 466580 236450    0
>> 08:48:25   14  14 91675   2763     11    175      20     175   7434  38417 463952 235146    0
>> 08:48:26   13  13 91712   2716     11    174      20     175   7486  38464 465195 235767    0
>> 08:48:27   14  14 91755   2742     11    171      19     171   7502  38656 465079 235720    0
>> 08:48:28   13  13 90131   2126     12    178      20     179   8257  44080 424930 217067    0
>> 08:48:29   13  13 89974   2389     13    191      22     191   7801  37094 457082 231523    0
>>
>> here's another display option where you can see just the ipoib traffic 
>> along with other network stats
>>
>> # NETWORK STATISTICS (/sec)
>> #         Num    Name  InPck  InErr OutPck OutErr   Mult   ICmp   OCmp    IKB    OKB
>> 09:04:51    0     lo:      0      0      0      0      0      0      0      0      0
>> 09:04:51    1   eth0:     23      0      4      0      0      0      0      1      0
>> 09:04:51    2   eth1:      0      0      0      0      0      0      0      0      0
>> 09:04:51    3    ib0:    900      0    900      0      0      0      0   1775   1779
>> 09:04:51    4   sit0:      0      0      0      0      0      0      0      0      0
>> 09:04:52    0     lo:      0      0      0      0      0      0      0      0      0
>> 09:04:52    1   eth0:    127      0    126      0      0      0      0      8     15
>> 09:04:52    2   eth1:      0      0      0      0      0      0      0      0      0
>> 09:04:52    3    ib0:   2275      0   2275      0      0      0      0   4488   4497
>> 09:04:52    4   sit0:      0      0      0      0      0      0      0      0      0
>>
>> While this is a relatively light-weight operation (collectl uses <0.1% 
>> of the cpu), I still do have to call perfquery every second and that 
>> does generate a little overhead.  Furthermore, since I'm continuously 
>> resetting the counters multiple instances of my tool or any other tool 
>> that relies on these counters won't work correctly!
>>
>> One solution that had been implemented in the Voltaire stack worked 
>> quite well and that was a loadable module that read/cleared the HCA 
>> counters, but exported them as wrapping counters in /proc.  That way 
>> utilities could access the counters in /proc without stepping on each 
>> other's toes.
>>     
>
> Once in /proc, how are they all collected up ? Via IPoIB or out of band
> ethernet ?
>   
Not sure I understand the question.  They're written to /proc via a 
module.  They're collected up via my tool simply reading them back and 
parsing the return string which looks like

ib0-1: 1 0 1 0x0000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

This is essentially the same data reported by get_pcounter reformatted 
to a single line for easier/faster parsing by collectl
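
Reading such a line back is straightforward. A minimal sketch in Python,
assuming only what the sample line shows: a name, a colon, then
whitespace-separated numbers, some of them written in hex with a 0x
prefix. The meaning of the individual fields is not given here, so the
sketch returns them as a plain list; parse_counter_line is a hypothetical
name and this is not collectl's own parser.

```python
def parse_counter_line(line):
    """Split 'name: v1 v2 ...' into the name and a list of integers."""
    name, _, rest = line.partition(":")
    values = []
    for tok in rest.split():
        # Tokens with a 0x prefix are read as hex, everything else as decimal.
        if tok.lower().startswith("0x"):
            values.append(int(tok, 16))
        else:
            values.append(int(tok))
    return name.strip(), values


name, values = parse_counter_line("ib0-1: 1 0 1 0x0000 0 0 0 0")
print(name, values[:4])
```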
>> While still not the best solution, as long as the counters 
>> don't wrap in the HCA, read/clear is the only way to do what it is I'm 
>> trying to do, unless of course someone has a better solution.
>>     
>
> Doesn't this have the same problem as doing it the PMA way ? Doesn't it
> impact other performance managers ?
>   
Good point, but I guess I'm between a rock and a hard place.  imho: as 
long as the counters don't wrap this problem will never be solved.

I'm trying to address a specific monitoring scenario, one which collects 
data locally for analysis after a system problem occurs.  I discovered 
long ago that central management solutions may work fine when trying to 
assess the health of many systems, but when something goes wrong with 
the network the only data that can tell you what's going wrong can't get 
back to the management station over the now broken network.  My 
philosophy is if you want to continuously collect reliable performance 
metrics you need to use minimal system resources to do so and that means 
no network communications.  I guess that means people need to decide: if 
they want to use collectl to gather local IB stats, they have to forgo 
doing it globally.

So what is the chance of ever seeing wrapping IB counters?  Probably 
none, right?  8-(

>> I also 
>> realize with 64 bit counters this becomes a non-issue but I'm trying to 
>> solve the more general case.
>>     
>
> More devices are supporting these and it should be easier to do so with
> IBA 1.2.1
>   
Is there an easy way to tell how wide the counters are via software?  Do 
any utilities currently report this?
> -- Hal
>
>   
>> comments?  flames?  8-)
>>
>> -mark


From Mark.Seger at hp.com  Wed Jun 27 08:00:48 2007
From: Mark.Seger at hp.com (Mark Seger)
Date: Wed, 27 Jun 2007 11:00:48 -0400
Subject: [ofa-general] IB performance stats (revisited)
In-Reply-To: <46826FB8.10904@hp.com>
References: <46826370.4090602@hp.com>
	<1182951169.28870.75880.camel@hal.voltaire.com>
	<46826FB8.10904@hp.com>
Message-ID: <46827BA0.6070008@hp.com>


>>
>> Doesn't this have the same problem as doing it the PMA way ? Doesn't it
>> impact other performance managers ?
>
It just occurred to me, how can other performance managers report 
aggregate throughput if the counters don't wrap?  They'll have exactly 
the same problem as me unless they're getting the counters elsewhere.  I 
do recall some switch vendors recommending I ask the switch for the 
counters which they maintain locally but I find that too expensive AND I 
don't want to have to rely on the network as I'd mentioned in my 
previous reply.
-mark


From halr at voltaire.com  Wed Jun 27 08:21:44 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 27 Jun 2007 11:21:44 -0400
Subject: [ofa-general] IB performance stats (revisited)
In-Reply-To: <46827BA0.6070008@hp.com>
References: <46826370.4090602@hp.com>
	<1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com>
	<46827BA0.6070008@hp.com>
Message-ID: <1182957688.28870.83013.camel@hal.voltaire.com>

On Wed, 2007-06-27 at 11:00, Mark Seger wrote:
> >>
> >> Doesn't this have the same problem as doing it the PMA way ? Doesn't it
> >> impact other performance managers ?
> >
> It just occurred to me, how can other performance managers report 
> aggregate throughput if the counters don't wrap?  They'll have exactly 
> the same problem as me unless they're getting the counters elsewhere.  I 
> do recall some switch vendors recommending I ask the switch for the 
> counters which they maintain locally but I find that too expensive AND I 
> don't want to have to rely on the network as I'd mentioned in my 
> previous reply.

The performance managers deal with the counter stickiness (by resetting
them when they think they need to). They typically export their data
although this is not specified by IBA so it is in a vendor proprietary
manner.

-- Hal

> -mark
> 
> 


From halr at voltaire.com  Wed Jun 27 08:30:00 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 27 Jun 2007 11:30:00 -0400
Subject: [ofa-general] IB performance stats (revisited)
In-Reply-To: <46826FB8.10904@hp.com>
References: <46826370.4090602@hp.com>
	<1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com>
Message-ID: <1182958191.28870.83536.camel@hal.voltaire.com>

On Wed, 2007-06-27 at 10:10, Mark Seger wrote:
> btw - I've cc'd Ed on this so be sure to include him in your replies.
> 
> Hal Rosenstock wrote:
> > On Wed, 2007-06-27 at 09:17, Mark Seger wrote:
> >   
> >> I had posted something about this some time last year but now actually 
> >> have some data to present.
> >> My problem statement with IB is there is no efficient way to get 
> >> time-oriented performance numbers for all types of IB traffic.   As far 
> >> as I know nothing is available for all types of traffic, such as MPI. 
> >>     
> >
> > Not sure what you mean here. Are you looking for MPI counters ?
> >   
> sorry for not being clearer.  I'm looking for total aggregate I/O.
> >> This is further complicated because IB counters do not wrap and as a 
> >> result when the counters are integers, they end up latching in <30 
> >> seconds when under load.
> >>     
> >
> > This is mostly a problem for the data counters. This is what the
> > extended counters are for
> >   
> but it's the data counters I'm interested in.

Yes, there are data counters in both PortCounters and
PortCountersExtended. The latter is an optional attribute.

> >> The only way I am aware to do what I want to 
> >> do is by running perfquery AND then clearing the counters after each 
> >> request which by definition prevents anyone else from accessing the 
> >> counters including multiple instances of my program.
> >>     
> >
> > Yes, it is _bad_ if there are essentially multiple performance managers
> > resetting the counters.
> >   
> I realize it's bad but since the counters don't wrap I have no alternative.
> > There's now an experimental performance manager which has been discussed
> > on the list. The performance data collected can be accessed.
> >   
> alas, since I use this tool on commercial systems, I can't run it 
> against experimental code.  perhaps when the experimental becomes real I 
> can.

It should be in the OFED 1.3 timeframe. Also, there are vendor
Performance Managers too.

>   I'll try to find the notes in the archives.

I can send you this if you can't find it.

> >> To give people a better idea of what I'm talking about, below is an 
> >> extract from a utility I've written called 'collectl' which has been in 
> >> use on HP systems for about 4 years and which we've now Open Sourced at 
> >> http://sourceforge.net/projects/collectl [shameless plug].  In the 
> >> following sample I've requested cpu, network and IB stats (there are 
> >> actually a whole lot of other things you can examine and you can learn 
> >> more at http://collectl.sourceforge.net/index.html).
> >>     
> >
> > So you are looking for packets/bytes in/out only.
> >   
> That's a good start.  Since I'm using perfquery I'm also reporting 
> aggregate error counts, as you can see in my program output 
> below.  The theory is these should rarely be set and if they are, their 
> total should be sufficient to highlight a problem without taking up a lot 
> of screen real estate.
> >> Anyhow, what 
> >> you're seeing below is a sample taken every second.  At first there is 
> >> no IB traffic.  Then I start a 'netperf' and you can see the IB stats 
> >> jump.  A few seconds later I do a 'ping -f -s50000' to the ib interface 
> >> and you can now see an increase in the network traffic.
> >>
> >> #         <--------CPU--------><-----------Network----------><----------InfiniBand---------->
> >> #Time     cpu sys inter  ctxsw netKBi pkt-in  netKBo pkt-out   KBin  pktIn  KBOut pktOut Errs
> >> 08:48:19    0   0  1046    137      0      4       0       2      0      0      0      0    0
> >> 08:48:20    2   2 18659    170      0     10       0       5    925  10767  80478  41636    0
> >> 08:48:21   14  14 92368   1882      0      9       1      10   3403  39599 463892 235588    0
> >> 08:48:22   14  14 92167   2243      0      8       0       4   3186  37081 471246 238743    0
> >> 08:48:23   12  12 92131   2382      0      3       0       2   4456  37323 470766 238488    0
> >> 08:48:24   13  13 91708   2691      7    106      12     104   7300  38542 466580 236450    0
> >> 08:48:25   14  14 91675   2763     11    175      20     175   7434  38417 463952 235146    0
> >> 08:48:26   13  13 91712   2716     11    174      20     175   7486  38464 465195 235767    0
> >> 08:48:27   14  14 91755   2742     11    171      19     171   7502  38656 465079 235720    0
> >> 08:48:28   13  13 90131   2126     12    178      20     179   8257  44080 424930 217067    0
> >> 08:48:29   13  13 89974   2389     13    191      22     191   7801  37094 457082 231523    0
> >>
> >> here's another display option where you can see just the ipoib traffic 
> >> along with other network stats
> >>
> >> # NETWORK STATISTICS (/sec)
> >> #         Num    Name  InPck  InErr OutPck OutErr   Mult   ICmp   OCmp    IKB    OKB
> >> 09:04:51    0     lo:      0      0      0      0      0      0      0      0      0
> >> 09:04:51    1   eth0:     23      0      4      0      0      0      0      1      0
> >> 09:04:51    2   eth1:      0      0      0      0      0      0      0      0      0
> >> 09:04:51    3    ib0:    900      0    900      0      0      0      0   1775   1779
> >> 09:04:51    4   sit0:      0      0      0      0      0      0      0      0      0
> >> 09:04:52    0     lo:      0      0      0      0      0      0      0      0      0
> >> 09:04:52    1   eth0:    127      0    126      0      0      0      0      8     15
> >> 09:04:52    2   eth1:      0      0      0      0      0      0      0      0      0
> >> 09:04:52    3    ib0:   2275      0   2275      0      0      0      0   4488   4497
> >> 09:04:52    4   sit0:      0      0      0      0      0      0      0      0      0
> >>
> >> While this is a relatively light-weight operation (collectl uses <0.1% 
> >> of the cpu), I still do have to call perfquery every second and that 
> >> does generate a little overhead.  Furthermore, since I'm continuously 
> >> resetting the counters multiple instances of my tool or any other tool 
> >> that relies on these counters won't work correctly!
> >>
> >> One solution that had been implemented in the Voltaire stack worked 
> >> quite well and that was a loadable module that read/cleared the HCA 
> >> counters, but exported them as wrapping counters in /proc.  That way 
> >> utilities could access the counters in /proc without stepping on each 
> >> other's toes.
> >>     
> >
> > Once in /proc, how are they all collected up ? Via IPoIB or out of band
> > ethernet ?
> >   
> Not sure I understand the question.  They're written to /proc via a 
> module.  They're collected up via my tool simply reading them back and 
> parsing the return string which looks like
> 
> ib0-1: 1 0 1 0x0000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 
> This is essentially the same data reported by get_pcounter reformatted 
> to a single line for easier/faster parsing by collectl

I was thinking your tool collects this info from all nodes in the
network somehow.

> >> While still not the best solution, as long as the counters 
> >> don't wrap in the HCA, read/clear is the only way to do what it is I'm 
> >> trying to do, unless of course someone has a better solution.
> >>     
> >
> > Doesn't this have the same problem as doing it the PMA way ? Doesn't it
> > impact other performance managers ?
> >   
> Good point, but I guess I'm between a rock and a hard place.  imho: as 
> long as the counters don't wrap this problem will never be solved.

It's the IBTA standard (rather than IETF style counters). I don't think
it's going to change.

> I'm trying to address a specific monitoring scenario, one which collects 
> data locally for analysis after a system problem occurs.  I discovered 
> long ago that central management solutions may work fine when trying to 
> assess the health of many systems, but when something goes wrong with 
> the network the only data that can tell you what's going wrong can't get 
> back to the management station over the now broken network.  My 
> philosophy is if you want to continuously collect reliable performance 
> metrics you need to use minimal system resources to do so and that means 
> no network communications.  I guess that means people need to decide: if 
> they want to use collectl to gather local IB stats, they have to forgo 
> doing it globally.

Guess that's a tradeoff that customers may need to make. In your
environment, sounds like one turns the performance manager off.

As the PerfMgr is an unarchitected IBA component, there are no events
defined which might help with coordinating this. So either this would
need to be vendor specific, or the two tools will interfere with each
other.

> So what is the chance of ever seeing wrapping IB counters?  Probably 
> none, right?  8-(
> 
> >> I also 
> >> realize with 64 bit counters this becomes a non-issue but I'm trying to 
> >> solve the more general case.
> >>     
> >
> > More devices are supporting these and it should be easier to do so with
> > IBA 1.2.1
> >   
> Is there an easy way to tell how wide the counters are via software?  Do 
> any utilities currently report this?

Yes, via the PMA it can be done with some extra queries.

-- Hal

> > -- Hal
> >
> >   
> >> comments?  flames?  8-)
> >>
> >> -mark
> >>
> 


From swise at opengridcomputing.com  Wed Jun 27 08:51:19 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 27 Jun 2007 10:51:19 -0500
Subject: [ofa-general] [PATCH 0/6] iw_cxgb3: Bug Fixes for 2.6.23
Message-ID: <20070627155119.24944.44172.stgit@dell3.ogc.int>


Hey Roland,

Here are some bug fixes to the iw_cxgb3 driver that I'd like included
for 2.6.23. NOTE: Patch 1 requires a firmware interface change, so
there is a version bump to 4.3 included in that patch that hits cxgb3.
This will likely conflict with a previous version change that is in
Jeff's upstream branch.  The net is: we need the firmware version bumped
to 4.3 with these iw_cxgb3 changes.

Thanks,

Steve.

Shortlog:
      iw_cxgb3: Streaming -> RDMA mode transition fixes.
      iw_cxgb3: TERMINATE WRs can hang the tx ofld queue.
      iw_cxgb3: Don't count neg_adv abort_req_rss messages as real aborts.
      iw_cxgb3: ctrl-qp init/clear shouldn't set the gen bit.
      iw_cxgb3: Don't post TID_RELEASE message.
      iw_cxgb3: Don't abort after failures sending the mpa reply.


From swise at opengridcomputing.com  Wed Jun 27 08:51:25 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 27 Jun 2007 10:51:25 -0500
Subject: [ofa-general] [PATCH 1/6] iw_cxgb3: Streaming -> RDMA mode
	transition fixes.
In-Reply-To: <20070627155119.24944.44172.stgit@dell3.ogc.int>
References: <20070627155119.24944.44172.stgit@dell3.ogc.int>
Message-ID: <20070627155124.24944.26940.stgit@dell3.ogc.int>


iw_cxgb3: Streaming -> RDMA mode transition fixes.

Due to a HW issue, our current scheme to transition the connection from
streaming to rdma mode is broken on the passive side.  The firmware
and driver now support a new transition scheme for the passive side:

- driver posts rdma_init_wr (now including the initial receive seqno)

- driver posts last streaming message via TX_DATA message (MPA start
response)

- uP atomically sends the last streaming message and transitions the
tcb to rdma mode.

- driver waits for wr_ack indicating the last streaming message was ACKed.

NOTE: This change also bumps the required firmware version to 4.3.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/cxio_hal.c |    2 -
 drivers/infiniband/hw/cxgb3/cxio_wr.h  |    3 +
 drivers/infiniband/hw/cxgb3/iwch_cm.c  |   82 ++++++++++++--------------------
 drivers/infiniband/hw/cxgb3/iwch_cm.h  |    1 
 drivers/infiniband/hw/cxgb3/iwch_qp.c  |    1 
 drivers/net/cxgb3/version.h            |    2 -
 6 files changed, 38 insertions(+), 53 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c
index 76049af..215bbe5 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_hal.c
+++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c
@@ -833,7 +833,7 @@ int cxio_rdma_init(struct cxio_rdev *rde
 	wqe->ird = cpu_to_be32(attr->ird);
 	wqe->qp_dma_addr = cpu_to_be64(attr->qp_dma_addr);
 	wqe->qp_dma_size = cpu_to_be32(attr->qp_dma_size);
-	wqe->rsvd = 0;
+	wqe->irs = cpu_to_be32(attr->irs);
 	skb->priority = 0;	/* 0=>ToeQ; 1=>CtrlQ */
 	return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb));
 }
diff --git a/drivers/infiniband/hw/cxgb3/cxio_wr.h b/drivers/infiniband/hw/cxgb3/cxio_wr.h
index ff7290e..c84d4ac 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_wr.h
+++ b/drivers/infiniband/hw/cxgb3/cxio_wr.h
@@ -294,6 +294,7 @@ struct t3_rdma_init_attr {
 	u64 qp_dma_addr;
 	u32 qp_dma_size;
 	u32 flags;
+	u32 irs;
 };
 
 struct t3_rdma_init_wr {
@@ -314,7 +315,7 @@ struct t3_rdma_init_wr {
 	__be32 ird;
 	__be64 qp_dma_addr;	/* 7 */
 	__be32 qp_dma_size;	/* 8 */
-	u32 rsvd;
+	u32 irs;
 };
 
 struct t3_genbit {
diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
index b2faff5..7b8d5aa 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
@@ -515,7 +515,7 @@ static void send_mpa_req(struct iwch_ep 
 	req->len = htonl(len);
 	req->param = htonl(V_TX_PORT(ep->l2t->smt_idx) |
 			   V_TX_SNDBUF(snd_win>>15));
-	req->flags = htonl(F_TX_IMM_ACK|F_TX_INIT);
+	req->flags = htonl(F_TX_INIT);
 	req->sndseq = htonl(ep->snd_seq);
 	BUG_ON(ep->mpa_skb);
 	ep->mpa_skb = skb;
@@ -566,7 +566,7 @@ static int send_mpa_reject(struct iwch_e
 	req->len = htonl(mpalen);
 	req->param = htonl(V_TX_PORT(ep->l2t->smt_idx) |
 			   V_TX_SNDBUF(snd_win>>15));
-	req->flags = htonl(F_TX_IMM_ACK|F_TX_INIT);
+	req->flags = htonl(F_TX_INIT);
 	req->sndseq = htonl(ep->snd_seq);
 	BUG_ON(ep->mpa_skb);
 	ep->mpa_skb = skb;
@@ -618,7 +618,7 @@ static int send_mpa_reply(struct iwch_ep
 	req->len = htonl(len);
 	req->param = htonl(V_TX_PORT(ep->l2t->smt_idx) |
 			   V_TX_SNDBUF(snd_win>>15));
-	req->flags = htonl(F_TX_MORE | F_TX_IMM_ACK | F_TX_INIT);
+	req->flags = htonl(F_TX_INIT);
 	req->sndseq = htonl(ep->snd_seq);
 	ep->mpa_skb = skb;
 	state_set(&ep->com, MPA_REP_SENT);
@@ -641,6 +641,7 @@ static int act_establish(struct t3cdev *
 	cxgb3_insert_tid(ep->com.tdev, &t3c_client, ep, tid);
 
 	ep->snd_seq = ntohl(req->snd_isn);
+	ep->rcv_seq = ntohl(req->rcv_isn);
 
 	set_emss(ep, ntohs(req->tcp_opt));
 
@@ -1023,6 +1024,9 @@ static int rx_data(struct t3cdev *tdev, 
 	skb_pull(skb, sizeof(*hdr));
 	skb_trim(skb, dlen);
 
+	ep->rcv_seq += dlen;
+	BUG_ON(ep->rcv_seq != (ntohl(hdr->seq) + dlen));
+
 	switch (state_read(&ep->com)) {
 	case MPA_REQ_SENT:
 		process_mpa_reply(ep, skb);
@@ -1060,7 +1064,6 @@ static int tx_ack(struct t3cdev *tdev, s
 	struct iwch_ep *ep = ctx;
 	struct cpl_wr_ack *hdr = cplhdr(skb);
 	unsigned int credits = ntohs(hdr->credits);
-	enum iwch_qp_attr_mask  mask;
 
 	PDBG("%s ep %p credits %u\n", __FUNCTION__, ep, credits);
 
@@ -1072,30 +1075,6 @@ static int tx_ack(struct t3cdev *tdev, s
 	ep->mpa_skb = NULL;
 	dst_confirm(ep->dst);
 	if (state_read(&ep->com) == MPA_REP_SENT) {
-		struct iwch_qp_attributes attrs;
-
-		/* bind QP to EP and move to RTS */
-		attrs.mpa_attr = ep->mpa_attr;
-		attrs.max_ird = ep->ord;
-		attrs.max_ord = ep->ord;
-		attrs.llp_stream_handle = ep;
-		attrs.next_state = IWCH_QP_STATE_RTS;
-
-		/* bind QP and TID with INIT_WR */
-		mask = IWCH_QP_ATTR_NEXT_STATE |
-				     IWCH_QP_ATTR_LLP_STREAM_HANDLE |
-				     IWCH_QP_ATTR_MPA_ATTR |
-				     IWCH_QP_ATTR_MAX_IRD |
-				     IWCH_QP_ATTR_MAX_ORD;
-
-		ep->com.rpl_err = iwch_modify_qp(ep->com.qp->rhp,
-				     ep->com.qp, mask, &attrs, 1);
-
-		if (!ep->com.rpl_err) {
-			state_set(&ep->com, FPDU_MODE);
-			established_upcall(ep);
-		}
-
 		ep->com.rpl_done = 1;
 		PDBG("waking up ep %p\n", ep);
 		wake_up(&ep->com.waitq);
@@ -1378,6 +1357,7 @@ static int pass_establish(struct t3cdev 
 
 	PDBG("%s ep %p\n", __FUNCTION__, ep);
 	ep->snd_seq = ntohl(req->snd_isn);
+	ep->rcv_seq = ntohl(req->rcv_isn);
 
 	set_emss(ep, ntohs(req->tcp_opt));
 
@@ -1732,10 +1712,8 @@ int iwch_accept_cr(struct iw_cm_id *cm_i
 	struct iwch_qp *qp = get_qhp(h, conn_param->qpn);
 
 	PDBG("%s ep %p tid %u\n", __FUNCTION__, ep, ep->hwtid);
-	if (state_read(&ep->com) == DEAD) {
-		put_ep(&ep->com);
+	if (state_read(&ep->com) == DEAD)
 		return -ECONNRESET;
-	}
 
 	BUG_ON(state_read(&ep->com) != MPA_REQ_RCVD);
 	BUG_ON(!qp);
@@ -1755,17 +1733,8 @@ int iwch_accept_cr(struct iw_cm_id *cm_i
 	ep->ird = conn_param->ird;
 	ep->ord = conn_param->ord;
 	PDBG("%s %d ird %d ord %d\n", __FUNCTION__, __LINE__, ep->ird, ep->ord);
+
 	get_ep(&ep->com);
-	err = send_mpa_reply(ep, conn_param->private_data,
-			     conn_param->private_data_len);
-	if (err) {
-		ep->com.cm_id = NULL;
-		ep->com.qp = NULL;
-		cm_id->rem_ref(cm_id);
-		abort_connection(ep, NULL, GFP_KERNEL);
-		put_ep(&ep->com);
-		return err;
-	}
 
 	/* bind QP to EP and move to RTS */
 	attrs.mpa_attr = ep->mpa_attr;
@@ -1783,16 +1752,29 @@ int iwch_accept_cr(struct iw_cm_id *cm_i
 
 	err = iwch_modify_qp(ep->com.qp->rhp,
 			     ep->com.qp, mask, &attrs, 1);
+	if (err)
+		goto err;
 
-	if (err) {
-		ep->com.cm_id = NULL;
-		ep->com.qp = NULL;
-		cm_id->rem_ref(cm_id);
-		abort_connection(ep, NULL, GFP_KERNEL);
-	} else {
-		state_set(&ep->com, FPDU_MODE);
-		established_upcall(ep);
-	}
+	err = send_mpa_reply(ep, conn_param->private_data,
+			     conn_param->private_data_len);
+	if (err)
+		goto err;
+
+	/* wait for wr_ack */
+	wait_event(ep->com.waitq, ep->com.rpl_done);
+	err = ep->com.rpl_err;
+	if (err)
+		goto err;
+
+	state_set(&ep->com, FPDU_MODE);
+	established_upcall(ep);
+	put_ep(&ep->com);
+	return 0;
+err:
+	ep->com.cm_id = NULL;
+	ep->com.qp = NULL;
+	cm_id->rem_ref(cm_id);
+	abort_connection(ep, NULL, GFP_KERNEL);
 	put_ep(&ep->com);
 	return err;
 }
diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.h b/drivers/infiniband/hw/cxgb3/iwch_cm.h
index 21a388c..6107e7c 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.h
@@ -175,6 +175,7 @@ struct iwch_ep {
 	unsigned int atid;
 	u32 hwtid;
 	u32 snd_seq;
+	u32 rcv_seq;
 	struct l2t_entry *l2t;
 	struct dst_entry *dst;
 	struct sk_buff *mpa_skb;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
index 714dddb..679b7c1 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -732,6 +732,7 @@ #endif
 	init_attr.qp_dma_addr = qhp->wq.dma_addr;
 	init_attr.qp_dma_size = (1UL << qhp->wq.size_log2);
 	init_attr.flags = rqes_posted(qhp) ? RECVS_POSTED : 0;
+	init_attr.irs = qhp->ep->rcv_seq;
 	PDBG("%s init_attr.rq_addr 0x%x init_attr.rq_size = %d "
 	     "flags 0x%x qpcaps 0x%x\n", __FUNCTION__,
 	     init_attr.rq_addr, init_attr.rq_size,
diff --git a/drivers/net/cxgb3/version.h b/drivers/net/cxgb3/version.h
index b112317..eb508bf 100644
--- a/drivers/net/cxgb3/version.h
+++ b/drivers/net/cxgb3/version.h
@@ -39,6 +39,6 @@ #define DRV_VERSION "1.0-ko"
 
 /* Firmware version */
 #define FW_VERSION_MAJOR 4
-#define FW_VERSION_MINOR 0
+#define FW_VERSION_MINOR 3
 #define FW_VERSION_MICRO 0
 #endif				/* __CHELSIO_VERSION_H */
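
The receive-sequence bookkeeping that feeds the new irs field can be
modelled outside the driver.  This is only a sketch under stated
assumptions: struct conn_model and the two helpers are hypothetical
stand-ins for the ep->rcv_seq updates in act_establish()/pass_establish()
and rx_data(), with byte order conversion left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the rcv_seq field added to struct iwch_ep. */
struct conn_model {
    uint32_t rcv_seq;   /* next receive sequence number expected */
};

/* On connection establishment, start from the peer's initial sequence. */
static void conn_established(struct conn_model *c, uint32_t rcv_isn)
{
    c->rcv_seq = rcv_isn;
}

/*
 * On each streaming-mode receive, advance by the payload length and check
 * the result against the sequence number carried in the message header
 * (the condition the patch asserts with BUG_ON).  Returns 1 if they agree.
 * Unsigned arithmetic keeps the comparison valid across a 2^32 wrap.
 */
static int conn_rx(struct conn_model *c, uint32_t hdr_seq, uint32_t dlen)
{
    c->rcv_seq += dlen;
    return c->rcv_seq == hdr_seq + dlen;
}
```

The value left in rcv_seq after the last streaming message is what would
be handed to the firmware as the initial receive sequence number.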


From swise at opengridcomputing.com  Wed Jun 27 08:51:30 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 27 Jun 2007 10:51:30 -0500
Subject: [ofa-general] [PATCH 2/6] iw_cxgb3: TERMINATE WRs can hang the tx
	ofld queue.
In-Reply-To: <20070627155119.24944.44172.stgit@dell3.ogc.int>
References: <20070627155119.24944.44172.stgit@dell3.ogc.int>
Message-ID: <20070627155130.24944.55771.stgit@dell3.ogc.int>


iw_cxgb3: TERMINATE WRs can hang the tx ofld queue.

Don't set the gen bits nor length bits in the terminate wr.  This is
done by the LLD driver.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/iwch_qp.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
index 679b7c1..dd89b6b 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -628,9 +628,9 @@ int iwch_post_terminate(struct iwch_qp *
 	/* immediate data starts here. */
 	term = (struct terminate_message *)wqe->send.sgl;
 	build_term_codes(rsp_msg, &term->layer_etype, &term->ecode);
-	build_fw_riwrh((void *)wqe, T3_WR_SEND,
-		       T3_COMPLETION_FLAG | T3_NOTIFY_FLAG, 1,
-		       qhp->ep->hwtid, 5);
+	wqe->send.wrh.op_seop_flags = cpu_to_be32(V_FW_RIWR_OP(T3_WR_SEND) |
+			 V_FW_RIWR_FLAGS(T3_COMPLETION_FLAG | T3_NOTIFY_FLAG));
+	wqe->send.wrh.gen_tid_len = cpu_to_be32(V_FW_RIWR_TID(qhp->ep->hwtid));
 	skb->priority = CPL_PRIORITY_DATA;
 	return cxgb3_ofld_send(qhp->rhp->rdev.t3cdev_p, skb);
 }


From swise at opengridcomputing.com  Wed Jun 27 08:51:35 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 27 Jun 2007 10:51:35 -0500
Subject: [ofa-general] [PATCH 3/6] iw_cxgb3: Don't count neg_adv
	abort_req_rss messages as real aborts.
In-Reply-To: <20070627155119.24944.44172.stgit@dell3.ogc.int>
References: <20070627155119.24944.44172.stgit@dell3.ogc.int>
Message-ID: <20070627155135.24944.44327.stgit@dell3.ogc.int>


iw_cxgb3: Don't count neg_adv abort_req_rss messages as real aborts.

Negative advice messages should _not_ count toward the 2 abort requests
needed to indicate a real abort.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/iwch_cm.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
index 7b8d5aa..4d7c277 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
@@ -1465,6 +1465,13 @@ static int peer_abort(struct t3cdev *tde
 	int ret;
 	int state;
 
+	if (is_neg_adv_abort(req->status)) {
+		PDBG("%s neg_adv_abort ep %p tid %d\n", __FUNCTION__, ep,
+		     ep->hwtid);
+		t3_l2t_send_event(ep->com.tdev, ep->l2t);
+		return CPL_RET_BUF_DONE;
+	}
+
 	/*
 	 * We get 2 peer aborts from the HW.  The first one must
 	 * be ignored except for scribbling that we need one more.
@@ -1474,13 +1481,6 @@ static int peer_abort(struct t3cdev *tde
 		return CPL_RET_BUF_DONE;
 	}
 
-	if (is_neg_adv_abort(req->status)) {
-		PDBG("%s neg_adv_abort ep %p tid %d\n", __FUNCTION__, ep,
-		     ep->hwtid);
-		t3_l2t_send_event(ep->com.tdev, ep->l2t);
-		return CPL_RET_BUF_DONE;
-	}
-
 	state = state_read(&ep->com);
 	PDBG("%s ep %p state %u\n", __FUNCTION__, ep, state);
 	switch (state) {
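
Why the order of the two checks matters can be shown with a small
standalone model.  This is a sketch, not the driver code: struct ep_model,
its abort_pending flag and the enum values are hypothetical names standing
in for the "need one more abort" state that peer_abort() keeps.

```c
#include <assert.h>
#include <stdbool.h>

enum abort_action {
    ABORT_IGNORED_NEG_ADV,  /* negative advice: not an abort at all */
    ABORT_FIRST_SEEN,       /* first of the two HW aborts: just remember it */
    ABORT_PROCESS           /* second abort: act on it */
};

struct ep_model {
    bool abort_pending;     /* set once the first real peer abort arrived */
};

static enum abort_action on_peer_abort(struct ep_model *ep, bool neg_adv)
{
    /* Filter negative advice first so it never touches abort_pending. */
    if (neg_adv)
        return ABORT_IGNORED_NEG_ADV;

    /* The hardware delivers two peer aborts; only note the first one. */
    if (!ep->abort_pending) {
        ep->abort_pending = true;
        return ABORT_FIRST_SEEN;
    }
    return ABORT_PROCESS;
}
```

With the checks in the opposite order, a negative advice message arriving
first would set the flag, and the first genuine abort would then be
treated as the second.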


From swise at opengridcomputing.com  Wed Jun 27 08:51:40 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 27 Jun 2007 10:51:40 -0500
Subject: [ofa-general] [PATCH 4/6] iw_cxgb3: ctrl-qp init/clear shouldn't set
	the gen bit.
In-Reply-To: <20070627155119.24944.44172.stgit@dell3.ogc.int>
References: <20070627155119.24944.44172.stgit@dell3.ogc.int>
Message-ID: <20070627155140.24944.61647.stgit@dell3.ogc.int>


iw_cxgb3: ctrl-qp init/clear shouldn't set the gen bit.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/cxio_hal.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c
index 215bbe5..1518b41 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_hal.c
+++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c
@@ -144,7 +144,7 @@ static int cxio_hal_clear_qp_ctx(struct 
 	}
 	wqe = (struct t3_modify_qp_wr *) skb_put(skb, sizeof(*wqe));
 	memset(wqe, 0, sizeof(*wqe));
-	build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 3, 1, qpid, 7);
+	build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 3, 0, qpid, 7);
 	wqe->flags = cpu_to_be32(MODQP_WRITE_EC);
 	sge_cmd = qpid << 8 | 3;
 	wqe->sge_cmd = cpu_to_be64(sge_cmd);
@@ -548,7 +548,7 @@ static int cxio_hal_init_ctrl_qp(struct 
 			V_EC_UP_TOKEN(T3_CTL_QP_TID) | F_EC_VALID)) << 32;
 	wqe = (struct t3_modify_qp_wr *) skb_put(skb, sizeof(*wqe));
 	memset(wqe, 0, sizeof(*wqe));
-	build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 0, 1,
+	build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 0, 0,
 		       T3_CTL_QP_TID, 7);
 	wqe->flags = cpu_to_be32(MODQP_WRITE_EC);
 	sge_cmd = (3ULL << 56) | FW_RI_SGEEC_START << 8 | 3;


From swise at opengridcomputing.com  Wed Jun 27 08:51:45 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 27 Jun 2007 10:51:45 -0500
Subject: [ofa-general] [PATCH 5/6] iw_cxgb3: Don't post TID_RELEASE message.
In-Reply-To: <20070627155119.24944.44172.stgit@dell3.ogc.int>
References: <20070627155119.24944.44172.stgit@dell3.ogc.int>
Message-ID: <20070627155145.24944.64064.stgit@dell3.ogc.int>


iw_cxgb3: Don't post TID_RELEASE message.

The LLD does this for us in cxgb3_remove_tid().

Also fixed active open failure cases where we shouldn't
be releasing the TID as well.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/iwch_cm.c |   13 ++++++++++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
index 4d7c277..228721f 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
@@ -254,8 +254,6 @@ static void release_ep_resources(struct 
 	cxgb3_remove_tid(ep->com.tdev, (void *)ep, ep->hwtid);
 	dst_release(ep->dst);
 	l2t_release(L2DATA(ep->com.tdev), ep->l2t);
-	if (ep->com.tdev->type == T3B)
-		release_tid(ep->com.tdev, ep->hwtid, NULL);
 	put_ep(&ep->com);
 }
 
@@ -1103,6 +1101,15 @@ static int abort_rpl(struct t3cdev *tdev
 	return CPL_RET_BUF_DONE;
 }
 
+/*
+ * Return whether a failed active open has allocated a TID
+ */
+static inline int act_open_has_tid(int status)
+{
+	return status != CPL_ERR_TCAM_FULL && status != CPL_ERR_CONN_EXIST &&
+	       status != CPL_ERR_ARP_MISS;
+}
+
 static int act_open_rpl(struct t3cdev *tdev, struct sk_buff *skb, void *ctx)
 {
 	struct iwch_ep *ep = ctx;
@@ -1112,7 +1119,7 @@ static int act_open_rpl(struct t3cdev *t
 	     status2errno(rpl->status));
 	connect_reply_upcall(ep, status2errno(rpl->status));
 	state_set(&ep->com, DEAD);
-	if (ep->com.tdev->type == T3B)
+	if (ep->com.tdev->type == T3B && act_open_has_tid(rpl->status))
 		release_tid(ep->com.tdev, GET_TID(rpl), NULL);
 	cxgb3_free_atid(ep->com.tdev, ep->atid);
 	dst_release(ep->dst);


From swise at opengridcomputing.com  Wed Jun 27 08:51:50 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 27 Jun 2007 10:51:50 -0500
Subject: [ofa-general] [PATCH 6/6] iw_cxgb3: Don't abort after failures
	sending the mpa reply.
In-Reply-To: <20070627155119.24944.44172.stgit@dell3.ogc.int>
References: <20070627155119.24944.44172.stgit@dell3.ogc.int>
Message-ID: <20070627155150.24944.36124.stgit@dell3.ogc.int>


iw_cxgb3: Don't abort after failures sending the mpa reply.

This bug results in an abort request being sent down _after_ the tid
has been released.  If the tid happens to have been reused, then the
subsequent generation of the tid gets incorrectly aborted.

The thread running iwch_accept_cr() must not abort a connection if an
error is returned after being awakened.  If any errors occurred while
iwch_accept_cr() was blocked, then the connection has already been aborted
on the thread processing the error.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/iwch_cm.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
index 228721f..3b41dc0 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
@@ -1781,7 +1781,6 @@ err:
 	ep->com.cm_id = NULL;
 	ep->com.qp = NULL;
 	cm_id->rem_ref(cm_id);
-	abort_connection(ep, NULL, GFP_KERNEL);
 	put_ep(&ep->com);
 	return err;
 }


From arthur.jones at qlogic.com  Wed Jun 27 09:07:44 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Wed, 27 Jun 2007 09:07:44 -0700
Subject: [ofa-general] Re: [PATCH 26/28] IB/ipath - print warning if LID
	not acquired and link ACTIVE within one minute
In-Reply-To: <20070627054006.GI10225@obsidianresearch.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
	<20070619234303.3794.75856.stgit@bauxite.internal.keyresearch.com>
	<adar6nytle0.fsf@cisco.com>
	<20070626222556.GP29798@bauxite.pathscale.com>
	<20070627054006.GI10225@obsidianresearch.com>
Message-ID: <20070627160744.GS29798@bauxite.pathscale.com>

hi jason, ...

On Tue, Jun 26, 2007 at 11:40:06PM -0600, Jason Gunthorpe wrote:
> On Tue, Jun 26, 2007 at 03:25:56PM -0700, Arthur Jones wrote:
> 
> > does this mean that there's a patch pending
> > to remove the gazillion link down messages
> > in drivers/net?
> 
> These days a lot of the ethernet drivers use one of the mii phy general
> codes that cause those messages to be printed..
> 
> The ethernet drivers are a bit of a bad example because there are a lot
> of variations of the code to monitor the phy state machines so for
> consistency with the general mii stuff they have to print the message
> on their own. :|

ok, thanks for info... so then, what kind of mii
like infrastructure can i use to print out a
message when i expect a LID and i don't get one?

i didn't see anything in the IB code, did i miss
something?

arthur


From arthur.jones at qlogic.com  Wed Jun 27 10:02:42 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Wed, 27 Jun 2007 10:02:42 -0700
Subject: [ofa-general] Re: [PATCH 24/28] IB/ipath - ipath_poll fixups and
	enhancements
In-Reply-To: <aday7id5g5c.fsf@cisco.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
	<20070619234252.3794.18229.stgit@bauxite.internal.keyresearch.com>
	<ada1wg68hp0.fsf@cisco.com>
	<20070621152312.GA14817@bauxite.pathscale.com>
	<aday7id5g5c.fsf@cisco.com>
Message-ID: <20070627170242.GT29798@bauxite.pathscale.com>

hi roland, ...

On Thu, Jun 21, 2007 at 11:14:23AM -0700, Roland Dreier wrote:
>  > the port_rcvhdrttail_kvaddr is the kernel virtual address
>  > allocated in coherent memory where the header queue is updated
>  > by the chip.  we use volatile to make sure the compiler does
>  > not use stale data...
> 
> OK, fair enough, although it seems you may be missing some memory
> barriers to make sure you don't run into the CPU reordering accesses
> to the head/tail pointers.

i had a quick look at the patch and the surrounding
code and i did not catch the problem.  can you be a
little more specific about the suspect code?

thanks...

arthur
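
For reference, the general hazard with head/tail pointers is that a
reader may fetch a queue entry before it has observed the tail update
that announces the entry.  The following is a generic userspace sketch in
C11 atomics, not the ipath code: struct ring and both helpers are
hypothetical, and in kernel code the same ordering would normally be
expressed with the kernel's own barrier primitives instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SLOTS 8u

struct ring {
    uint32_t slot[RING_SLOTS];
    _Atomic uint32_t tail;  /* advanced by the producer after the slot */
    uint32_t head;          /* private to the consumer */
};

/* Producer: fill the slot, then publish it by advancing tail (release).
 * Overflow of a full ring is not handled in this sketch. */
static void ring_publish(struct ring *r, uint32_t value)
{
    uint32_t t = atomic_load_explicit(&r->tail, memory_order_relaxed);

    r->slot[t % RING_SLOTS] = value;
    atomic_store_explicit(&r->tail, t + 1, memory_order_release);
}

/* Consumer: load tail with acquire semantics before touching the slot, so
 * the slot read cannot be ordered ahead of the tail read.
 * Returns 1 and stores the entry in *out, or 0 if the ring is empty. */
static int ring_consume(struct ring *r, uint32_t *out)
{
    uint32_t t = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (r->head == t)
        return 0;
    *out = r->slot[r->head % RING_SLOTS];
    r->head++;
    return 1;
}
```

A volatile qualifier alone only stops the compiler from caching the
value; it does not order the tail load against the later slot load on
CPUs that reorder reads.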


From Mark.Seger at hp.com  Wed Jun 27 10:07:26 2007
From: Mark.Seger at hp.com (Mark Seger)
Date: Wed, 27 Jun 2007 13:07:26 -0400
Subject: [ofa-general] IB performance stats (revisited)
In-Reply-To: <1182957688.28870.83013.camel@hal.voltaire.com>
References: <46826370.4090602@hp.com>	
	<1182951169.28870.75880.camel@hal.voltaire.com>
	<46826FB8.10904@hp.com>	 <46827BA0.6070008@hp.com>
	<1182957688.28870.83013.camel@hal.voltaire.com>
Message-ID: <4682994E.1020209@hp.com>


>The performance managers deal with the counter stickiness (by resetting
>them when they think they need to). They typically export their data
>although this is not specified by IBA so it is in a vendor proprietary
>manner.
>  
>
so I guess these guys are poor citizens as well...
the real issue as I see it then means nobody can trust the data if 
random tools randomly reset the counters.  a real shame...
-mark


From Mark.Seger at hp.com  Wed Jun 27 10:10:40 2007
From: Mark.Seger at hp.com (Mark Seger)
Date: Wed, 27 Jun 2007 13:10:40 -0400
Subject: [ofa-general] IB performance stats (revisited)
In-Reply-To: <1182958191.28870.83536.camel@hal.voltaire.com>
References: <46826370.4090602@hp.com>	
	<1182951169.28870.75880.camel@hal.voltaire.com>
	<46826FB8.10904@hp.com>
	<1182958191.28870.83536.camel@hal.voltaire.com>
Message-ID: <46829A10.1030202@hp.com>


>>>There's now an experimental performance manager which has been discussed
>>>on the list. The performance data collected can be accessed.
>>>  
>>>      
>>>
>>alas, since I use this tool on commercial systems, I can't run it 
>>against experimental code.  perhaps when the experimental becomes real I 
>>can.
>>    
>>
>
>It should be in the OFED 1.3 timeframe. Also, there are vendor
>Performance Managers too.
>  
>
that would be good.  if I can detect the 1.3 stack I can change my 
monitoring accordingly

>>  I'll try to find the notes in the archives.
>>    
>>
>
>I can send you this if you can't find it
>  
>
that would be great if you can easily lay your hands on it.

>>Good point, but I guess I'm between a rock and a hard place.  imho: as 
>>long as the counters don't wrap this problem will never be solved.
>>    
>>
>
>It's the IBTA standard (rather than IETF style counters). I don't think
>it's going to change.
>  
>
yeah, but it also makes one wonder why non-wrapping counters were chosen 
when the IETF proved years ago that one needs wrapping counters to allow 
concurrent access to them.  sigh...
-mark
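
The difference can be made concrete with a short sketch.  It assumes
32-bit counters, and the function names and the enum are invented for
illustration.  With a free-running counter every reader keeps its own
previous sample and never writes to the hardware; with a counter that
sticks at its maximum, a reader has to detect pegging and clears by
others, and cannot recover what was lost.

```c
#include <assert.h>
#include <stdint.h>

/* Free-running counter that wraps modulo 2^32: unsigned subtraction gives
 * the correct delta across a single wrap, with no reset needed. */
static uint32_t wrapping_delta(uint32_t prev, uint32_t now)
{
    return now - prev;
}

enum sample_status {
    SAMPLE_OK,      /* delta is exact */
    SAMPLE_PEGGED,  /* counter hit its maximum: delta is only a lower bound */
    SAMPLE_RESET    /* counter went backwards: someone else cleared it */
};

/* Counter that sticks at 'max' and must be cleared to count again. */
static enum sample_status sticky_delta(uint32_t prev, uint32_t now,
                                       uint32_t max, uint32_t *delta)
{
    if (now < prev) {
        /* Counts between our last sample and the clear are lost. */
        *delta = now;
        return SAMPLE_RESET;
    }
    *delta = now - prev;
    return now == max ? SAMPLE_PEGGED : SAMPLE_OK;
}
```

Any number of tools can call wrapping_delta() on the same counter without
disturbing each other, which is the property being asked for here.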


From halr at voltaire.com  Wed Jun 27 10:12:18 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 27 Jun 2007 13:12:18 -0400
Subject: [ofa-general] IB performance stats (revisited)
In-Reply-To: <4682994E.1020209@hp.com>
References: <46826370.4090602@hp.com>
	<1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com>
	<46827BA0.6070008@hp.com>
	<1182957688.28870.83013.camel@hal.voltaire.com>
	<4682994E.1020209@hp.com>
Message-ID: <1182964334.28870.90291.camel@hal.voltaire.com>

On Wed, 2007-06-27 at 13:07, Mark Seger wrote:
> >The performance managers deal with the counter stickiness (by resetting
> >them when they think they need to). They typically export their data
> >although this is not specified by IBA so it is in a vendor proprietary
> >manner.
> >  
> >
> so I guess these guys are poor citizens as well...

Not sure what you mean.

> the real issue as I see it then means nobody can trust the data if 
> random tools randomly reset the counters.  a real shame...

I consider this to be a real rather than random app for this. Guess it
depends on what one considers random.

-- Hal

> -mark
> 
> 


From Mark.Seger at hp.com  Wed Jun 27 10:24:36 2007
From: Mark.Seger at hp.com (Mark Seger)
Date: Wed, 27 Jun 2007 13:24:36 -0400
Subject: [ofa-general] IB performance stats (revisited)
In-Reply-To: <1182964334.28870.90291.camel@hal.voltaire.com>
References: <46826370.4090602@hp.com>	
	<1182951169.28870.75880.camel@hal.voltaire.com>
	<46826FB8.10904@hp.com>	 <46827BA0.6070008@hp.com>
	<1182957688.28870.83013.camel@hal.voltaire.com>	
	<4682994E.1020209@hp.com>
	<1182964334.28870.90291.camel@hal.voltaire.com>
Message-ID: <46829D54.2040300@hp.com>


Hal Rosenstock wrote:

>On Wed, 2007-06-27 at 13:07, Mark Seger wrote:
>  
>
>>>The performance managers deal with the counter stickiness (by resetting
>>>them when they think they need to). They typically export their data
>>>although this is not specified by IBA so it is in a vendor proprietary
>>>manner.
>>> 
>>>
>>>      
>>>
>>so I guess these guys are poor citizens as well...
>>    
>>
>
>Not sure what you mean.
>  
>
I consider it poor form to zero counters out from under someone else who 
might be in the middle of trying to read them, and I thought that's what 
you meant when you said of what I was doing: "Yes, it is _bad_ if there 
are essentially multiple performance managers resetting the counters."  I 
am most definitely guilty as charged and trying real hard to get out from 
under that, which is why I suggested a module that exports wrapping 
counters to /proc.  Then, as long as ALL utilities rely on those numbers, 
the module can reset them all it likes and nobody interferes with anyone 
else, since there is only one program doing that.

>>the real issue as I see it then means nobody can trust the data if 
>>random tools randomly reset the counters.  a real shame...
>>    
>>
>
>I consider this to be a real rather than random app for this. Guess it
>depends on what one considers random.
>  
>
I used the term 'random' loosely, but my point is as long as anyone can 
reset the counters and you never know if it's happening or not, you'll 
get bogus data and I'm trying to find a way to get around it.
-mark

>-- Hal
>
>  
>
>>-mark
>>
>>
>>    
>>


From jgunthorpe at obsidianresearch.com  Wed Jun 27 10:37:50 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Wed, 27 Jun 2007 11:37:50 -0600
Subject: [ofa-general] Re: [PATCH] management: uint -> unsigned replacement
In-Reply-To: <1182942065.28870.65696.camel@hal.voltaire.com>
References: <20070626102045.GS15343@mellanox.co.il>
	<1182862966.10379.425353.camel@hal.voltaire.com>
	<20070626132457.GA29602@mellanox.co.il>
	<20070626180157.GM25653@sashak.voltaire.com>
	<1182885534.28870.527.camel@hal.voltaire.com>
	<20070627053327.GH10225@obsidianresearch.com>
	<1182942065.28870.65696.camel@hal.voltaire.com>
Message-ID: <20070627173750.GN32050@obsidianresearch.com>

On Wed, Jun 27, 2007 at 07:01:09AM -0400, Hal Rosenstock wrote:
> > (and I'd advocate using -std=gnu99, but I never compile with
> > VC++ :P).
> > 
> > 'gcc -ansi -D_POSIX_SOURCE_' as a minimum is also pretty good.
> 
> How about:
> 
> gcc -Wall -D_XOPEN_SOURCE=600

I'd recommend -D_POSIX_C_SOURCE=200112 as the 'highest' setting for
portable code. This should reflect IEEE 1003.1-2004 (aka SUSv3)

Most of the XPG specific stuff is not as easy to get good
documentation on, IMHO.

Jason
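
The same request can also be made from inside a source file instead of on
the compiler command line.  A minimal sketch, assuming a POSIX system with
<unistd.h>; the helper names posix_revision and page_size are made up for
the example.

```c
/* Must come before the first #include so the system headers see it. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <assert.h>
#include <unistd.h>

/* POSIX revision the headers claim to provide (from <unistd.h>). */
static long posix_revision(void)
{
    return (long)_POSIX_VERSION;
}

/* sysconf() is a POSIX interface, not ISO C; it is declared here because
 * the feature test macro above asked for the POSIX.1-2001 API set. */
static long page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}
```

Putting the macro in the file documents the dependency next to the code
that needs it; putting it in CFLAGS applies it uniformly to a whole tree.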


From halr at voltaire.com  Wed Jun 27 10:48:04 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 27 Jun 2007 13:48:04 -0400
Subject: [ofa-general] IB performance stats (revisited)
In-Reply-To: <46829D54.2040300@hp.com>
References: <46826370.4090602@hp.com>
	<1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com>
	<46827BA0.6070008@hp.com>
	<1182957688.28870.83013.camel@hal.voltaire.com>
	<4682994E.1020209@hp.com>
	<1182964334.28870.90291.camel@hal.voltaire.com>
	<46829D54.2040300@hp.com>
Message-ID: <1182966482.28870.92686.camel@hal.voltaire.com>

On Wed, 2007-06-27 at 13:24, Mark Seger wrote:
> Hal Rosenstock wrote:
> 
> >On Wed, 2007-06-27 at 13:07, Mark Seger wrote:
> >  
> >
> >>>The performance managers deal with the counter stickiness (by resetting
> >>>them when they think they need to). They typically export their data
> >>>although this is not specified by IBA so it is in a vendor proprietary
> >>>manner.
> >>> 
> >>>
> >>>      
> >>>
> >>so I guess these guys are poor citizens as well...
> >>    
> >>
> >
> >Not sure what you mean.
> >  
> >
> I consider it poor form to zero counters out from under someone else who 
> might be in the middle of trying to read them, and I thought that's what 
> you meant when you said of what I was doing: "Yes, it is _bad_ if there 
> are essentially multiple performance managers resetting the counters."  I 
> am most definitely guilty as charged and trying real hard to get out from 
> under that, which is why I suggested a module that exports wrapping 
> counters to /proc.  Then, as long as ALL utilities rely on those numbers, 
> the module can reset them all it likes and nobody interferes with anyone 
> else, since there is only one program doing that.

Another approach would be to have the PMA inform the kernel that the
counters were reset (perhaps including the values prior to the reset) so
that these could be factored into the local set of counters. There is
nothing in the spec that precludes this although it has not been
implemented this way. Then there wouldn't be a reason for a local manager
to have to play these games. It would mean that there would need to be a
performance manager running in the subnet which may not be acceptable
for some installations; not sure.
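
A rough sketch of the bookkeeping this would need on the local side,
assuming the reset notification carries the value the hardware counter
held just before it was cleared.  No such notification exists today, and
struct local_counter and both helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct local_counter {
    uint64_t base;  /* sum of hardware values captured before each reset */
};

/* Called when a reset is reported together with the pre-reset value. */
static void note_reset(struct local_counter *c, uint32_t value_before_reset)
{
    c->base += value_before_reset;
}

/* Running total as seen by a local reader: everything folded in at
 * earlier resets plus whatever the hardware counter holds now. */
static uint64_t running_total(const struct local_counter *c, uint32_t hw_now)
{
    return c->base + hw_now;
}
```

The total stays monotonic across resets, so local readers can take deltas
without clearing anything.  Counts that arrive while the hardware counter
is already pegged are still lost.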

> >>the real issue as I see it then means nobody can trust the data if 
> >>random tools randomly reset the counters.  a real shame...
> >>    
> >>
> >
> >I consider this to be a real rather than random app for this. Guess it
> >depends on what one considers random.
> >  
> >
> I used the term 'random' loosely, but my point is as long as anyone can 
> reset the counters and you never know if it's happening or not, you'll 
> get bogus data 

Agreed.

> and I'm trying to find a way to get around it.

Understood.

-- Hal

> -mark
> 
> >-- Hal
> >
> >  
> >
> >>-mark
> >>
> >>
> >>    
> >>
> 


From swise at opengridcomputing.com  Wed Jun 27 11:12:16 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 27 Jun 2007 13:12:16 -0500
Subject: [ofa-general] Open Fabrics iWARP Driver for Chesio T3 card
In-Reply-To: <BAY118-F277AE0172E19EE5FA0BECA9F0A0@phx.gbl>
References: <BAY118-F277AE0172E19EE5FA0BECA9F0A0@phx.gbl>
Message-ID: <4682A880.1030001@opengridcomputing.com>

Hi David,

Answers below:

david elsen wrote:
> Can someone please let me know:
> 
> 1. What is the latest Open Fabrics Driver for the Chelsio T3 cards?
>

The latest Chelsio RDMA driver is in the OFED-1.2 "gold" release.  That
driver requires firmware from Chelsio that is included in their latest
software kit: cxgb3toe-1.0.104.tar.gz.  Contact Chelsio to get this.
I'll probably be pulling in a patch series for OFED-1.2 to update the
OFED low level driver, but for now, please use the kit from Chelsio.

I suggest you install OFED-1.2.tgz and then the cxgb3toe-1.0.104 kit on
top of OFED.  This will install the latest low level driver (used by the
RDMA driver in the OFED release) and the latest 4.3.0 firmware.


> 2. Is there any documentation there on The Open Fabrics website to 
> install the iWARP driver for the T3 card?
> 

There is a Chelsio cxgb3 release note file included in the OFED-1.2
documentation package.

> 3. Is there any documentation describing how to set the iWARP and 
> Network interface for the T3 cards?
> 

Same release note file.

Hope this helps.

Steve.


From eitan at mellanox.co.il  Wed Jun 27 11:23:38 2007
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Wed, 27 Jun 2007 21:23:38 +0300
Subject: [ofa-general] IB performance stats (revisited)
References: <46826370.4090602@hp.com><1182951169.28870.75880.camel@hal.voltaire.com>
	<46826FB8.10904@hp.com><46827BA0.6070008@hp.com><1182957688.28870.83013.camel@hal.voltaire.com><4682994E.1020209@hp.com>
	<1182964334.28870.90291.camel@hal.voltaire.com>
Message-ID: <6C2C79E72C305246B504CBA17B5500C901CAD7B4@mtlexch01.mtl.com>


> 
> > >  
> > >
> > so I guess these guys are poor citizens as well...
> 
> Not sure what you mean.
> 
> > the real issue as I see it then means nobody can trust the data if 
> > random tools randomly reset the counters.  a real shame...

In IBADM ibmon we worked around this issue by detecting when a counter
value decreases without ibmon's knowledge.

> 
> I consider this to be a real rather than random app for this. 
> Guess it depends on what one considers random.
> 
> -- Hal
> 
> > -mark
> > 
> > 
> 
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> 
> To unsubscribe, please visit 
> http://openib.org/mailman/listinfo/openib-general
> 


From jgunthorpe at obsidianresearch.com  Wed Jun 27 11:22:36 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Wed, 27 Jun 2007 12:22:36 -0600
Subject: [ofa-general] IB performance stats (revisited)
In-Reply-To: <1182966482.28870.92686.camel@hal.voltaire.com>
References: <46826370.4090602@hp.com>
	<1182951169.28870.75880.camel@hal.voltaire.com>
	<46826FB8.10904@hp.com> <46827BA0.6070008@hp.com>
	<1182957688.28870.83013.camel@hal.voltaire.com>
	<4682994E.1020209@hp.com>
	<1182964334.28870.90291.camel@hal.voltaire.com>
	<46829D54.2040300@hp.com>
	<1182966482.28870.92686.camel@hal.voltaire.com>
Message-ID: <20070627182236.GO32050@obsidianresearch.com>

On Wed, Jun 27, 2007 at 01:48:04PM -0400, Hal Rosenstock wrote:

> Another approach would be to have the PMA inform the kernel that the
> counters were reset (perhaps including the values prior to the reset) so
> that these could be factored into the local set of counters. There is
> nothing in the spec that precludes this although it has not been
> implemented this way. Then there wouldn't be a reason for a local manager
> to have to play these games. It would mean that there would need to be a
> performance manager running in the subnet which may not be acceptable
> for some installations; not sure.

If you are going to play those sorts of games I think it would be better
to just effectively disable the PMA in the Mellanox firmware and do
the following:

- The kernel periodically fetches the performance stats and aggregates
  them into a 64 bit wrapping counter. The kernel sends PMA MADs into the
  Mellanox firmware to read and reset the counters.
- The new 64 bit stats are exported via sysfs/proc/whatever as
  wrapping counters.
- When a PMA packet comes in, the kernel services it rather than
  passing it on to the chip firmware.

Hopefully in future we could encourage new firmware/silicon to
support exporting non-wrapping 64 bit counters to the OS so this ugly
mess wouldn't be needed.

FWIW, I agree with Mark that the current locally accessible counters
that are exactly the same as PMA mad values are virtually useless..
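
As a rough illustration of the aggregation step in the list above, here
is a user-space sketch in C. It is not actual kernel code: the struct and
function names are made up for the example, and it assumes each sample
comes from a read-and-reset of a 32 bit hardware counter.

```c
#include <stdint.h>

/* Hypothetical per-port state; the name and layout are illustrative only. */
struct agg_counter {
    uint64_t total;   /* 64 bit wrapping total exported to readers */
};

/*
 * Fold one read-and-reset sample into the running total.  Because the
 * hardware counter is cleared on every read, each sample is already a
 * delta, so it is simply added.  Unsigned addition wraps modulo 2**64.
 */
static void agg_counter_add(struct agg_counter *c, uint32_t sample)
{
    c->total += sample;
}
```

This only stays accurate if the hardware counter is polled more often
than it can saturate; a sample that has already pegged at its maximum
hides however many counts were dropped.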

Jason


From eitan at mellanox.co.il  Wed Jun 27 11:23:41 2007
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Wed, 27 Jun 2007 21:23:41 +0300
Subject: [ofa-general] IB performance stats (revisited)
References: <46826370.4090602@hp.com><1182951169.28870.75880.camel@hal.voltaire.com>
	<46826FB8.10904@hp.com><46827BA0.6070008@hp.com><1182957688.28870.83013.camel@hal.voltaire.com><4682994E.1020209@hp.com>
	<1182964334.28870.90291.camel@hal.voltaire.com>
Message-ID: <6C2C79E72C305246B504CBA17B5500C901CAD7B7@mtlexch01.mtl.com>

This is the second time in recent months that I have heard people
complain that the current monitoring solution in OFA is integrated with
OpenSM. These people do not use OpenSM but do use OFED. Another drawback
is that no naming is provided and the reporting uses GUIDs.
I also can't keep myself from saying again that I think you are going
to hit a wall with the concept of doing the PMA from a single node.

Eitan Zahavi
Senior Engineering Director, Software Architect
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

 
> -----Original Message-----
> From: general-bounces at lists.openfabrics.org 
> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of 
> Hal Rosenstock
> Sent: Wednesday, June 27, 2007 8:12 PM
> To: Mark Seger
> Cc: Finn, Ed; general at lists.openfabrics.org
> Subject: Re: [ofa-general] IB performance stats (revisited)
> 
> On Wed, 2007-06-27 at 13:07, Mark Seger wrote:
> > >The performance managers deal with the counter stickiness (by 
> > >resetting them when they think they need to). They 
> typically export 
> > >their data although this is not specified by IBA so it is 
> in a vendor 
> > >proprietary manner.
> > >  
> > >
> > so I guess these guys are poor citizens as well...
> 
> Not sure what you mean.
> 
> > the real issue as I see it then means nobody can trust the data if 
> > random tools randomly reset the counters.  a real shame...
> 
> I consider this to be a real rather than random app for this. 
> Guess it depends on what one considers random.
> 
> -- Hal
> 
> > -mark
> > 
> > 
> 


From rdreier at cisco.com  Wed Jun 27 12:08:39 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 27 Jun 2007 12:08:39 -0700
Subject: [ofa-general] Re: The low level driver of mlx4 kmalloc 0 bytes in QP
	creation
In-Reply-To: <46821FDA.5030900@dev.mellanox.co.il> (Dotan Barak's message of
	"Wed, 27 Jun 2007 11:29:14 +0300")
References: <46821FDA.5030900@dev.mellanox.co.il>
Message-ID: <adaodj1rz9k.fsf@cisco.com>

 > If one creates a QP with 0 WR in the RQ in the kernel level, the low
 > level driver of the mlx4
 > will kmalloc 0 bytes (for the WR IDs of the RQ).
 > (for example, the IPoIB CM creates such a QP)
 > 
 > Is this an error?

The consensus seems to be that kmalloc(0) is OK, although various
2.6.22-rc kernels printed big tracebacks when it happens.  I think
getting rid of the kmalloc(0) in mlx4 would make the code more
complicated for no real gain.

 - R.


From rdreier at cisco.com  Wed Jun 27 12:13:54 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 27 Jun 2007 12:13:54 -0700
Subject: [ofa-general] Re: [PATCH 24/28] IB/ipath - ipath_poll fixups and
	enhancements
In-Reply-To: <20070627170242.GT29798@bauxite.pathscale.com> (Arthur Jones's
	message of "Wed, 27 Jun 2007 10:02:42 -0700")
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
	<20070619234252.3794.18229.stgit@bauxite.internal.keyresearch.com>
	<ada1wg68hp0.fsf@cisco.com>
	<20070621152312.GA14817@bauxite.pathscale.com>
	<aday7id5g5c.fsf@cisco.com>
	<20070627170242.GT29798@bauxite.pathscale.com>
Message-ID: <adak5tprz0t.fsf@cisco.com>

 > > OK, fair enough, although it seems you may be missing some memory
 > > barriers to make sure you don't run into the CPU reordering accesses
 > > to the head/tail pointers.
 > 
 > i had a quick look at the patch and the surrounding
 > code and i did not catch the problem.  can you be a
 > little more specific about the suspect code?

I'm not sure there's a bug there.  But the patch in question does

 > +       tail = *(volatile u64 *)pd->port_rcvhdrtail_kvaddr;

with no memory ordering.  The volatile makes sure the compiler puts
that read where you wrote it, but there's no guarantee that the CPU
executes it anywhere remotely close to where it is in the code.  Later
on you have

 > +       if (tail != head ||
 > +           test_bit(IPATH_PORT_WAITING_RCV, &pd->int_flag)) {

etc., and the CPU might speculate those tests far ahead of actually
reading the port_rcvhdrtail_kvaddr value, which means you might end
up executing code based on a guess about tail != head that is not true
at the time it speculates the branch, but by the time it does get to
actually check its speculation, the guess has become true.

Just something to think about...
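
To make the distinction concrete, here is a user-space sketch using C11
atomics rather than the kernel's own primitives. The variable and
function names are hypothetical stand-ins, not the ipath code; in the
kernel the analogous tool would be a read barrier such as rmb() after
the load.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the tail index that the device updates. */
static _Atomic uint64_t rcvhdr_tail;

/*
 * Returns true if there is something to process.  The acquire load keeps
 * later loads in this thread from being satisfied ahead of the tail read,
 * which a plain volatile access does not guarantee on weakly ordered CPUs.
 */
static bool have_new_entries(uint64_t head)
{
    uint64_t tail = atomic_load_explicit(&rcvhdr_tail, memory_order_acquire);

    return tail != head;
}
```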


From rdreier at cisco.com  Wed Jun 27 12:14:44 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 27 Jun 2007 12:14:44 -0700
Subject: [ofa-general] Re: [PATCH 0/6] iw_cxgb3: Bug Fixes for 2.6.23
In-Reply-To: <20070627155119.24944.44172.stgit@dell3.ogc.int> (Steve Wise's
	message of "Wed, 27 Jun 2007 10:51:19 -0500")
References: <20070627155119.24944.44172.stgit@dell3.ogc.int>
Message-ID: <adafy4dryzf.fsf@cisco.com>

 > Here are some bug fixes to the iw_cxgb3 driver that I'd like included
 > for 2.6.23. NOTE: Patch 1 requires a firmware interface change, so
 > there is a version bump to 4.3 included in that patch that hits cxgb3.
 > This will likely conflict with a previous version change that is in
 > Jeff's upstream branch.  The net is: we need the firmware version bumped
 > to 4.3 with these iw_cxgb3 changes.

OK, I'll probably pull this into my tree and hold off on asking Linus
to pull until after he pulls Jeff's net driver tree.  Once that
happens I'll fix up any conflicts and ask Linus to pull.


From swise at opengridcomputing.com  Wed Jun 27 12:31:55 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 27 Jun 2007 14:31:55 -0500
Subject: [ofa-general] Re: [PATCH 0/6] iw_cxgb3: Bug Fixes for 2.6.23
In-Reply-To: <adafy4dryzf.fsf@cisco.com>
References: <20070627155119.24944.44172.stgit@dell3.ogc.int>
	<adafy4dryzf.fsf@cisco.com>
Message-ID: <4682BB2B.7030002@opengridcomputing.com>

Roland Dreier wrote:
>  > Here are some bug fixes to the iw_cxgb3 driver that I'd like included
>  > for 2.6.23. NOTE: Patch 1 requires a firmware interface change, so
>  > there is a version bump to 4.3 included in that patch that hits cxgb3.
>  > This will likely conflict with a previous version change that is in
>  > Jeff's upstream branch.  The net is: we need the firmware version bumped
>  > to 4.3 with these iw_cxgb3 changes.
> 
> OK, I'll probably pull this into my tree and hold off on asking Linus
> to pull until after he pulls Jeff's net driver tree.  Once that
> happens I'll fix up any conflicts and ask Linus to pull.

Sounds good.

Thanks,

Steve.


From arthur.jones at qlogic.com  Wed Jun 27 13:10:18 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Wed, 27 Jun 2007 13:10:18 -0700
Subject: [ofa-general] Re: [PATCH 24/28] IB/ipath - ipath_poll fixups and
	enhancements
In-Reply-To: <adak5tprz0t.fsf@cisco.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
	<20070619234252.3794.18229.stgit@bauxite.internal.keyresearch.com>
	<ada1wg68hp0.fsf@cisco.com>
	<20070621152312.GA14817@bauxite.pathscale.com>
	<aday7id5g5c.fsf@cisco.com>
	<20070627170242.GT29798@bauxite.pathscale.com>
	<adak5tprz0t.fsf@cisco.com>
Message-ID: <20070627201018.GY29798@bauxite.pathscale.com>

hi roland, ...

On Wed, Jun 27, 2007 at 12:13:54PM -0700, Roland Dreier wrote:
>  > > OK, fair enough, although it seems you may be missing some memory
>  > > barriers to make sure you don't run into the CPU reordering accesses
>  > > to the head/tail pointers.
>  > 
>  > i had a quick look at the patch and the surrounding
>  > code and i did not catch the problem.  can you be a
>  > little more specific about the suspect code?
> 
> I'm not sure there's a bug there.  But the patch in question does
> 
>  > +       tail = *(volatile u64 *)pd->port_rcvhdrtail_kvaddr;
> 
> with no memory ordering.  The volatile makes sure the compiler puts
> that read where you wrote it, but there's no guarantee that the CPU
> executes it anywhere remotely close to where it is in the code.  Later
> on you have

agreed.

>  > +       if (tail != head ||
>  > +           test_bit(IPATH_PORT_WAITING_RCV, &pd->int_flag)) {
> 
> etc., and the CPU might speculate those tests far ahead of actually
> reading the port_rcvhdrtail_kvaddr value, which means you might end
> up executing code based on a guess about tail != head that is not true
> at the time it speculates the branch, but by the time it does get to
> actually check its speculation, the guess has become true.

i agree that the &pd->int_flag result could be
valid before tail has become valid and hence
when waiting for tail to be valid we're out of
order wrt the int_flag load.  but this logic is
completely async to the head != tail test, so
the out-of-order result there can not hurt us...

arthur


From halr at voltaire.com  Wed Jun 27 14:02:09 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 27 Jun 2007 17:02:09 -0400
Subject: [ofa-general] IB performance stats (revisited)
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C901CAD7B4@mtlexch01.mtl.com>
References: <46826370.4090602@hp.com>
	<1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com>
	<46827BA0.6070008@hp.com><1182957688.28870.83013.camel@hal.voltaire.com>
	<4682994E.1020209@hp.com>
	<1182964334.28870.90291.camel@hal.voltaire.com>
	<6C2C79E72C305246B504CBA17B5500C901CAD7B4@mtlexch01.mtl.com>
Message-ID: <1182978125.28870.105782.camel@hal.voltaire.com>

On Wed, 2007-06-27 at 14:23, Eitan Zahavi wrote:
> > 
> > > >  
> > > >
> > > so I guess these guys are poor citizens as well...
> > 
> > Not sure what you mean.
> > 
> > > the real issue as I see it then means nobody can trust the data if 
> > > random tools randomly reset the counters.  a real shame...
> 
> In IBADM ibmon we worked around this issue by detecting when a counter
> value decreases without ibmon's knowledge.

That is detected in the current PerfMgr as well: it is an "out of band"
clear. The question is the loss of data accuracy from the last snapshot
to the new lower values.
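
A minimal sketch of that kind of detection, to show where the accuracy
goes. This is not the PerfMgr or ibmon code; the struct and function
names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical poller state for one saturating (non-wrapping) counter. */
struct poll_state {
    uint32_t last;        /* value seen at the previous poll */
    uint64_t total;       /* accumulated count */
    unsigned resets_seen; /* number of out-of-band clears detected */
};

/*
 * Account for a new reading.  A value lower than the previous one means
 * someone else cleared the counter; whatever was counted between our last
 * poll and that clear is unknown, so only the post-clear value is added.
 * Returns true if an out-of-band clear was detected.
 */
static bool poll_update(struct poll_state *s, uint32_t now)
{
    bool cleared = now < s->last;

    if (cleared) {
        s->resets_seen++;
        s->total += now;            /* lower bound: pre-clear delta is lost */
    } else {
        s->total += now - s->last;
    }
    s->last = now;
    return cleared;
}
```

Note that a clear followed by enough traffic to push the counter back
above the previous reading goes unnoticed by this check.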

-- Hal

> > 
> > I consider this to be a real rather than random app for this. 
> > Guess it depends on what one considers random.
> > 
> > -- Hal
> > 
> > > -mark
> > > 
> > > 
> > 


From halr at voltaire.com  Wed Jun 27 14:08:18 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 27 Jun 2007 17:08:18 -0400
Subject: [ofa-general] IB performance stats (revisited)
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C901CAD7B7@mtlexch01.mtl.com>
References: <46826370.4090602@hp.com>
	<1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com>
	<46827BA0.6070008@hp.com><1182957688.28870.83013.camel@hal.voltaire.com>
	<4682994E.1020209@hp.com>
	<1182964334.28870.90291.camel@hal.voltaire.com>
	<6C2C79E72C305246B504CBA17B5500C901CAD7B7@mtlexch01.mtl.com>
Message-ID: <1182978496.28870.106214.camel@hal.voltaire.com>

On Wed, 2007-06-27 at 14:23, Eitan Zahavi wrote:
> This is the second time in recent months that I have heard people
> complain that the current monitoring solution in OFA is integrated with
> OpenSM.

I must have missed this both times (didn't see this in Mark's post) and
the statement itself is somewhat inaccurate as well.

> These people do not use OpenSM but do use OFED.

I'm not sure I'm following what you mean here.

If you mean that some people want to run PerfMgr without the SM/SA
aspects (so that they can run a vendor based SM), that is the next thing
we are adding to the implementation.

> Another drawback is that
> no naming is provided and the reporting uses GUIDs.

Naming is provided via NodeDescription.

> I also can't keep myself from saying again that I think you are going
> to hit a wall with the concept of doing the PMA from a single node.

If you are referring to the fact that PerfMgr is currently not
distributed, that will be done, as has been stated before.

-- Hal

> Eitan Zahavi
> Senior Engineering Director, Software Architect
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O. Box 586 Yokneam 20692 ISRAEL
> 
>  
> 
> > -----Original Message-----
> > From: general-bounces at lists.openfabrics.org 
> > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of 
> > Hal Rosenstock
> > Sent: Wednesday, June 27, 2007 8:12 PM
> > To: Mark Seger
> > Cc: Finn, Ed; general at lists.openfabrics.org
> > Subject: Re: [ofa-general] IB performance stats (revisited)
> > 
> > On Wed, 2007-06-27 at 13:07, Mark Seger wrote:
> > > >The performance managers deal with the counter stickiness (by 
> > > >resetting them when they think they need to). They 
> > typically export 
> > > >their data although this is not specified by IBA so it is 
> > in a vendor 
> > > >proprietary manner.
> > > >  
> > > >
> > > so I guess these guys are poor citizens as well...
> > 
> > Not sure what you mean.
> > 
> > > the real issue as I see it then means nobody can trust the data if 
> > > random tools randomly reset the counters.  a real shame...
> > 
> > I consider this to be a real rather than random app for this. 
> > Guess it depends on what one considers random.
> > 
> > -- Hal
> > 
> > > -mark
> > > 
> > > 
> > 


From halr at voltaire.com  Wed Jun 27 14:13:40 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 27 Jun 2007 17:13:40 -0400
Subject: [ofa-general] IB performance stats (revisited)
In-Reply-To: <20070627182236.GO32050@obsidianresearch.com>
References: <46826370.4090602@hp.com>
	<1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com>
	<46827BA0.6070008@hp.com>
	<1182957688.28870.83013.camel@hal.voltaire.com>
	<4682994E.1020209@hp.com>
	<1182964334.28870.90291.camel@hal.voltaire.com>
	<46829D54.2040300@hp.com>
	<1182966482.28870.92686.camel@hal.voltaire.com>
	<20070627182236.GO32050@obsidianresearch.com>
Message-ID: <1182978803.28870.106563.camel@hal.voltaire.com>

On Wed, 2007-06-27 at 14:22, Jason Gunthorpe wrote:
> On Wed, Jun 27, 2007 at 01:48:04PM -0400, Hal Rosenstock wrote:
> 
> > Another approach would be to have the PMA inform the kernel that the
> > counters were reset (perhaps including the values prior to the reset) so
> > that these could be factored into the local set of counters. There is
> > nothing in the spec that precludes this although it has not been
> > implemented this way. Then there wouldn't be a reason for a local manager
> > to have to play these games. It would mean that there would need to be a
> > performance manager running in the subnet which may not be acceptable
> > for some installations; not sure.
> 
> If you are going to play those sorts of games I think it would be better
> to just effectively disable the PMA in the Mellanox firmware and do
> the following:
> 
> - The kernel periodically fetches the performance stats and aggregates
>   them into a 64 bit wrapping counter. The kernel sends PMA MADs into the
>   Mellanox firmware to read and reset the counters.
> - The new 64 bit stats are exported via sysfs/proc/whatever as
>   wrapping counters.
> - When a PMA packet comes in, the kernel services it rather than
>   passing it on to the chip firmware.

In this way, both 32 and 64 bit counters could be presented by the PMA
but how would it know when a counter has maxed out in terms of the
PMA, and how would a remote clear be handled?

-- Hal

> Hopefully in future we could encourage new firmware/silicon to
> support exporting non-wrapping 64 bit counters to the OS so this ugly
> mess wouldn't be needed.
> 
> FWIW, I agree with Mark that the current locally accessible counters
> that are exactly the same as PMA mad values are virtually useless..
> 
> Jason


From jgunthorpe at obsidianresearch.com  Wed Jun 27 14:26:51 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Wed, 27 Jun 2007 15:26:51 -0600
Subject: [ofa-general] IB performance stats (revisited)
In-Reply-To: <1182978803.28870.106563.camel@hal.voltaire.com>
References: <1182951169.28870.75880.camel@hal.voltaire.com>
	<46826FB8.10904@hp.com> <46827BA0.6070008@hp.com>
	<1182957688.28870.83013.camel@hal.voltaire.com>
	<4682994E.1020209@hp.com>
	<1182964334.28870.90291.camel@hal.voltaire.com>
	<46829D54.2040300@hp.com>
	<1182966482.28870.92686.camel@hal.voltaire.com>
	<20070627182236.GO32050@obsidianresearch.com>
	<1182978803.28870.106563.camel@hal.voltaire.com>
Message-ID: <20070627212651.GQ32050@obsidianresearch.com>

On Wed, Jun 27, 2007 at 05:13:40PM -0400, Hal Rosenstock wrote:

> > - The kernel periodically fetches the performance stats and aggregates
> >   them into a 64 bit wrapping counter. The kernel sends PMA MADs into the
> >   mellanox firmware to read and reset the counters
> > - The new 64 bit stats are exported via sysfs/proc/whatever as
> >   wrapping counters
> > - When a PMA packet comes in the kernel services it rather than
> >   passing it on to the chip firmware.
> 
> In this way, both 32 and 64 bit counters could be presented by the PMA
> but how would it know when a counter has maxed out in terms of the
> PMA and how would a remote clear be handled ?

Each time the counter is cleared the kernel would store the 64 bit
value as the 'last PMA counter'. Then the calculation is just

if ((current - stored) >= saturation)
  return saturation;
return current - stored;
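
Spelled out as a small C function (a sketch only; the function and
parameter names are made up for illustration and are not existing
driver code):

```c
#include <stdint.h>

/*
 * Value to report for a saturating PMA counter of the given width, derived
 * from a 64 bit wrapping counter.  'stored' is the 64 bit value captured at
 * the last clear.  Unsigned subtraction stays correct across a wrap of
 * 'current', as long as fewer than 2**64 counts occurred since the clear.
 */
static uint64_t pma_counter_value(uint64_t current, uint64_t stored,
                                  uint64_t saturation)
{
    uint64_t delta = current - stored;

    return delta >= saturation ? saturation : delta;
}
```

For a 32 bit PMA counter the saturation argument would be 0xFFFFFFFF.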

After 2**64 counts the saturation computation will stop working. It
would take 24 years of constant maxed out data transfer for a 12x QDR
link to wrap a 64 bit dword byte counter.

A nice side benefit would be that Linux drivers could present a
consistent PMA interface with new extended 64 bit counters even with
older hardware.

Jason


From Mark.Seger at hp.com  Wed Jun 27 14:37:19 2007
From: Mark.Seger at hp.com (Mark Seger)
Date: Wed, 27 Jun 2007 17:37:19 -0400
Subject: [ofa-general] IB performance stats (revisited)
In-Reply-To: <20070627212651.GQ32050@obsidianresearch.com>
References: <1182951169.28870.75880.camel@hal.voltaire.com>
	<46826FB8.10904@hp.com> <46827BA0.6070008@hp.com>
	<1182957688.28870.83013.camel@hal.voltaire.com>
	<4682994E.1020209@hp.com>
	<1182964334.28870.90291.camel@hal.voltaire.com>
	<46829D54.2040300@hp.com>
	<1182966482.28870.92686.camel@hal.voltaire.com>
	<20070627182236.GO32050@obsidianresearch.com>
	<1182978803.28870.106563.camel@hal.voltaire.com>
	<20070627212651.GQ32050@obsidianresearch.com>
Message-ID: <4682D88F.9040806@hp.com>


Jason Gunthorpe wrote:
> On Wed, Jun 27, 2007 at 05:13:40PM -0400, Hal Rosenstock wrote:
>
>   
>>> - The kernel periodically fetches the performance stats and aggregates
>>>   them into a 64 bit wrapping counter. The kernel sends PMA MADs into the
>>>   mellanox firmware to read and reset the counters
>>> - The new 64 bit stats are exported via sysfs/proc/whatever as
>>>   wrapping counters
>>> - When a PMA packet comes in the kernel services it rather than
>>>   passing it on to the chip firmware.
>>>       
>> In this way, both 32 and 64 bit counters could be presented by the PMA
>> but how would it know when a counter has maxed out in terms of the
>> PMA and how would a remote clear be handled ?
>>     
>
> Each time the counter is cleared the kernel would store the 64 bit
> value as the 'last PMA counter'. Then the calculation is just
>
> if ((current - stored) >= saturation)
>   return saturation;
> return current - stored;
>
> After 2**64 counts the saturation computation will stop working. It
> would take 24 years of constant maxed out data transfer for a 12x QDR
> link to wrap a 64 bit dword byte counter.
>
> A nice side benefit would be that Linux drivers could present a
> consistent PMA interface with new extended 64 bit counters even with
> older hardware.
>   
I agree for 64 bit counters but for 32 bit ones it gets a little more
complicated because they can max out in under a minute!  Since it's
tough to decide when a counter has maxed out you therefore HAVE to clear
it every time!  This means your monitoring utility will need to examine
the /proc counters within that 'max-out' window or the counters will
latch on you.  If you wait too long to look you're screwed and now we're
back to the fact that the counters don't wrap.

what I'd like to hear is the sense of the community whether or not 
something like this would be acceptable.  if it is, that means nobody is 
allowed to clear counters on their own AND that the single source for 
counter information then becomes /proc.

-mark


From halr at voltaire.com  Wed Jun 27 14:44:36 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 27 Jun 2007 17:44:36 -0400
Subject: [ofa-general] IB performance stats (revisited)
In-Reply-To: <20070627212651.GQ32050@obsidianresearch.com>
References: <1182951169.28870.75880.camel@hal.voltaire.com>
	<46826FB8.10904@hp.com> <46827BA0.6070008@hp.com>
	<1182957688.28870.83013.camel@hal.voltaire.com>
	<4682994E.1020209@hp.com>
	<1182964334.28870.90291.camel@hal.voltaire.com>
	<46829D54.2040300@hp.com>
	<1182966482.28870.92686.camel@hal.voltaire.com>
	<20070627182236.GO32050@obsidianresearch.com>
	<1182978803.28870.106563.camel@hal.voltaire.com>
	<20070627212651.GQ32050@obsidianresearch.com>
Message-ID: <1182980675.28870.108616.camel@hal.voltaire.com>

On Wed, 2007-06-27 at 17:26, Jason Gunthorpe wrote:
> On Wed, Jun 27, 2007 at 05:13:40PM -0400, Hal Rosenstock wrote:
> 
> > > - The kernel periodically fetches the performance stats and aggregates
> > >   them into a 64 bit wrapping counter. The kernel sends PMA MADs into the
> > >   mellanox firmware to read and reset the counters
> > > - The new 64 bit stats are exported via sysfs/proc/whatever as
> > >   wrapping counters
> > > - When a PMA packet comes in the kernel services it rather than
> > >   passing it on to the chip firmware.
> > 
> > In this way, both 32 and 64 bit counters could be presented by the PMA
> > but how would it know when a counter has maxed out in terms of the
> > PMA and how would a remote clear be handled ?
> 
> Each time the counter is cleared

So it doesn't matter whether the clear is local (from Linux) or remote
(from IB), right ?

>  the kernel would store the 64 bit
> value as the 'last PMA counter'. Then the calculation is just
> 
> if ((current - stored) >= saturation)
>   return saturation;
> return current - stored;
> 
> After 2**64 counts the saturation computation will stop working. It
> would take 24 years of constant maxed out data transfer for a 12x QDR
> link to wrap a 64 bit dword byte counter.

Is that even for the 4 octet counts ? (I didn't calculate this out).
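
Working that out under some stated assumptions: 12x QDR is taken as 12
lanes at 10 Gbit/s signalling with 8b/10b encoding, so 96 Gbit/s or
12 Gbyte/s of data in one direction. The function name is made up for
the example.

```c
#include <stdint.h>

#define SECONDS_PER_YEAR (365.25 * 24 * 3600)

/* Years until a counter of 'bits' width wraps at 'units_per_sec'. */
static double years_to_wrap(unsigned bits, double units_per_sec)
{
    double span = (bits >= 64) ? 18446744073709551616.0   /* 2**64 */
                               : (double)(UINT64_C(1) << bits);

    return span / units_per_sec / SECONDS_PER_YEAR;
}
```

With these assumptions, years_to_wrap(64, 12e9) gives roughly 49 years
for a 64 bit byte counter and years_to_wrap(64, 3e9) roughly 195 years
for a 64 bit 4 octet (dword) counter, so counting in 4 octet units only
stretches the time. The 24 year figure would correspond to about
24 Gbyte/s of byte counting, so it presumably rests on somewhat
different assumptions.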

> A nice side benefit would be that Linux drivers could present a
> consistent PMA interface with new extended 64 bit counters even with
> older hardware.

Indeed.

The question may now be how to get from where we are today to this
model.

-- Hal

> Jason


From rick.jones2 at hp.com  Wed Jun 27 14:49:37 2007
From: rick.jones2 at hp.com (Rick Jones)
Date: Wed, 27 Jun 2007 14:49:37 -0700
Subject: [ofa-general] IB performance stats (revisited)
In-Reply-To: <20070627212651.GQ32050@obsidianresearch.com>
References: <1182951169.28870.75880.camel@hal.voltaire.com>	<46826FB8.10904@hp.com>
	<46827BA0.6070008@hp.com>	<1182957688.28870.83013.camel@hal.voltaire.com>	<4682994E.1020209@hp.com>	<1182964334.28870.90291.camel@hal.voltaire.com>	<46829D54.2040300@hp.com>	<1182966482.28870.92686.camel@hal.voltaire.com>	<20070627182236.GO32050@obsidianresearch.com>	<1182978803.28870.106563.camel@hal.voltaire.com>
	<20070627212651.GQ32050@obsidianresearch.com>
Message-ID: <4682DB71.5080504@hp.com>

> After 2**64 counts the saturation computation will stop working. It
> would take 24 years of constant maxed out data transfer for a 12x QDR
> link to wrap a 64 bit dword byte counter.

Drifting a bit, and perhaps not properly interpreting some of the TLAs, 
but I suspect that if we go back oh 20ish years or so, we could find 
similar calculations being put forth to show how very long a 32-bit 
counter would last :)  Perhaps it isn't too too early to start talking 
about > 64 bit counters...

rick jones


From halr at voltaire.com  Wed Jun 27 14:49:32 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 27 Jun 2007 17:49:32 -0400
Subject: [ofa-general] IB performance stats (revisited)
In-Reply-To: <4682D88F.9040806@hp.com>
References: <1182951169.28870.75880.camel@hal.voltaire.com>
	<46826FB8.10904@hp.com> <46827BA0.6070008@hp.com>
	<1182957688.28870.83013.camel@hal.voltaire.com>
	<4682994E.1020209@hp.com>
	<1182964334.28870.90291.camel@hal.voltaire.com>
	<46829D54.2040300@hp.com>
	<1182966482.28870.92686.camel@hal.voltaire.com>
	<20070627182236.GO32050@obsidianresearch.com>
	<1182978803.28870.106563.camel@hal.voltaire.com>
	<20070627212651.GQ32050@obsidianresearch.com> <4682D88F.9040806@hp.com>
Message-ID: <1182980966.28870.108877.camel@hal.voltaire.com>

On Wed, 2007-06-27 at 17:37, Mark Seger wrote:
> Jason Gunthorpe wrote:
> > On Wed, Jun 27, 2007 at 05:13:40PM -0400, Hal Rosenstock wrote:
> >
> >   
> >>> - The kernel periodically fetches the performance stats and aggregates
> >>>   them into a 64 bit wrapping counter. The kernel sends PMA MADs into the
> >>>   mellanox firmware to read and reset the counters
> >>> - The new 64 bit stats are exported via sysfs/proc/whatever as
> >>>   wrapping counters
> >>> - When a PMA packet comes in the kernel services it rather than
> >>>   passing it on to the chip firmware.
> >>>       
> >> In this way, both 32 and 64 bit counters could be presented by the PMA
> >> but how would it know when a counter has maxed out in terms of the
> >> PMA and how would a remote clear be handled ?
> >>     
> >
> > Each time the counter is cleared the kernel would store the 64 bit
> > value as the 'last PMA counter'. Then the calculation is just
> >
> > if ((current - stored) >= saturation)
> >   return saturation;
> > return current - stored;
> >
> > After 2**64 counts the saturation computation will stop working. It
> > would take 24 years of constant maxed out data transfer for a 12x QDR
> > link to wrap a 64 bit dword byte counter.
> >
> > A nice side benefit would be that linux drivers could present a
> > consistent PMA interface with new extended 64 bit counters even with
> > older hardware.
> >   
> I agree for 64 bit counters but for 32 bit ones it gets a little more 
> complicated because they can max out in under a minute!  Since it's 
> tough to decide when a counter has maxed out you therefore HAVE to clear 
> it every time!  This means your monitoring utility will need to examine 
> the /proc counters within that 'max-out' window or the counters will 
> latch on you.  If you wait too long to look you're screwed and now
> we're back to the fact that the counters don't wrap.
> 
> what I'd like to hear is the sense of the community whether or not 
> something like this would be acceptable.  if it is, that means nobody is 
> allowed to clear counters on their own

Per the IBA spec, I don't think you can legislate this away. IB supports
a standard way to remotely clear counters (and the various Performance
Managers or other similar tools utilize this clearing feature).

-- Hal

>  AND that the single source for counter information then becomes /proc.
> 
> -mark
> 
> 


From DavidRobb at comsci.co.uk  Wed Jun 27 15:16:09 2007
From: DavidRobb at comsci.co.uk (David Robb)
Date: Wed, 27 Jun 2007 23:16:09 +0100
Subject: [ofa-general] Infiniband Problems
In-Reply-To: <467AD9FB.1030508@comsci.co.uk>
References: <467ACAD6.8000304@comsci.co.uk>
	<adavedh3yve.fsf@cisco.com>	<467AD385.3040500@comsci.co.uk>
	<adamyyt3wzr.fsf@cisco.com> <467AD9FB.1030508@comsci.co.uk>
Message-ID: <4682E1A9.3070203@comsci.co.uk>


David Robb wrote:
>
> Roland Dreier wrote:
>>  > Quite possibly, we are using an IBV_QPT_RC transport type. The code
>>  > simply adds another work request with ibv_post_srq_recv(...) after
>>  > each packet is processed. Am I correct in thinking it should start
>>  > out with a stack of work requests in case another packet arrives
>>  > before the current one has been processed?
>>
>> That seems a lot more sensible to me.
I have now set things up as suggested and am getting a very healthy
transfer rate with minimal latencies. :-)
>>
>>  > Sorry, I meant to look up in my source code which call was failing
>>  > but forgot to paste it into the question. Yes, I can map 2GB of
>>  > memory but the call to ibv_create_qp() fails with REJ
>>
>> Not sure what you mean ... ibv_create_qp() just returns a pointer or
>> NULL.  What does it mean to "fail with REJ?"
>>   
> OK. I need to rerun this test tomorrow to determine exactly where and 
> how this test is failing. The end result is that the QP creation fails 
> with a REJ. From what I remember, I get a CM event  IB_CM_REJ_RECEIVED 
> and the remote node is not even aware that anything has tried to connect.
> Thanks for staying with me on this one.
I finally tracked this one down to a problem in our application
software. It was caused by a race condition: our master instructs a
slave to initialise and register its service name and ID with the SA,
and then attempts to create a QP with that slave. When the attempt came
before the registration had completed, it failed with a CM REJ event
with reason code INVALID_SERVICE_ID. I guess that specifying a larger
memory region changed the timing enough that the SA was still unaware
of the slave node when the QP was being created.
Anyway, a re-jig of our code has now made this more robust, and it is
faster to create all the connections.
>>  > That's reassuring. Are there any performance penalties for mapping a
>>  > larger region than a smaller region?
>>
>> Not really beyond the general cost of using more memory rather than 
>> less.
>>   
Thanks for your help.

David Robb.


From jgunthorpe at obsidianresearch.com  Wed Jun 27 15:46:05 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Wed, 27 Jun 2007 16:46:05 -0600
Subject: [ofa-general] IB performance stats (revisited)
In-Reply-To: <1182980675.28870.108616.camel@hal.voltaire.com>
References: <46827BA0.6070008@hp.com>
	<1182957688.28870.83013.camel@hal.voltaire.com>
	<4682994E.1020209@hp.com>
	<1182964334.28870.90291.camel@hal.voltaire.com>
	<46829D54.2040300@hp.com>
	<1182966482.28870.92686.camel@hal.voltaire.com>
	<20070627182236.GO32050@obsidianresearch.com>
	<1182978803.28870.106563.camel@hal.voltaire.com>
	<20070627212651.GQ32050@obsidianresearch.com>
	<1182980675.28870.108616.camel@hal.voltaire.com>
Message-ID: <20070627224605.GS32050@obsidianresearch.com>

On Wed, Jun 27, 2007 at 05:44:36PM -0400, Hal Rosenstock wrote:
> On Wed, 2007-06-27 at 17:26, Jason Gunthorpe wrote:
> > On Wed, Jun 27, 2007 at 05:13:40PM -0400, Hal Rosenstock wrote:
> > 
> > > > - The kernel periodically fetches the performance stats and aggregates
> > > >   them into a 64 wrapping counter. The kernel sends PMA mads into the
> > > >   mellanox firmware to read and reset the counters
> > > > - The new 64 bit stats are exported via sysfs/proc/whatever as
> > > >   wrapping counters
> > > > - When a PMA packet comes in the kernel services it rather than
> > > >   passing it on to the chip firmware.
> > > 
> > > In this way, both 32 and 64 bit counters could be presented by the PMA
> > > > but how would it know when a counter has maxed out in terms of the
> > > PMA and how would a remote clear be handled ?
> > 
> > Each time the counter is cleared
> 
> So it doesn't matter whether the clear is local (from Linux) or remote
> (from IB), right ?
> 
> >  the kernel would store the 64 bit
> > value as the 'last PMA counter'. Then the calculation is just
> > 
> > if ((current - stored) >= saturation)
> >   return saturation;
> > return current - stored;
> > 
> > After 2**64 counts the saturation computation will stop working. It
> > would take 24 years of constant maxed out data transfer for a 12x QDR
> > link to wrap a 64 bit dword byte counter.
> 
> Is that even for the 4 octet counts ? (I didn't calculate this out).

Okay, I think a few details of this idea are being missed here..

The 64 bit non-saturating counter is internal to the Linux kernel and
is exported by sysfs/proc/netlink/whatever. Someday if we feel
necessary we could make it a 128 bit counter without affecting any of
the APIs, wire protocols/etc. 64 bits seems to be the common counter
size for other linux network performance counts today.

Using that 64 bit counter we can emulate the current IBA PMA
specifications and have it saturate at 32 bits. This means we can
co-opt the PMA interface to the chip's firmware to extract the
counters and provide a new PMA in the Linux kernel that supports:
 1) non-saturating 64 bit counters in proc/etc for userspace
    ** This could be used by a SNMP module to export them off
       the node, or by any number of local utilities.
 2) saturating 32 bit counters for IBA PM MADs
 3) saturating 64 bit counters for new IBA PM MADs

All this would work with at least mellanox and qlogic hardware. In
future we'd want hardware to provide direct access to non-saturating
32 or 64 bit counters to avoid the mess with speaking PMA to the chip
firmware.
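
A minimal sketch of the scheme described above, not taken from any
existing driver. It assumes the kernel keeps one wrapping 64 bit
aggregate per counter plus the aggregate's value at the last PMA clear,
and that the 32 bit PMA view sticks at all ones. The struct and function
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-counter state; not an existing kernel structure. */
struct port_counter {
    uint64_t total;     /* non-saturating aggregate, wraps at 2^64 */
    uint64_t pma_base;  /* value of total at the last PMA clear */
};

/* Fold a delta read (and then reset) from the hardware into the aggregate. */
static void counter_accumulate(struct port_counter *c, uint32_t hw_delta)
{
    assert(c != NULL);
    c->total += hw_delta;
}

/* Saturating 32 bit view, as a classic PMA counter would report it. */
static uint32_t counter_pma_read32(const struct port_counter *c)
{
    uint64_t since_clear;

    assert(c != NULL);
    /* Unsigned subtraction stays correct across one wrap of total. */
    since_clear = c->total - c->pma_base;
    return since_clear >= UINT32_MAX ? UINT32_MAX : (uint32_t)since_clear;
}

/* A PMA clear (local or remote) only moves the base; total is untouched. */
static void counter_pma_clear(struct port_counter *c)
{
    assert(c != NULL);
    c->pma_base = c->total;
}
```

Because a clear only records a new base, the 64 bit value exported to
userspace keeps counting no matter who clears the PMA view.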

The 24 years I talked about before is how long it would take for the
algorithm I described to improperly report a non-saturated value if no
PMA counter clears were done. With a timer and an additional flag you
could make it perfect. By my math a 32 bit dword counter will reach
saturation on a 12x QDR link in 1.4 seconds, and on a 4x SDR link in
17 seconds.

Actually, I see I was off: I was counting bits, not bytes. It will take
192 years, not 24, to improperly report non-saturation at 100 gigabits (!)
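
The arithmetic behind these figures can be checked with a short sketch.
The helper name is hypothetical, and the data rates are assumptions:
96 Gbit/s of payload for 12x QDR and 8 Gbit/s for 4x SDR (8b/10b
encoding), with a round 100 Gbit/s for the 64 bit case.

```c
#include <assert.h>

/* Seconds in a year (365.25 days), used only to express long durations. */
#define SECONDS_PER_YEAR (365.25 * 86400.0)

/*
 * Time in seconds for a counter that is width_bits wide and counts units
 * of unit_bytes bytes to reach its maximum at gbit_per_s of payload.
 */
static double seconds_to_fill(unsigned width_bits, unsigned unit_bytes,
                              double gbit_per_s)
{
    double counts = 1.0;
    unsigned i;

    assert(gbit_per_s > 0.0);
    for (i = 0; i < width_bits; i++)
        counts *= 2.0;  /* 2^width_bits, exact in a double */

    return counts * unit_bytes * 8.0 / (gbit_per_s * 1e9);
}
```

Under those assumptions, seconds_to_fill(32, 4, 96.0) is a little over
1.4 seconds and seconds_to_fill(32, 4, 8.0) a little over 17 seconds.
seconds_to_fill(64, 4, 100.0) / SECONDS_PER_YEAR comes out somewhat
under 190 years, in the same range as the 192 quoted above.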

> The question may now be how to get from where we are today to this
> model.

Someone has to code it ;> The qlogic driver already has a lot of a PMA
in it, so factoring that to common code and requiring a new data
collection call back from the drivers seems like a reasonable start..

-- 
Jason Gunthorpe <jgunthorpe at obsidianresearch.com>        (780)4406067x832
Chief Technology Officer, Obsidian Research Corp         Edmonton, Canada


From rvm at obsidianresearch.com  Wed Jun 27 16:23:55 2007
From: rvm at obsidianresearch.com (Rolf Manderscheid)
Date: Wed, 27 Jun 2007 17:23:55 -0600
Subject: [ofa-general] Re: [PATCH] IB/mthca: initialise GRH:HopLimit when
	building MLX headers
In-Reply-To: <adad50zjnc4.fsf@cisco.com>
References: <E1HoiAm-00023A-Ia@ib1.edm.orcorp.ca> <adad50zjnc4.fsf@cisco.com>
Message-ID: <4682F18B.4060008@obsidianresearch.com>

Roland Dreier wrote:
> thanks, applied.  I also added the following patch, since I think mlx4
> has the same bug.  If you happen to have any ConnectX cards available,
> can you check this works too?
>   
I just tried the same test on ConnectX using your for-2.6.23 branch
(where this patch has already been applied) and it works fine.

    Rolf


From gsadasiv7 at gmail.com  Wed Jun 27 16:28:55 2007
From: gsadasiv7 at gmail.com (Ganesh Sadasivan)
Date: Wed, 27 Jun 2007 16:28:55 -0700
Subject: [ofa-general] Re: [PATCH RFC] sharing userspace IB objects
In-Reply-To: <adahcouv2mi.fsf@cisco.com>
References: <20070625130604.GH15343@mellanox.co.il> <aday7i7wye1.fsf@cisco.com>
	<20070626070641.GM15343@mellanox.co.il> <adahcouv2mi.fsf@cisco.com>
Message-ID: <532b813a0706271628s70e17b6cv70b81fdedc442743@mail.gmail.com>

One advantage of having shared objects is to be able to preserve IB
connections across process restarts. If the traffic is not very high
and the buffers are in shared memory (which I think they should be), then
it can save connection setup and message recovery time.

Shouldn't the protocol to create and destroy and pass the various
IB objects around be decided by the specific application rather than
the library trying to solve this problem?

Thanks
Ganesh

On 6/26/07, Roland Dreier <rdreier at cisco.com> wrote:
>
> > This is not directly related to SRC: this is an effort
> > to make it possible to share QPs, CQ etc across processes
> > in the same way as they can be currently shared across threads.
> > So assuming that we want multiple processes to post to
> > the same QP, how can we support this?
>
> This looks like a lot of work for an unknown gain.  Who is going to
> really use this?  ie is it worth the trouble?
>
> > >  - Given that everything shared is in shared memory,
> >
> > I think we should try and keep shared memory usage to minimum.
> > For example, in mthca mr object just needs a key: we could
> > keep it in non-shared memory, just pass the key around
> > and save on shared memory usage.
>
> This comment made me realize there are a few more problems here.  What
> happens if I do ibv_reg_mr() in one process, pass the MR to another
> process, and then do ibv_dereg_mr() in the second process?  What about
> if someone registers a region in shared memory -- are there any
> fork/copy-on-write issues with that?  I think there are probably bugs
> in the locked_vm accounting in the kernel right now -- it doesn't take
> into account the possibility of passing context fds from one process
> to another.
>
> In general what do you think the rules for destroying objects should
> be?  What if process A creates a QP, passes it to process B, and then
> process A dies?  Should the QP still be usable?  Should process B be
> able to destroy it?  What if process A is still alive -- should
> process B be able to destroy the QP?
>
> > We need to share file descriptors too. Is there a way to pass these
> > around besides unix domain sockets?
>
> I guess we need this to be able to re-mmap doorbell pages etc, right?
> I wonder if there's a better way around that... maybe extending the
> kernel interface so that unrelated processes can share a context, eg
> by putting contexts in a filesystem or something like that.
>
> > But are you sure we want to break API for all users just to add
> > a new capability for a minority that wants shared memory support?
>
> Yes, you're right... better to be backward compatible and have a new
> API for shared stuff.
>
> - R.
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general
>

From rdreier at cisco.com  Wed Jun 27 19:50:02 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 27 Jun 2007 19:50:02 -0700
Subject: [ofa-general] Re: [PATCH 26/28] IB/ipath - print warning if LID
	not acquired and link ACTIVE within one minute
In-Reply-To: <20070626222556.GP29798@bauxite.pathscale.com> (Arthur Jones's
	message of "Tue, 26 Jun 2007 15:25:56 -0700")
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
	<20070619234303.3794.75856.stgit@bauxite.internal.keyresearch.com>
	<adar6nytle0.fsf@cisco.com>
	<20070626222556.GP29798@bauxite.pathscale.com>
Message-ID: <aday7i4rdwl.fsf@cisco.com>

 > anyway, do we want it in the IB midlayer?  i'd
 > definitely like it somewhere, user space is a bit
 > cumbersome for such a simple check...

not sure... I don't see that much use in the message myself.


From rdreier at cisco.com  Wed Jun 27 19:54:29 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 27 Jun 2007 19:54:29 -0700
Subject: [ewg] Re: [ofa-general] Toward next OFED release (1.3)
In-Reply-To: <4681EF2D.3010002@voltaire.com> (Or Gerlitz's message of "Wed,
	27 Jun 2007 08:01:33 +0300")
References: <6C2C79E72C305246B504CBA17B5500C90156379B@mtlexch01.mtl.com>
	<20070626172130.GB26637@minantech.com> <ada8xa6wqu3.fsf@cisco.com>
	<20070626175343.GB5951@sgi.com> <ada4pkuwq5w.fsf@cisco.com>
	<468166D4.20204@mellanox.co.il> <adar6nyv7bh.fsf@cisco.com>
	<4681EF2D.3010002@voltaire.com>
Message-ID: <adatzssrdp6.fsf@cisco.com>

 > Note that not that OFED 1.1 and 1.2 only include kernel drivers
 > which are not upstream, some of them (eg SDP, RDS) never passed any
 > --review-- cycle at the relevant mailing lists
 > (openib,netdev,lkml). Now, for OFED 1.3 there's a suggestion to add
 > rNFS which was also never reviewed.

Good point.  I'm actually less concerned about entirely new modules
than about patches to existing modules, because I think it's pretty
easy for someone to understand, "oh, that module isn't in Linus's
kernel yet, so if I switch to a vanilla kernel I don't have it."  On
the other hand, if we sneak fixes and changes into OFED that don't go
upstream, then I think users and developers may waste a lot of time
debugging things that someone else debugged already.

With that said, perhaps it is a good idea to be stricter about getting
things in the upstream kernel.  For example, maybe we should make the
rule that a module cannot be called "GA" for OFED if it is not merged
upstream -- everything not upstream is automatically a "technology
preview."  This actually protects users if a module has to change when
it is merged.

 - R.


From dotanb at dev.mellanox.co.il  Thu Jun 28 00:03:55 2007
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Thu, 28 Jun 2007 10:03:55 +0300
Subject: [ofa-general] Re: The low level driver of mlx4 kmalloc 0 bytes in QP
	creation
In-Reply-To: <adaodj1rz9k.fsf@cisco.com>
References: <46821FDA.5030900@dev.mellanox.co.il> <adaodj1rz9k.fsf@cisco.com>
Message-ID: <46835D5B.9060903@dev.mellanox.co.il>

Roland Dreier wrote:
> The consensus seems to be that kmalloc(0) is OK, although various
> 2.6.22-rc kernels printed big tracebacks when it happens.  I think
> getting rid of the kmalloc(0) in mlx4 would make the code more
> complicated for no real gain.
>   
Good enough for me. thanks

Dotan


From eitan at mellanox.co.il  Thu Jun 28 00:24:59 2007
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Thu, 28 Jun 2007 10:24:59 +0300
Subject: [ofa-general] IB performance stats (revisited)
References: <46826370.4090602@hp.com>
	<1182951169.28870.75880.camel@hal.voltaire.com>
	<46826FB8.10904@hp.com>
	<46827BA0.6070008@hp.com><1182957688.28870.83013.camel@hal.voltaire.com>
	<4682994E.1020209@hp.com>
	<1182964334.28870.90291.camel@hal.voltaire.com>
	<6C2C79E72C305246B504CBA17B5500C901CAD7B7@mtlexch01.mtl.com>
	<1182978496.28870.106214.camel@hal.voltaire.com>
Message-ID: <6C2C79E72C305246B504CBA17B5500C901CAD914@mtlexch01.mtl.com>

> On Wed, 2007-06-27 at 14:23, Eitan Zahavi wrote:
> > In the last months it is the second time I hear people complaining
> > the current monitoring solution in OFA is integrated with OpenSM.
> 
> I must have missed this both times (didn't see this in Mark's 
> post) and the statement itself is somewhat inaccurate as well.
Private talks - I hope they will speak up for themselves now...
> 
> > These people do not use OpenSM but do use OFED.
> 
> I'm not sure I'm following what you mean here.
> 
> If you mean that some people want to run PerfMgr without the 
> SM/SA aspects (so that they can run a vendor based SM), that 
> is the next thing we are adding to the implementation.
Exactly. OK when is that coming?
> 
> >  Another drawback is that
> > no naming is provided and the reporting uses GUIDs.
> 
> Naming is provided via NodeDescription.
This might be good for hosts but does not cover switches ...
> 
> > I also can't hold myself from saying again I think you are going to 
> > hit the wall with the concept of doing the PMA from a single node.
> 
> If you are referring to the fact that the PerfMgr is currently not 
> distributed, that will be done as has been stated before.
Good. When is it expected? Will it be OFED 1.3?

Thanks
> 
> -- Hal
> 
> > Eitan Zahavi
> > Senior Engineering Director, Software Architect
> > Mellanox Technologies LTD
> > Tel:+972-4-9097208
> > Fax:+972-4-9593245
> > P.O. Box 586 Yokneam 20692 ISRAEL
> > 
> >  
> > 
> > > -----Original Message-----
> > > From: general-bounces at lists.openfabrics.org
> > > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Hal 
> > > Rosenstock
> > > Sent: Wednesday, June 27, 2007 8:12 PM
> > > To: Mark Seger
> > > Cc: Finn, Ed; general at lists.openfabrics.org
> > > Subject: Re: [ofa-general] IB performance stats (revisited)
> > > 
> > > On Wed, 2007-06-27 at 13:07, Mark Seger wrote:
> > > > >The performance managers deal with the counter stickiness (by 
> > > > >resetting them when they think they need to). They
> > > typically export
> > > > >their data although this is not specified by IBA so it is
> > > in a vendor
> > > > >proprietary manner.
> > > > >  
> > > > >
> > > > so I guess these guys are poor citizens as well...
> > > 
> > > Not sure what you mean.
> > > 
> > > > the real issue as I see it then means nobody can trust the data
> > > > if random tools randomly reset the counters.  a real shame...
> > > 
> > > I consider this to be a real rather than random app for this. 
> > > Guess it depends on what one considers random.
> > > 
> > > -- Hal
> > > 
> > > > -mark
> > > > 
> > > > 
> > > 
> 
> 


From monil at voltaire.com  Thu Jun 28 01:05:09 2007
From: monil at voltaire.com (Moni Levy)
Date: Thu, 28 Jun 2007 11:05:09 +0300
Subject: [ewg] Re: [ofa-general] Re: development process post ofed-1.2
	gold.
In-Reply-To: <4681370A.5050306@opengridcomputing.com>
References: <4680305D.9030701@opengridcomputing.com>
	<4680F1C8.3020207@mellanox.co.il>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA303C166C4@xmb-sjc-216.amer.cisco.com>
	<4681370A.5050306@opengridcomputing.com>
Message-ID: <6a122cc00706280105r1dc02108x466da2262f833e10@mail.gmail.com>

Weekly builds, please.

-- Moni

On 6/26/07, Steve Wise <swise at opengridcomputing.com> wrote:
> Scott Weitzenkamp (sweitzen) wrote:
> >> My suggestion is that we keep the ofed_1_2 branch alive, thus
> >> new fixes
> >> should be applied to the repository.
> >> In this way we will be able to do a stable release when we decide.
> >> Another question is regarding the daily build - I don't think we need
> >> them any more. We can do a weekly build, or run build in case of need
> >> (new patches submitted). What other people think about this?
> >
> > Weekly and on-demand builds sound OK to me.
> >
> > Scott
>
> ditto
> _______________________________________________
> ewg mailing list
> ewg at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>


From monil at voltaire.com  Thu Jun 28 01:07:58 2007
From: monil at voltaire.com (Moni Levy)
Date: Thu, 28 Jun 2007 11:07:58 +0300
Subject: [ofa-general] [PATCH/RFC] IB/mthca: Remove MSI support
In-Reply-To: <adaejjytksj.fsf@cisco.com>
References: <adaejjytksj.fsf@cisco.com>
Message-ID: <6a122cc00706280107q3e84e7b7i29184c6c4a604f83@mail.gmail.com>

On 6/27/07, Roland Dreier <rdreier at cisco.com> wrote:
> Is there any point in having MSI support in mthca, given that the
> hardware also does MSI-X, which is much more useful?

Who might be the potential user of MSI today? Maybe someone using old
chipsets that do not support MSI-X?

--Moni


From vlad at lists.openfabrics.org  Thu Jun 28 02:44:07 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Thu, 28 Jun 2007 02:44:07 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_c_kernel 20070628-0200 daily build status
Message-ID: <20070628094408.2F5CBE608F0@openfabrics.org>

This email was generated automatically, please do not reply


git_url: git://git.openfabrics.org/~vlad/ofed_kernel.git
git_branch: ofed_kernel

Common build parameters:   --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-mlx4-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.19
Passed on ia64 with linux-2.6.12
Passed on powerpc with linux-2.6.18
Passed on ia64 with linux-2.6.13
Passed on x86_64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.19
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.20
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.18
Passed on x86_64 with linux-2.6.16
Passed on ppc64 with linux-2.6.12
Passed on ia64 with linux-2.6.15
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.19
Passed on ia64 with linux-2.6.16
Passed on powerpc with linux-2.6.16
Passed on ia64 with linux-2.6.17
Passed on ppc64 with linux-2.6.15
Passed on x86_64 with linux-2.6.17
Passed on ppc64 with linux-2.6.16
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.17
Passed on powerpc with linux-2.6.17
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.13
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on powerpc with linux-2.6.12
Passed on ppc64 with linux-2.6.13
Passed on ia64 with linux-2.6.21.1
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.14
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on ppc64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:


From pnlai at galactic.com.hk  Thu Jun 28 02:49:56 2007
From: pnlai at galactic.com.hk (PN Lai)
Date: Thu, 28 Jun 2007 17:49:56 +0800
Subject: [ofa-general] SRP Failover
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA303C166C9@xmb-sjc-216.amer.cisco.com>
References: <000301c7b7d7$236b3a70$6a41af50$@com.hk>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA303C166C9@xmb-sjc-216.amer.cisco.com>
Message-ID: <001301c7b969$b0966760$11c33620$@com.hk>

I use RHEL, and it works fine. Thanks.

 
I have another question.

I tried using a normal server (without a RAID controller) to simulate the
storage, but multipath does not recognize it.

Does this mean that I can't use a normal server (without a RAID controller)
to simulate the storage? I ask because the WWID used by multipath seems to
be generated by the RAID controller.

 
Thanks again.

PN

 
From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com] 
Sent: Tuesday, June 26, 2007 11:58 PM
To: PN Lai; general at lists.openfabrics.org; Scott Weitzenkamp (sweitzen)
Subject: RE: [ofa-general] SRP Failover

 
You need to configure Device Mapper Multipath or some other multipathing
software to get HA.  What OS are you running?

 
Steps for RHEL are:

 
1) Edit /etc/multipath.conf and comment out devnode_blacklist (RHEL4) or
blacklist (RHEL5) entry.

2) Run "chkconfig multipathd on".

3) Reboot.

4) After reboot, /dev/mapper should be populated with multipath block device
entries.

5) You can run "multipath -l" to view the multipath status.

 
Steps for SLES10 are similar:

 
1) Run "chkconfig boot.multipath on".

2) Run "chkconfig multipathd on".

3) Reboot.

4) After reboot, /dev/mapper should be populated with multipath block device
entries.

5) You can run "multipath -l" to view the multipath status.

 
You use the /dev/mapper block devices, not /dev/sd* block devices.

 
Scott Weitzenkamp

SQA and Release Manager

Server Virtualization Business Unit

Cisco Systems

 
  _____  


From: general-bounces at lists.openfabrics.org
[mailto:general-bounces at lists.openfabrics.org] On Behalf Of PN Lai
Sent: Tuesday, June 26, 2007 2:48 AM
To: general at lists.openfabrics.org
Subject: [ofa-general] SRP Failover

Hi all,

 
I'm testing the SRP HA functions, but I have some questions.

I use 2 IB cables to connect the initiator and 1 IB cable to connect the
storage.

 
I installed OFED-1.2 and enabled "SRP_LOAD=yes" and "SRPHA_ENABLE=yes"
in openib.conf.

After reboot, it discovers 2 targets, /dev/sdbX and /dev/sdcX.

 
However, when I check /var/log/srp_daemon.log, it shows:

..

26/05/07 17:42:57 : bad MAD status (110) from lid 257

26/05/07 17:43:30 : No response to inform info registration

26/05/07 17:43:30 : Fail to register to traps, maybe there is no opensm
running on fabric

..

 
But opensm is running on both machines. I don't know whether this is
normal, or whether it should discover only a single target.

 
My question is: if I mount /dev/sdbX and write data to it, and then
remove one of the initiator cables, how will /dev/sdcX replace
/dev/sdbX so that I can continue to write the data?

 
Do I need to configure some extra files?

 
Thanks for reply.

 
PN


From Koen.SEGERS at VRT.BE  Thu Jun 28 02:51:33 2007
From: Koen.SEGERS at VRT.BE (SEGERS Koen)
Date: Thu, 28 Jun 2007 11:51:33 +0200
Subject: [ofa-general] Open Fabrics iWARP Driver for Chesio T3 card
References: <BAY118-F277AE0172E19EE5FA0BECA9F0A0@phx.gbl>
	<4682A880.1030001@opengridcomputing.com>
Message-ID: <D63C0BE2D613C543B6F3305502E9784C030AA282@OCBEXS01001.rto.be>

What is the benefit of using the iWARP driver? Do you offload the traffic coming from the cluster directly to the Chelsio card (RDMA directly to Chelsio)?
 
Would it be beneficial to have the iWARP driver installed on nodes that communicate with clients over IP and with other servers (of its cluster) over IB? We are now using SDP as an intercluster protocol, but in the future we are probably going to VERBS for it.
 
Can we read the documentation on a website somewhere?
 
Regards,
 
Koen  Segers

________________________________

From: general-bounces at lists.openfabrics.org on behalf of Steve Wise
Sent: Wed 27-6-2007 20:12
To: david elsen
CC: general at lists.openfabrics.org
Subject: Re: [ofa-general] Open Fabrics iWARP Driver for Chesio T3 card


Hi David,

Answers below:

david elsen wrote:
> Can someone please let me know:
>
> 1. What is the latest Open Fabrics Driver for the Chelsio T3 cards?
>

The latest chelsio rdma driver is in the ofed-1.2 "gold" release.  That
driver requires firmware from chelsio that is included in their latest
software kit: cxgb3toe-1.0.104.tar.gz.  Contact chelsio to get this.
I'll probably be pulling in a patch series for ofed-1.2 to update the
ofed low level driver, but for now, please use the kit from Chelsio.

I suggest you install OFED-1.2.tgz and then the cxgb3toe-1.0.104 kit on
top of ofed.  This will install the latest low level driver (used by the
rdma driver in the ofed release) and the latest 4.3.0 firmware.


> 2. Is there any documentation there on The Open Fabrics website to
> install the iWARP driver for the T3 card?
>

There is a chelsio cxgb3 release note file included in the ofed-1.2
documentation package.

> 3. Is there any documentation describing how to set the iWARP and
> Network interface for the T3 cards?
>

Same release note file.

Hope this helps.

Steve.
_______________________________________________
general mailing list
general at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general




From swise at opengridcomputing.com  Thu Jun 28 06:55:40 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 28 Jun 2007 08:55:40 -0500
Subject: [ofa-general] Open Fabrics iWARP Driver for Chesio T3 card
In-Reply-To: <D63C0BE2D613C543B6F3305502E9784C030AA282@OCBEXS01001.rto.be>
References: <BAY118-F277AE0172E19EE5FA0BECA9F0A0@phx.gbl>
	<4682A880.1030001@opengridcomputing.com>
	<D63C0BE2D613C543B6F3305502E9784C030AA282@OCBEXS01001.rto.be>
Message-ID: <4683BDDC.5010309@opengridcomputing.com>

SEGERS Koen wrote:
> What is the benefit of using the iWARP driver? Do you offload the 
> traffic coming from the cluster directly to the chelsio card (RDMA 
> directly to Chelsio)?
>  

iWARP is a suite of standard protocols that implement RDMA over a TCP or 
SCTP connection.  The devices that support iWARP usually implement all 
of these protocols (including TCP/IP/ethernet) in hardware.  The device 
drivers for these devices plug into the Linux/OFA RDMA core and support 
the Linux/OFA RDMA verbs which are mostly common between both IB and iWARP.

So think of it as an RDMA transport that uses standard Ethernet and IP 
technology.  There is no wire-level interoperability between IB and 
iWARP: They are different L1-L4 protocol stacks below the RDMA API.  But 
_above_ the RDMA API, you can have a single application use the Linux 
RDMA Verbs interface and deploy that same application over both IB 
networks and IW networks.

Application/Middle-ware examples include MPI, iSCSI/iSER, and NFS-RDMA.

> Would it be beneficial to have the iWARP driver installed on nodes that 
> communicate with clients over IP and with other servers (of its cluster) 
> over IB? We are now using SDP as an intercluster protocol, but in the 
> future we are probably going to VERBS for it.
>  

I'm not sure how you would utilize it in your setup.  I don't 
understand your cluster architecture well enough to say for sure whether 
it might help you or not.

You might contact the iWARP providers directly to help understand if 
their solutions can help you.  Also, there are other technologies that 
these devices typically support that might be helpful for you.

> Can we read the documentation on a website somewhere?
>  

The iWARP Protocols are IETF IDs and RFCs that can be found at

http://www.ietf.org/html.charters/rddp-charter.html

There is other information on RDMA over TCP/IP at

http://www.rdmaconsortium.org/home

Hope this helps.

Steve.


From halr at voltaire.com  Thu Jun 28 06:55:43 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 28 Jun 2007 09:55:43 -0400
Subject: [ofa-general] IB performance stats (revisited)
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C901CAD914@mtlexch01.mtl.com>
References: <46826370.4090602@hp.com>
	<1182951169.28870.75880.camel@hal.voltaire.com> <46826FB8.10904@hp.com>
	<46827BA0.6070008@hp.com><1182957688.28870.83013.camel@hal.voltaire.com>
	<4682994E.1020209@hp.com>
	<1182964334.28870.90291.camel@hal.voltaire.com>
	<6C2C79E72C305246B504CBA17B5500C901CAD7B7@mtlexch01.mtl.com>
	<1182978496.28870.106214.camel@hal.voltaire.com>
	<6C2C79E72C305246B504CBA17B5500C901CAD914@mtlexch01.mtl.com>
Message-ID: <1183038915.28870.174235.camel@hal.voltaire.com>

On Thu, 2007-06-28 at 03:24, Eitan Zahavi wrote:
> > On Wed, 2007-06-27 at 14:23, Eitan Zahavi wrote:
> > > In the last months it is the second time I hear people complaining
> > > that the current monitoring solution in OFA is integrated with OpenSM.
> > 
> > I must have missed this both times (didn't see this in Mark's 
> > post) and the statement itself is somewhat inaccurate as well.

> Private talks - I hope they will speak up for themselves now...

Please encourage them to do so.
 
> > > These people do not use OpenSM but do use OFED.
> > 
> > I'm not sure I'm following what you mean here.
> > 
> > If you mean that some people want to run PerfMgr without the 
> > SM/SA aspects (so that they can run a vendor based SM), that 
> > is the next thing we are adding to the implementation.
> Exactly. OK when is that coming?

Should be part of OFED 1.3.
 
> > >  Another drawback is that
> > > no naming is provided and the reporting uses GUIDs.
> > 
> > Naming is provided via NodeDescription.
> This might be good for hosts but is not covering  switches ...

A switch map has been used for this with some other diag tools. Not sure
if this is the approach to be used here but that would be consistent.

> > > I also can't hold myself from saying again I think you are going to 
> > > hit the wall with the concept of doing the PMA from a single node.
> > 
> > If you are referring to the fact the PerfMgr is currently not 
> > distributed, that will be done as has been stated before.
> Good. When is it expected? Will it be OFED 1.3?

Not sure yet; it's the next major thing after making PerfMgr run without
the SM/SA included. Don't have an OFED 1.3 functionality freeze date yet
to work against.

-- Hal

> Thanks
> > 
> > -- Hal
> > 
> > > Eitan Zahavi
> > > Senior Engineering Director, Software Architect Mellanox 
> > Technologies 
> > > LTD
> > > Tel:+972-4-9097208
> > > Fax:+972-4-9593245
> > > P.O. Box 586 Yokneam 20692 ISRAEL
> > > 
> > >  
> > > 
> > > > -----Original Message-----
> > > > From: general-bounces at lists.openfabrics.org
> > > > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Hal 
> > > > Rosenstock
> > > > Sent: Wednesday, June 27, 2007 8:12 PM
> > > > To: Mark Seger
> > > > Cc: Finn, Ed; general at lists.openfabrics.org
> > > > Subject: Re: [ofa-general] IB performance stats (revisited)
> > > > 
> > > > On Wed, 2007-06-27 at 13:07, Mark Seger wrote:
> > > > > >The performance managers deal with the counter stickiness (by 
> > > > > >resetting them when they think they need to). They
> > > > typically export
> > > > > >their data although this is not specified by IBA so it is
> > > > in a vendor
> > > > > >proprietary manner.
> > > > > >  
> > > > > >
> > > > > so I guess these guys are poor citizens as well...
> > > > 
> > > > Not sure what you mean.
> > > > 
> > > > > the real issue as I see it then means nobody can trust the data
> > > > > if random tools randomly reset the counters.  a real shame...
> > > > 
> > > > I consider this to be a real rather than random app for this. 
> > > > Guess it depends on what one considers random.
> > > > 
> > > > -- Hal
> > > > 
> > > > > -mark
> > > > > 
> > > > > 
> > > > 
> > 
> > 


From halr at voltaire.com  Thu Jun 28 07:04:17 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 28 Jun 2007 10:04:17 -0400
Subject: [ofa-general] Re: [PATCH] management: uint -> unsigned replacement
In-Reply-To: <1182889307.28870.4809.camel@hal.voltaire.com>
References: <20070626102045.GS15343@mellanox.co.il>
	<1182889307.28870.4809.camel@hal.voltaire.com>
Message-ID: <1183038984.28870.174322.camel@hal.voltaire.com>

On Tue, 2007-06-26 at 16:21, Hal Rosenstock wrote:
> On Tue, 2007-06-26 at 06:20, Michael S. Tsirkin wrote:
> > Some management headers use uint type which (on my system) is described as "old
> > compatibility name for C type".  This type might not be defined e.g. if
> > __STRICT_ANSI__ is set, so it is best to avoid its usage at least in headers.
> > Replace by unsigned in all headers.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>
> 
> Thanks. Applied (to master only so far, but since a goal of OFED 1.2 is
> to support SLES 10, it seems that it should be provided there as well.
> That will be forthcoming.)

I've now made these changes to my ofed_1_2 branch of my management git
tree on the OFA server. I'll release updated libraries shortly with
these updated headers.
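
For readers following the thread, here is a minimal sketch of why the
rename matters.  Plain `unsigned` is built into the language, while `uint`
is a typedef that, per the patch description quoted above, may be missing
when __STRICT_ANSI__ is set.  The function name below is hypothetical and
is not taken from the management headers.

```c
#include <assert.h>

/*
 * Hypothetical header-style helper.  It uses plain `unsigned` rather than
 * the `uint` compatibility typedef, so it does not depend on which typedefs
 * the system headers expose under strict ANSI settings.
 */
static unsigned clamp_port(unsigned port, unsigned max_port)
{
	assert(max_port > 0);
	return port > max_port ? max_port : port;
}
```

A declaration like this should build the same way whether or not the
compiler is run in a strict ANSI mode, which is the property the patch
is after for installed headers.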

-- Hal

> Also, I am working on updating the management library sources similarly
> although I don't see an imperative to move those changes to OFED 1.2.
> 
> -- Hal
> 


From rdreier at cisco.com  Thu Jun 28 07:37:17 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Jun 2007 07:37:17 -0700
Subject: [ofa-general] [PATCH/RFC] IB/mthca: Remove MSI support
In-Reply-To: <6a122cc00706280107q3e84e7b7i29184c6c4a604f83@mail.gmail.com>
	(Moni Levy's message of "Thu, 28 Jun 2007 11:07:58 +0300")
References: <adaejjytksj.fsf@cisco.com>
	<6a122cc00706280107q3e84e7b7i29184c6c4a604f83@mail.gmail.com>
Message-ID: <adafy4cqh5u.fsf@cisco.com>

 > Who might be the potential user of MSI today? Maybe someone using old
 > chip sets not supporting MSI-X ?

How could a chipset support MSI but not at least one MSI-X message?

Do you actually know of any chipset for which some version of the
Linux kernel makes MSI but not MSI-X work?

 - R.


From jackm at dev.mellanox.co.il  Thu Jun 28 08:25:48 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Thu, 28 Jun 2007 18:25:48 +0300
Subject: [ofa-general] [PATCH] net-mlx4: set 64-byte cacheline size for
	x86_64 and PPC64 to enable WQE prefetching
Message-ID: <200706281825.48125.jackm@dev.mellanox.co.il>

Set cacheline size to 64 for x86_64 and PPC64 architectures.
This will enable WQE prefetching for these architectures.

Signed-off-by: Jack Morgenstein <jackm at dev.mellanox.co.il>

Index: connectx_kernel/drivers/net/mlx4/fw.c
===================================================================
--- connectx_kernel.orig/drivers/net/mlx4/fw.c	2007-06-28 15:34:07.000000000 +0300
+++ connectx_kernel/drivers/net/mlx4/fw.c	2007-06-28 15:48:39.000000000 +0300
@@ -655,6 +655,8 @@
 #define INIT_HCA_IN_SIZE		 0x200
 #define INIT_HCA_VERSION_OFFSET		 0x000
 #define	 INIT_HCA_VERSION		 2
+#define INIT_HCA_CACHELINE_SZ_OFFSET	 0x0e
+#define INIT_HCA_64_BYTE_CACHELINE_SZ	 0x40
 #define INIT_HCA_FLAGS_OFFSET		 0x014
 #define INIT_HCA_QPC_OFFSET		 0x020
 #define	 INIT_HCA_QPC_BASE_OFFSET	 (INIT_HCA_QPC_OFFSET + 0x10)
@@ -691,6 +693,9 @@
 	memset(inbox, 0, INIT_HCA_IN_SIZE);
 
 	*((u8 *) mailbox->buf + INIT_HCA_VERSION_OFFSET) = INIT_HCA_VERSION;
+#if defined(__x86_64__) || defined(__PPC64__)
+	*((u8 *) mailbox->buf + INIT_HCA_CACHELINE_SZ_OFFSET) = INIT_HCA_64_BYTE_CACHELINE_SZ;
+#endif
 
 #if defined(__LITTLE_ENDIAN)
 	*(inbox + INIT_HCA_FLAGS_OFFSET / 4) &= ~cpu_to_be32(1 << 1);
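
A userspace sketch of what the hunk above does, assuming the mailbox is
just a zeroed byte buffer: the version byte is always written, and the
cacheline-size byte is written only when building for x86_64 or PPC64.
The offsets and values are copied from the patch; the function name
fill_init_hca() is made up for illustration and is not part of the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INIT_HCA_IN_SIZE              0x200
#define INIT_HCA_VERSION_OFFSET       0x000
#define INIT_HCA_VERSION              2
#define INIT_HCA_CACHELINE_SZ_OFFSET  0x0e
#define INIT_HCA_64_BYTE_CACHELINE_SZ 0x40

/*
 * Stand-in for the INIT_HCA mailbox setup.  `buf` must point to at least
 * INIT_HCA_IN_SIZE bytes.
 */
static void fill_init_hca(uint8_t *buf)
{
	assert(buf != NULL);
	memset(buf, 0, INIT_HCA_IN_SIZE);

	buf[INIT_HCA_VERSION_OFFSET] = INIT_HCA_VERSION;
#if defined(__x86_64__) || defined(__PPC64__)
	/* Same compile-time condition as the patch. */
	buf[INIT_HCA_CACHELINE_SZ_OFFSET] = INIT_HCA_64_BYTE_CACHELINE_SZ;
#endif
}
```

Because the choice is made at compile time, other architectures leave the
byte at zero, exactly as before the patch.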


From arthur.jones at qlogic.com  Thu Jun 28 08:42:09 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Thu, 28 Jun 2007 08:42:09 -0700
Subject: [ofa-general] Re: [PATCH 26/28] IB/ipath - print warning if LID
	not acquired and link ACTIVE within one minute
In-Reply-To: <aday7i4rdwl.fsf@cisco.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
	<20070619234303.3794.75856.stgit@bauxite.internal.keyresearch.com>
	<adar6nytle0.fsf@cisco.com>
	<20070626222556.GP29798@bauxite.pathscale.com>
	<aday7i4rdwl.fsf@cisco.com>
Message-ID: <20070628154209.GB26887@bauxite.pathscale.com>

hi roland, ...

On Wed, Jun 27, 2007 at 07:50:02PM -0700, Roland Dreier wrote:
>  > anyway, do we want it in the IB midlayer?  i'd
>  > definitely like it somewhere, user space is a bit
>  > cumbersome for such a simple check...
> 
> not sure... I don't see that much use in the message myself.

ok, no problem, i'll take out the lid check and keep
the interrupt check and test and resend...

arthur


From afriedle at open-mpi.org  Thu Jun 28 08:46:30 2007
From: afriedle at open-mpi.org (Andrew Friedley)
Date: Thu, 28 Jun 2007 08:46:30 -0700
Subject: [ofa-general] Limited number of multicasts groups that can be
	joined?
In-Reply-To: <46699A6D.4070300@open-mpi.org>
References: <46699A6D.4070300@open-mpi.org>
Message-ID: <4683D7D6.50402@open-mpi.org>

Some updates on this problem.

The code I'm using to test/produce this behavior is an MPI program.  MPI 
is used for convenience of job startup and collection of results.  The 
actual test/benchmark is using straight RDMA CM & ibverbs.  What I'm 
doing is timing how long it takes to join and bring up a multicast group 
with a varying number of processes and existing groups.  One rank joins 
with a '0' address to get a real address, MPI_Bcast's that address to 
the other ranks, which then join the group.  Meanwhile the root rank is 
repeatedly sending a small ping message to the group.  Every other rank 
times from when they call rdma_join_multicast() to the join event 
arrival, and to when they first receive a message on that group.  Once 
completed, the process repeats N times, leaving all the groups joined.
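
The two intervals described above can be sketched as follows.  This is
only an illustration of the bookkeeping, not the actual benchmark: the
struct and function names are hypothetical, and in a real run the three
timestamps would be taken around rdma_join_multicast(), the arrival of the
join event, and the first message received on the group.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical per-group record of the three timestamps of interest. */
struct join_timing {
	struct timespec join_called;  /* just before rdma_join_multicast() */
	struct timespec join_event;   /* when the join event arrives */
	struct timespec first_msg;    /* when the first group message arrives */
};

/* Elapsed time from start to end, in seconds. */
static double elapsed_sec(const struct timespec *start,
			  const struct timespec *end)
{
	assert(start != NULL && end != NULL);
	return (double)(end->tv_sec - start->tv_sec) +
	       (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Time from the join call to the join event. */
static double join_event_latency(const struct join_timing *t)
{
	return elapsed_sec(&t->join_called, &t->join_event);
}

/* Time from the join call to the first message received on the group. */
static double first_msg_latency(const struct join_timing *t)
{
	return elapsed_sec(&t->join_called, &t->first_msg);
}
```

Keeping the two numbers separate is what makes it possible to see one of
them stay flat while the other grows with the number of groups.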

I'm now running OFED v1.2, and behavior has not changed due to this, 
though I've noticed some other cases.  First -- If I have not been using 
anything multicast on the network for a while, I'm able to join a total 
of 4 groups with my benchmark.  After this, running it any number of 
times, I can join 14 groups as described below.

Now the more interesting part.  I'm now able to run on a 128 node 
machine using OpenSM running on a node (before, I was running on an 8 
node machine which I'm told is running the Cisco SM on a Topspin 
switch).  On this machine, if I run my benchmark with two processes per 
node (instead of one, i.e. mpirun -np 16 with 8 nodes), I'm able to join 
more than 750 groups simultaneously from one QP on each process.  To make this 
stranger, I can join only 4 groups running the same thing on the 8-node 
machine.

While doing so I noticed that the time from calling 
rdma_join_multicast() to the event arrival stayed fairly constant (in 
the .001sec range), while the time from the join call to actually 
receiving messages on the group steadily increased from around .1 secs 
to around 2.7 secs with 750+ groups.  Furthermore, this time does not 
drop back to .1 secs if I stop the benchmark and run it (or any of my 
other multicast code) again.  This is understandable within a single 
program run, but the fact that behavior persists across runs concerns me 
-- feels like a bug, but I don't have much concrete here.

Sorry for the long email -- I'm trying to provide as much detail as 
possible so this can get fixed.  I'm really not sure where to start 
looking on my own, so even some hints on where the problem(s) might lie 
would be useful.

Andrew

Andrew Friedley wrote:
> I've run into a problem where it appears that I cannot join more than 14 
> multicast groups from a single HCA.  I'm using the RDMA CM UD/multicast 
> interface from an OFED v1.2 nightly build, and using a '0' address when 
> joining to have the SM allocate an unused address.  The first 14 
> rdma_join_multicast() calls succeed, a MULTICAST_JOIN event comes 
> through for each of them and everything works.  But the 15th call to 
> rdma_join_multicast() returns -1 and sets errno to 99, 'Cannot assign 
> requested address'.
> 
> Note that I'm using a single QP per process to do all the joins.  Things 
> get weirder if I run two instances of my program on the same node -- as 
> soon as the total between the two instances is 14, neither instance can 
> join any more groups.  Also, right now my code hangs when this happens 
> -- if I kill off one of the two instances and run a third instance 
> (while leaving the other hung, holding some number of groups), the third 
> instance is not able to join ANY groups.  The behavior resets when I 
> kill all instances.
> 
> Two instances running on separate nodes (on the same network) do not 
> appear to interfere with each other like described above; they do still 
> error out on the 15th join.
> 
> This feels like a bug to me; though regardless this limit is WAY too 
> low.  Any ideas what might be going on, or how I can work around it?
> 
> Andrew


From swise at opengridcomputing.com  Thu Jun 28 10:34:21 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 28 Jun 2007 12:34:21 -0500
Subject: [ofa-general] [PATCH RFC] - iw_cxgb3 debug - turn debug logging
	on/off at runtime
Message-ID: <4683F11D.2030403@opengridcomputing.com>

This is a request for comments.

What do folks think about using a sysfs file to turn debug logging 
on/off at runtime for an rdma driver?

Is a /proc entry better?

Thanks,

Steve.


-------------------

> commit 5441877cbe5bb8bc56bbc5bd77e4551aa8a219b0
> Author: Steve Wise <swise at opengridcomputing.com>
> Date:   Wed May 9 10:09:03 2007 -0500
> 
>     Debug/Trace fixes.
>     
>     - Add sysfs entry to turn debug trace on/off.  You still need to compile
>     the driver to turn all the debug code on, but once compiled, you can turn
>     on the logging via:
>      echo 1 >  /sys/class/infiniband/cxgb3/debug
>     
>     Eventually I'll clean up the logging so that we can always leave this
>     code compiled in.  But for now, it's way too verbose to always compile in.
>     
>     - Fixed bug in cxio_dump_rqt
>     
>     Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> 
> diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c b/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c
> index d6b6c97..76d2951 100644
> --- a/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c
> +++ b/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c
> @@ -133,7 +133,7 @@ void cxio_dump_wce(struct t3_cqe *wce)
>  	}
>  }
>  
> -void cxio_dump_rqt(struct cxio_rdev *rdev, u32 hwtid, int nents)
> +void cxio_dump_rqt(struct cxio_rdev *rdev, u32 rqt_addr, int nents)
>  {
>  	struct ch_mem_range *m;
>  	int size = nents * 64;
> @@ -146,7 +146,7 @@ void cxio_dump_rqt(struct cxio_rdev *rde
>  		return;
>  	}
>  	m->mem_id = MEM_PMRX;
> -	m->addr = ((hwtid)<<10) + rdev->rnic_info.rqt_base;
> +	m->addr = rqt_addr;
>  	m->len = size;
>  	PDBG("%s RQT addr 0x%x len %d\n", __FUNCTION__, m->addr, m->len);
>  	rc = rdev->t3cdev_p->ctl(rdev->t3cdev_p, RDMA_GET_MEM, m);
> diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
> index ce05db5..425536c 100644
> --- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
> +++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
> @@ -45,6 +45,10 @@ #include "sge_defs.h"
>  static struct cxio_rdev *rdev_tbl[T3_MAX_NUM_RNIC];
>  static cxio_hal_ev_callback_func_t cxio_ev_cb = NULL;
>  
> +#ifdef DEBUG
> +unsigned int cxio_debug;
> +#endif
> +
>  static inline struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name)
>  {
>  	int i;
> diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h b/drivers/infiniband/hw/cxgb3/core/cxio_hal.h
> index 1553bda..12ee689 100644
> --- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h
> +++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.h
> @@ -186,7 +186,6 @@ int cxio_poll_cq(struct t3_wq *wq, struc
>  		     u8 *cqe_flushed, u64 *cookie, u32 *credit);
>  
>  #define MOD "iw_cxgb3: "
> -#define PDBG(fmt, args...) pr_debug(MOD fmt, ## args)
>  
>  #ifdef DEBUG
>  void cxio_dump_tpt(struct cxio_rdev *rev, u32 stag);
> @@ -195,6 +194,15 @@ void cxio_dump_wqe(union t3_wr *wqe);
>  void cxio_dump_wce(struct t3_cqe *wce);
>  void cxio_dump_rqt(struct cxio_rdev *rdev, u32 hwtid, int nents);
>  void cxio_dump_tcb(struct cxio_rdev *rdev, u32 hwtid);
> +
> +extern unsigned int cxio_debug;
> +
> +#define PDBG(fmt, args...) { \
> +	if (cxio_debug) \
> +		printk(MOD fmt, ## args); \
> +}
> +#else
> +#define PDBG(fmt, arg...) do { ; } while (0)
>  #endif
>  
>  #endif
> diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
> index b0f7218..33c9e59 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
> +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
> @@ -1057,16 +1057,42 @@ static ssize_t show_board(struct class_d
>  		                       dev->rdev.rnic_info.pdev->device);
>  }
>  
> +#ifdef DEBUG
> +static ssize_t show_debug(struct class_device *cdev, char *buf)
> +{
> +	return sprintf(buf, "cxio_debug=%d\n", cxio_debug);
> +}
> +
> +static ssize_t iwch_set_debug(struct class_device *cdev, const char *buf, size_t count)
> +{
> +	unsigned dbg;
> +
> +	sscanf(buf, "%u", &dbg);
> +	if (dbg > 1) 
> +		return -EINVAL;
> +	cxio_debug = dbg;
> +	printk(KERN_INFO MOD "cxio_debug=%d\n", cxio_debug);
> +	return count;
> +}
> +#endif
> +
>  static CLASS_DEVICE_ATTR(hw_rev, S_IRUGO, show_rev, NULL);
>  static CLASS_DEVICE_ATTR(fw_ver, S_IRUGO, show_fw_ver, NULL);
>  static CLASS_DEVICE_ATTR(hca_type, S_IRUGO, show_hca, NULL);
>  static CLASS_DEVICE_ATTR(board_id, S_IRUGO, show_board, NULL);
>  
> +#ifdef DEBUG
> +static CLASS_DEVICE_ATTR(debug, S_IRUGO|S_IWUGO, show_debug, iwch_set_debug);
> +#endif
> +
>  static struct class_device_attribute *iwch_class_attributes[] = {
>  	&class_device_attr_hw_rev,
>  	&class_device_attr_fw_ver,
>  	&class_device_attr_hca_type,
> -	&class_device_attr_board_id
> +	&class_device_attr_board_id,
> +#ifdef DEBUG
> +	&class_device_attr_debug,
> +#endif
>  };
>  
>  int iwch_register_device(struct iwch_dev *dev)
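
A side note on the macro shape, as a userspace sketch rather than a
proposed replacement for the patch.  A macro that expands to a bare
`{ ... }` block does not compile when used as `if (x) PDBG(...); else ...`,
because the semicolon after the closing brace ends the if statement.
Wrapping the body in do { } while (0) avoids that.  The names dbg_enabled,
dbg_emitted, dbg_print and DBG are invented for this example; dbg_enabled
plays the role of cxio_debug.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

static unsigned int dbg_enabled;  /* runtime switch, like cxio_debug */
static unsigned int dbg_emitted;  /* number of messages actually printed */

/* Print one debug message to stderr and count it. */
static void dbg_print(const char *fmt, ...)
{
	va_list ap;

	assert(fmt != NULL);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	dbg_emitted++;
}

/* do { } while (0) makes the macro behave as a single statement. */
#define DBG(...) do { if (dbg_enabled) dbg_print(__VA_ARGS__); } while (0)

/* Uses DBG in both arms of an if/else; returns the running message count. */
static unsigned int demo(int err)
{
	if (err)
		DBG("error path %d\n", err);
	else
		DBG("ok path\n");
	return dbg_emitted;
}
```

With the switch off nothing is printed or counted; with it on, each call
to demo() emits exactly one message.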


From rdreier at cisco.com  Thu Jun 28 13:45:57 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Jun 2007 13:45:57 -0700
Subject: [ofa-general] Re: [PATCH RFC] - iw_cxgb3 debug - turn debug logging
	on/off at runtime
In-Reply-To: <4683F11D.2030403@opengridcomputing.com> (Steve Wise's message of
	"Thu, 28 Jun 2007 12:34:21 -0500")
References: <4683F11D.2030403@opengridcomputing.com>
Message-ID: <ada1wfvdcze.fsf@cisco.com>

 > What do folks think about using a sysfs file to turn debug logging
 > on/off at runtime for an rdma driver?

How about just making it a module parameter with writable permissions,
so it can be set in /sys/module and you don't even have to write any
attribute parsing code or anything like that.
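
A rough sketch of what that could look like, under some assumptions: the
kernel side is shown only as a comment, the variable name cxio_debug is
taken from the RFC patch, and the module name iw_cxgb3 is a guess based on
the MOD prefix in that patch.  The userspace helpers below are hypothetical
and only illustrate where a writable parameter shows up in sysfs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Kernel side (not compiled here):
 *
 *     static unsigned int cxio_debug;
 *     module_param(cxio_debug, uint, 0644);
 *     MODULE_PARM_DESC(cxio_debug, "Enable debug logging (default 0)");
 *
 * With non-zero permissions the parameter is exposed as
 * /sys/module/<module>/parameters/<param>.
 */

/* Build the sysfs path of a parameter.  Returns 0, or -1 if it won't fit. */
static int param_path(char *out, size_t len,
		      const char *module, const char *param)
{
	int n;

	assert(out != NULL && module != NULL && param != NULL);
	n = snprintf(out, len, "/sys/module/%s/parameters/%s", module, param);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a value to the parameter.  Returns 0 on success, -1 on failure. */
static int set_param(const char *module, const char *param, unsigned int value)
{
	char path[256];
	FILE *f;

	if (param_path(path, sizeof(path), module, param) != 0)
		return -1;
	f = fopen(path, "w");
	if (f == NULL)
		return -1;  /* e.g. module not loaded or no permission */
	fprintf(f, "%u\n", value);
	return fclose(f) == 0 ? 0 : -1;
}
```

From a shell the same thing is a plain echo into that file, so no
driver-specific attribute code is needed on either side.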

 > Is a /proc entry better?

No, /proc is only for process-related stuff.

 - R.


From mshefty at ichips.intel.com  Thu Jun 28 14:08:13 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 28 Jun 2007 14:08:13 -0700
Subject: [ofa-general] [Bug 667]
In-Reply-To: <46812A86.9000505@opengridcomputing.com>
References: <20070626142643.GC29602@mellanox.co.il>	<20070626143735.GD29602@mellanox.co.il>
	<46812A86.9000505@opengridcomputing.com>
Message-ID: <4684233D.7020406@ichips.intel.com>

Steve Wise wrote:
> I think the bug is in rping_bind_client().  If addr resolution fails via 
>  a ADDR_ERROR event, then rping_bind_client() wakes up and mistakenly 
> returns variable 'ret' which is zero.  It should return non-zero in this 
> case.

I attached a patch to the bug report to fix rping_bind_client().  Please 
let me know if this fixes the problem for you.

- Sean


From mshefty at ichips.intel.com  Thu Jun 28 14:23:02 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 28 Jun 2007 14:23:02 -0700
Subject: [ofa-general] Limited number of multicasts groups that can be
	joined?
In-Reply-To: <4683D7D6.50402@open-mpi.org>
References: <46699A6D.4070300@open-mpi.org> <4683D7D6.50402@open-mpi.org>
Message-ID: <468426B6.3060602@ichips.intel.com>

> Now the more interesting part.  I'm now able to run on a 128 node 
> machine using open SM running on a node (before, I was running on an 8 
> node machine which I'm told is running the Cisco SM on a Topspin 
> switch).  On this machine, if I run my benchmark with two processes per 
> node (instead of one, i.e. mpirun -np 16 with 8 nodes), I'm able to join 
> more than 750 groups simultaneously from one QP on each process.  To make this 
> stranger, I can join only 4 groups running the same thing on the 8-node 
> machine.

Are the switches and HCAs in the two setups the same?  If you run the 
same SM on both clusters, do you see the same results?

> While doing so I noticed that the time from calling 
> rdma_join_multicast() to the event arrival stayed fairly constant (in 
> the .001sec range), while the time from the join call to actually 
> receiving messages on the group steadily increased from around .1 secs 
> to around 2.7 secs with 750+ groups.  Furthermore, this time does not 
> drop back to .1 secs if I stop the benchmark and run it (or any of my 
> other multicast code) again.  This is understandable within a single 
> program run, but the fact that behavior persists across runs concerns me 
> -- feels like a bug, but I don't have much concrete here.

Even after all nodes leave all multicast groups, I don't believe that 
there's a requirement for the SA to reprogram the switches immediately. 
  So if the switches or the configuration of the switches are part of 
the problem, I can imagine seeing issues between runs.

When rdma_join_multicast() reports the join event, it means either: the 
SA has been notified of the join request, or, if the port has already 
joined the group, that a reference count on the group has been 
incremented.  The SA may still require time to program the switch 
forwarding tables.

- Sean
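
The two cases behind a join event can be pictured with a small
reference-counting sketch.  This only illustrates the bookkeeping
described above; struct mcast_group, group_join() and group_leave() are
hypothetical names, not the rdma_cm or ib_sa code, and the leave side is
an assumption added for symmetry.

```c
/* Hypothetical per-port bookkeeping for one multicast group. */
struct mcast_group {
    int refcount;     /* local joins currently outstanding */
    int sa_requests;  /* join requests that actually went to the SA */
};

/* Returns 1 if the join had to go to the SA, 0 if it was satisfied locally. */
static int group_join(struct mcast_group *g)
{
    if (g->refcount++ == 0) {
        g->sa_requests++;   /* first join on this port: notify the SA */
        return 1;
    }
    return 0;               /* port already joined: only the count changes */
}

/* Returns 1 if this was the last local reference (an SA leave would follow). */
static int group_leave(struct mcast_group *g)
{
    return --g->refcount == 0;
}
```

In neither case does the return say anything about whether the switch
forwarding tables have been programmed yet, which is consistent with
messages arriving well after the join event.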


From halr at voltaire.com  Thu Jun 28 14:33:11 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 28 Jun 2007 17:33:11 -0400
Subject: [ofa-general] Limited number of multicasts groups that can be
	joined?
In-Reply-To: <468426B6.3060602@ichips.intel.com>
References: <46699A6D.4070300@open-mpi.org> <4683D7D6.50402@open-mpi.org>
	<468426B6.3060602@ichips.intel.com>
Message-ID: <1183066380.28870.204762.camel@hal.voltaire.com>

On Thu, 2007-06-28 at 17:23, Sean Hefty wrote:
> > Now the more interesting part.  I'm now able to run on a 128 node 
> > machine using open SM running on a node (before, I was running on an 8 
> > node machine which I'm told is running the Cisco SM on a Topspin 
> > switch).  On this machine, if I run my benchmark with two processes per 
> > node (instead of one, i.e. mpirun -np 16 with 8 nodes), I'm able to join 
> > more than 750 groups simultaneously from one QP on each process.  To make this 
> > stranger, I can join only 4 groups running the same thing on the 8-node 
> > machine.
> 
> Are the switches and HCAs in the two setups the same?  If you run the 
> same SM on both clusters, do you see the same results?
> 
> > While doing so I noticed that the time from calling 
> > rdma_join_multicast() to the event arrival stayed fairly constant (in 
> > the .001sec range), while the time from the join call to actually 
> > receiving messages on the group steadily increased from around .1 secs 
> > to around 2.7 secs with 750+ groups.  Furthermore, this time does not 
> > drop back to .1 secs if I stop the benchmark and run it (or any of my 
> > other multicast code) again.  This is understandable within a single 
> > program run, but the fact that behavior persists across runs concerns me 
> > -- feels like a bug, but I don't have much concrete here.
> 
> Even after all nodes leave all multicast groups, I don't believe that 
> there's a requirement for the SA to reprogram the switches immediately. 

Right, that is allowed to be "lazy".

Nit: it's the SM rather than the SA that reprograms the switches, but the SA
multicast leave is what initiates this process.

-- Hal

>   So if the switches or the configuration of the switches are part of 
> the problem, I can imagine seeing issues between runs.
> 
> When rdma_join_multicast() reports the join event, it means either: the 
> SA has been notified of the join request, or, if the port has already 
> joined the group, that a reference count on the group has been 
> incremented.  The SA may still require time to program the switch 
> forwarding tables.
> 
> - Sean


From sean.hefty at intel.com  Thu Jun 28 16:22:34 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 28 Jun 2007 16:22:34 -0700
Subject: [ofa-general] [PATCH] 2.6.23 ib/ipath: return correct PortGUID in
	NodeInfo
Message-ID: <000401c7b9db$35ad6180$3c98070a@amr.corp.intel.com>

Return the PortGUID of the correct port when responding to a NodeInfo
query.  Returning the SystemImageGUID causes issues when there are
multiple HCAs in a single system.

Signed-off-by: Sean Hefty <sean.hefty at intel.com>
---
FYI - this patch will be included in my git pull request for 2.6.23 as
well.

 drivers/infiniband/hw/ipath/ipath_mad.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_mad.c b/drivers/infiniband/hw/ipath/ipath_mad.c
index 25908b0..3aec0b6 100644
--- a/drivers/infiniband/hw/ipath/ipath_mad.c
+++ b/drivers/infiniband/hw/ipath/ipath_mad.c
@@ -103,7 +103,7 @@ static int recv_subn_get_nodeinfo(struct ib_smp *smp,
 	/* This is already in network order */
 	nip->sys_guid = to_idev(ibdev)->sys_image_guid;
 	nip->node_guid = dd->ipath_guid;
-	nip->port_guid = nip->sys_guid;
+	nip->port_guid = nip->ipath_guid;
 	nip->partition_cap = cpu_to_be16(ipath_get_npkeys(dd));
 	nip->device_id = cpu_to_be16(dd->ipath_deviceid);
 	majrev = dd->ipath_majrev;


-------------- next part --------------
A non-text attachment was scrubbed...
Name: winmail.dat
Type: application/ms-tnef
Size: 3778 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070628/638feb03/attachment.bin>

From swise at opengridcomputing.com  Thu Jun 28 17:05:56 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 28 Jun 2007 19:05:56 -0500
Subject: [ofa-general] Re: [PATCH RFC] - iw_cxgb3 debug - turn debug logging
	on/off at runtime
In-Reply-To: <ada1wfvdcze.fsf@cisco.com>
References: <4683F11D.2030403@opengridcomputing.com>
	<ada1wfvdcze.fsf@cisco.com>
Message-ID: <46844CE4.9080103@opengridcomputing.com>


Roland Dreier wrote:
>  > What do folks think about using a sysfs file to turn debug logging
>  > on/off at runtime for an rdma driver?
>
> How about just making it a module parameter with writable permissions,
> so it can be set in /sys/module and you don't even have to write any
> attribute parsing code or anything like that.
>   

Duh!  I didn't know about /sys/module.  That sounds like what I want.

Thanks.


>  > Is a /proc entry better?
>
> No, /proc is only for process-related stuff.
>
>  - R.
>   


From sean.hefty at intel.com  Thu Jun 28 17:11:17 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 28 Jun 2007 17:11:17 -0700
Subject: [ofa-general] [GIT PULL] please pull rdma-dev.git for 2.6.23
Message-ID: <000801c7b9e2$03dfe220$3c98070a@amr.corp.intel.com>

Roland,

Please pull:

	git://git.openfabrics.org/~shefty/rdma-dev.git for-roland

for 2.6.23.  This will pick up the following patches:

Sean Hefty (7):
      ib/sa: use correct index for default pkey
      ib/cm: optimize locking
      ib/cm: include HCA ACK delay in local ACK timeout
      IB/sa: Add InformInfo/Notice support.
      IB/sa: Add local SA path record caching.
      ib/ipath: return correct PortGUID in NodeInfo
      ib/cm: cm_msgs.h should include ib_cm.h

All patches have been previously posted except for the last, which is a
one line change.

I believe that all concerns with the local SA have been addressed, but I
can repost those patches again if needed.

As mentioned in my other email, the change to the ipath driver is included
simply for convenience.

- Sean


From mshefty at ichips.intel.com  Thu Jun 28 17:17:14 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 28 Jun 2007 17:17:14 -0700
Subject: [ofa-general] [PATCH] 2.6.23 ib/ipath: return correct PortGUID
	in	NodeInfo
In-Reply-To: <000401c7b9db$35ad6180$3c98070a@amr.corp.intel.com>
References: <000401c7b9db$35ad6180$3c98070a@amr.corp.intel.com>
Message-ID: <46844F8A.2040006@ichips.intel.com>

Not sure what happened with this send, but I ended up with attachments 
and lost QLogic on the To list.  I'm guessing that Arthur is the right 
person to verify this fix(?), so re-sending with him on the To line.

- Sean

> Return the PortGUID of the correct port when responding to a NodeInfo
> query.  Returning the SystemImageGUID causes issues when there are
> multiple HCAs in a single system.
> 
> Signed-off-by: Sean Hefty <sean.hefty at intel.com>
> ---
> FYI - this patch will be included in my git pull request for 2.6.23 as
> well.
> 
>  drivers/infiniband/hw/ipath/ipath_mad.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/ipath/ipath_mad.c
> b/drivers/infiniband/hw/ipath/ipath_mad.c
> index 25908b0..3aec0b6 100644
> --- a/drivers/infiniband/hw/ipath/ipath_mad.c
> +++ b/drivers/infiniband/hw/ipath/ipath_mad.c
> @@ -103,7 +103,7 @@ static int recv_subn_get_nodeinfo(struct ib_smp *smp,
>  	/* This is already in network order */
>  	nip->sys_guid = to_idev(ibdev)->sys_image_guid;
>  	nip->node_guid = dd->ipath_guid;
> -	nip->port_guid = nip->sys_guid;
> +	nip->port_guid = nip->ipath_guid;
>  	nip->partition_cap = cpu_to_be16(ipath_get_npkeys(dd));
>  	nip->device_id = cpu_to_be16(dd->ipath_deviceid);
>  	majrev = dd->ipath_majrev;


From arthur.jones at qlogic.com  Thu Jun 28 17:22:36 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Thu, 28 Jun 2007 17:22:36 -0700
Subject: [ofa-general] [PATCH] 2.6.23 ib/ipath: return correct PortGUID
	in	NodeInfo
In-Reply-To: <46844F8A.2040006@ichips.intel.com>
References: <000401c7b9db$35ad6180$3c98070a@amr.corp.intel.com>
	<46844F8A.2040006@ichips.intel.com>
Message-ID: <20070629002236.GA29798@bauxite.pathscale.com>

hi sean, yeah, i got it the first time and i've
sent it off to the person who can check it out.

thanks!

arthur

On Thu, Jun 28, 2007 at 05:17:14PM -0700, Sean Hefty wrote:
> Not sure what happened with this send, but I ended up with attachments 
> and lost QLogic on the To list.  I'm guessing that Arthur is the right 
> person to verify this fix(?), so re-sending with him on the To line.
> 
> - Sean
> 
> >Return the PortGUID of the correct port when responding to a NodeInfo
> >query.  Returning the SystemImageGUID causes issues when there are
> >multiple HCAs in a single system.
> >
> >Signed-off-by: Sean Hefty <sean.hefty at intel.com>
> >---
> >FYI - this patch will be included in my git pull request for 2.6.23 as
> >well.
> >
> > drivers/infiniband/hw/ipath/ipath_mad.c |    2 +-
> > 1 files changed, 1 insertions(+), 1 deletions(-)
> >
> >diff --git a/drivers/infiniband/hw/ipath/ipath_mad.c
> >b/drivers/infiniband/hw/ipath/ipath_mad.c
> >index 25908b0..3aec0b6 100644
> >--- a/drivers/infiniband/hw/ipath/ipath_mad.c
> >+++ b/drivers/infiniband/hw/ipath/ipath_mad.c
> >@@ -103,7 +103,7 @@ static int recv_subn_get_nodeinfo(struct ib_smp *smp,
> > 	/* This is already in network order */
> > 	nip->sys_guid = to_idev(ibdev)->sys_image_guid;
> > 	nip->node_guid = dd->ipath_guid;
> >-	nip->port_guid = nip->sys_guid;
> >+	nip->port_guid = nip->ipath_guid;
> > 	nip->partition_cap = cpu_to_be16(ipath_get_npkeys(dd));
> > 	nip->device_id = cpu_to_be16(dd->ipath_deviceid);
> > 	majrev = dd->ipath_majrev;


From arthur.jones at qlogic.com  Thu Jun 28 18:15:31 2007
From: arthur.jones at qlogic.com (Arthur Jones)
Date: Thu, 28 Jun 2007 18:15:31 -0700
Subject: [ofa-general] [PATCH] 2.6.23 ib/ipath: return correct PortGUID
	in	NodeInfo
In-Reply-To: <46844F8A.2040006@ichips.intel.com>
References: <000401c7b9db$35ad6180$3c98070a@amr.corp.intel.com>
	<46844F8A.2040006@ichips.intel.com>
Message-ID: <20070629011531.GB28122@bauxite.pathscale.com>

hi sean, you did indeed pick out a bug, but
the fix is wrong:

On Thu, Jun 28, 2007 at 05:17:14PM -0700, Sean Hefty wrote:
> [...]
> >Return the PortGUID of the correct port when responding to a NodeInfo
> >query.  Returning the SystemImageGUID causes issues when there are
> >multiple HCAs in a single system.
> >
> >Signed-off-by: Sean Hefty <sean.hefty at intel.com>
> >---
> >FYI - this patch will be included in my git pull request for 2.6.23 as
> >well.
> >
> > drivers/infiniband/hw/ipath/ipath_mad.c |    2 +-
> > 1 files changed, 1 insertions(+), 1 deletions(-)
> >
> >diff --git a/drivers/infiniband/hw/ipath/ipath_mad.c
> >b/drivers/infiniband/hw/ipath/ipath_mad.c
> >index 25908b0..3aec0b6 100644
> >--- a/drivers/infiniband/hw/ipath/ipath_mad.c
> >+++ b/drivers/infiniband/hw/ipath/ipath_mad.c
> >@@ -103,7 +103,7 @@ static int recv_subn_get_nodeinfo(struct ib_smp *smp,
> > 	/* This is already in network order */
> > 	nip->sys_guid = to_idev(ibdev)->sys_image_guid;
> > 	nip->node_guid = dd->ipath_guid;
> >-	nip->port_guid = nip->sys_guid;
> >+	nip->port_guid = nip->ipath_guid;

this should be "nip->port_guid = dd->ipath_guid;".  this
was pointed out by ralph campbell...

thanks for the fix, though!

arthur
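
A user-space sketch of the corrected assignment, for illustration only.
The struct layouts below are trimmed-down, hypothetical stand-ins for the
driver's NodeInfo and device-data structures; only the field names quoted
in the patch are taken from the thread.

```c
#include <stdint.h>

/* Hypothetical stand-in for the NodeInfo payload (note: no ipath_guid). */
struct nodeinfo {
    uint64_t sys_guid;
    uint64_t node_guid;
    uint64_t port_guid;
};

/* Hypothetical stand-in for the per-HCA device data. */
struct devdata {
    uint64_t ipath_guid;
};

static void fill_nodeinfo(struct nodeinfo *nip, const struct devdata *dd,
                          uint64_t sys_image_guid)
{
    nip->sys_guid = sys_image_guid;
    nip->node_guid = dd->ipath_guid;
    /* The per-HCA GUID lives in dd, not in nip; nip->sys_guid would make
     * every HCA in the same system report the same PortGUID. */
    nip->port_guid = dd->ipath_guid;
}
```

Two devdata instances that share one system image GUID then report
distinct PortGUIDs, which is the multi-HCA case the patch description
is about.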


From sean.hefty at intel.com  Thu Jun 28 19:05:35 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 28 Jun 2007 19:05:35 -0700
Subject: [ofa-general] [PATCH] 2.6.23 ib/ipath: return correct
	PortGUIDin	NodeInfo
In-Reply-To: <20070629011531.GB28122@bauxite.pathscale.com>
Message-ID: <000401c7b9f1$fb9855b0$34cc180a@amr.corp.intel.com>

>this should be "nip->port_guid = dd->ipath_guid;".  this
>was pointed out by ralph campbell...

What I had in my tree was wrong... since what I posted won't even compile.  I
actually tested with the change you listed above.

I changed the patch in my tree accordingly.  If you're planning on pushing in
the fix through your own tree, just let me know, and I'll remove this patch from
my tree.

- Sean


From vlad at lists.openfabrics.org  Fri Jun 29 02:42:21 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Fri, 29 Jun 2007 02:42:21 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_c_kernel 20070629-0200 daily build status
Message-ID: <20070629094221.E09BDE6087B@openfabrics.org>

This email was generated automatically, please do not reply


git_url: git://git.openfabrics.org/~vlad/ofed_kernel.git
git_branch: ofed_kernel

Common build parameters:   --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-mlx4-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.15
Passed on powerpc with linux-2.6.18
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on ia64 with linux-2.6.12
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.19
Passed on ppc64 with linux-2.6.18
Passed on ia64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.12
Passed on ia64 with linux-2.6.15
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on ia64 with linux-2.6.13
Passed on x86_64 with linux-2.6.16
Passed on ia64 with linux-2.6.18
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.13
Passed on ia64 with linux-2.6.17
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.14
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.12
Passed on ia64 with linux-2.6.16
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.12
Passed on ppc64 with linux-2.6.17
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.16
Passed on powerpc with linux-2.6.16
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.14
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:


From halr at voltaire.com  Fri Jun 29 06:37:39 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 29 Jun 2007 09:37:39 -0400
Subject: [ofa-general] [ANNOUNCE] management libraries release
Message-ID: <1183124231.28870.268894.camel@hal.voltaire.com>

There is a new release of the management libraries which include the
ANSIfied header files available in:

http://www.openfabrics.org/~halr/

md5sum
a5b884775ed069da09ca0b60bfda3239  libibcommon-1.0.4.tar.gz
288b865a0015ac3251cffa011a7633eb  libibumad-1.0.6.tar.gz
04a5b6dcd2ee930f44d5715ee013f78b  libibmad-1.0.6.tar.gz

-- Hal


From swise at opengridcomputing.com  Fri Jun 29 07:51:13 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 29 Jun 2007 09:51:13 -0500
Subject: [ofa-general] [Bug 667]
In-Reply-To: <4684233D.7020406@ichips.intel.com>
References: <20070626142643.GC29602@mellanox.co.il>	<20070626143735.GD29602@mellanox.co.il>
	<46812A86.9000505@opengridcomputing.com>
	<4684233D.7020406@ichips.intel.com>
Message-ID: <46851C61.4040200@opengridcomputing.com>

Sean Hefty wrote:
> Steve Wise wrote:
>> I think the bug is in rping_bind_client().  If addr resolution fails 
>> via an ADDR_ERROR event, then rping_bind_client() wakes up and 
>> mistakenly returns variable 'ret' which is zero.  It should return 
>> non-zero in this case.
> 
> I attached a patch to the bug report to fix rping_bind_client().  Please 
> let me know if this fixes the problem for you.
> 
> - Sean

That seems to fix the seg fault I see when addr resolution fails.

Steve.


From ralph.campbell at qlogic.com  Fri Jun 29 11:37:56 2007
From: ralph.campbell at qlogic.com (Ralph Campbell)
Date: Fri, 29 Jun 2007 11:37:56 -0700
Subject: [ofa-general] [PATCH] IB/ipoib - partial error clean up unmaps wrong
	address
Message-ID: <1183142276.18911.337.camel@brick.pathscale.com>

If a page can't be allocated for the frag list of a skb,
the code to unmap the partially allocated list is off by one.
Say 'frags' equals one, i == 0, and the alloc_page() fails,
then the old loop would have unmapped mapping[1] which is
uninitialized. The same would happen if the ib_dma_map_page()
failed.

Signed-off-by: Ralph Campbell <ralph.campbell at qlogic.com>

diff -r f4233821c831 drivers/infiniband/ulp/ipoib/ipoib_cm.c
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c	Thu Jun 28 13:16:47 2007 -0700
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c	Fri Jun 29 11:10:22 2007 -0700
@@ -155,8 +155,8 @@ partial_error:
 
 	ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_CM_HEAD_SIZE, DMA_FROM_DEVICE);
 
-	for (; i >= 0; --i)
-		ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE);
+	for (; i > 0; --i)
+		ib_dma_unmap_single(priv->ca, mapping[i], PAGE_SIZE, DMA_FROM_DEVICE);
 
 	dev_kfree_skb_any(skb);
 	return NULL;
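
The off-by-one can be checked with a small user-space model of the
cleanup path.  This is a sketch under stated assumptions, not the driver
code: struct rx_buf, MAX_SLOTS and the helper names are hypothetical, and
"mapping" a slot is reduced to setting a flag.  It assumes, as the patch
description implies, that slot 0 holds the head mapping and fragment i
uses slot i + 1, and that frags + 1 <= MAX_SLOTS.

```c
#include <stdbool.h>

#define MAX_SLOTS 8

/* Tracks which slots of a hypothetical mapping[] array are live. */
struct rx_buf {
    bool mapped[MAX_SLOTS];
    int bad_unmaps;   /* unmaps of slots that were never mapped */
};

/*
 * Map the head (slot 0), then one slot per fragment, failing while
 * handling fragment fail_at.  Returns i at the point of failure, or -1
 * if every fragment was mapped.
 */
static int map_until_failure(struct rx_buf *b, int frags, int fail_at)
{
    int i;

    for (i = 0; i < MAX_SLOTS; i++)
        b->mapped[i] = false;
    b->bad_unmaps = 0;
    b->mapped[0] = true;
    for (i = 0; i < frags; i++) {
        if (i == fail_at)
            return i;            /* slot i + 1 is never written */
        b->mapped[i + 1] = true;
    }
    return -1;
}

static void unmap_slot(struct rx_buf *b, int slot)
{
    if (!b->mapped[slot])
        b->bad_unmaps++;
    b->mapped[slot] = false;
}

/* Old loop: starts at slot i + 1, which was never mapped. */
static void cleanup_old(struct rx_buf *b, int i)
{
    unmap_slot(b, 0);
    for (; i >= 0; --i)
        unmap_slot(b, i + 1);
}

/* New loop: unmaps slots i down to 1, exactly the ones that were mapped. */
static void cleanup_new(struct rx_buf *b, int i)
{
    unmap_slot(b, 0);
    for (; i > 0; --i)
        unmap_slot(b, i);
}
```

With frags equal to one and a failure at i == 0, cleanup_old() touches
slot 1 although it was never mapped, while cleanup_new() touches only
slot 0.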


From swise at opengridcomputing.com  Fri Jun 29 14:27:52 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 29 Jun 2007 16:27:52 -0500
Subject: [ofa-general] [GIT PULL 00/10] ofed_1_2 - Chelsio Bug Fixes
Message-ID: <20070629212752.18132.98709.stgit@dell3.ogc.int>


Vlad,

The following patches are bug fixes to the rdma and low level chelsio
drivers for ofed-1.2.  All of these patches are upstream in either 2.6.22
or pending for 2.6.23 and need to be pulled into ofed-1.2.

I plan to make these available to chelsio customers either through a
series of patches, or a full ofa_kernel tarball.

Please pull these from:

http://git.openfabrics.org/~swise/ofed_1_2 ofed_1_2

Thanks,

Steve.


From swise at opengridcomputing.com  Fri Jun 29 14:27:57 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 29 Jun 2007 16:27:57 -0500
Subject: [ofa-general] [PATCH 01/10] iw_cxgb3: ctrl-qp init/clear shouldn't
	set the gen bit.
In-Reply-To: <20070629212752.18132.98709.stgit@dell3.ogc.int>
References: <20070629212752.18132.98709.stgit@dell3.ogc.int>
Message-ID: <20070629212757.18132.38688.stgit@dell3.ogc.int>


iw_cxgb3: ctrl-qp init/clear shouldn't set the gen bit.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/core/cxio_hal.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
index 62998d3..9746635 100644
--- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
@@ -162,7 +162,7 @@ int cxio_hal_clear_qp_ctx(struct cxio_rd
 	}
 	wqe = (struct t3_modify_qp_wr *) skb_put(skb, sizeof(*wqe));
 	memset(wqe, 0, sizeof(*wqe));
-	build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 3, 1, qpid, 7);
+	build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 3, 0, qpid, 7);
 	wqe->flags = cpu_to_be32(MODQP_WRITE_EC);
 	sge_cmd = qpid << 8 | 3;
 	wqe->sge_cmd = cpu_to_be64(sge_cmd);
@@ -566,7 +566,7 @@ static int cxio_hal_init_ctrl_qp(struct 
 			V_EC_UP_TOKEN(T3_CTL_QP_TID) | F_EC_VALID)) << 32;
 	wqe = (struct t3_modify_qp_wr *) skb_put(skb, sizeof(*wqe));
 	memset(wqe, 0, sizeof(*wqe));
-	build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 0, 1,
+	build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 0, 0,
 		       T3_CTL_QP_TID, 7);
 	wqe->flags = cpu_to_be32(MODQP_WRITE_EC);
 	sge_cmd = (3ULL << 56) | FW_RI_SGEEC_START << 8 | 3;


From swise at opengridcomputing.com  Fri Jun 29 14:28:02 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 29 Jun 2007 16:28:02 -0500
Subject: [ofa-general] [PATCH 02/10] iw_cxgb3: Don't post TID_RELEASE
	message.
In-Reply-To: <20070629212752.18132.98709.stgit@dell3.ogc.int>
References: <20070629212752.18132.98709.stgit@dell3.ogc.int>
Message-ID: <20070629212802.18132.96065.stgit@dell3.ogc.int>


iw_cxgb3: Don't post TID_RELEASE message.

The LLD does this for us in cxgb3_remove_tid().

Also fixed active open failure cases where we shouldn't
be releasing the TID as well.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/iwch_cm.c |   13 ++++++++++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
index a654bd5..1cd03f8 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
@@ -255,8 +255,6 @@ static void release_ep_resources(struct 
 	cxgb3_remove_tid(ep->com.tdev, (void *)ep, ep->hwtid);
 	dst_release(ep->dst);
 	l2t_release(L2DATA(ep->com.tdev), ep->l2t);
-	if (ep->com.tdev->type == T3B)
-		release_tid(ep->com.tdev, ep->hwtid, NULL);
 	put_ep(&ep->com);
 }
 
@@ -1102,6 +1100,15 @@ static int abort_rpl(struct t3cdev *tdev
 	return CPL_RET_BUF_DONE;
 }
 
+/*
+ * Return whether a failed active open has allocated a TID
+ */
+static inline int act_open_has_tid(int status)
+{
+	return status != CPL_ERR_TCAM_FULL && status != CPL_ERR_CONN_EXIST &&
+	       status != CPL_ERR_ARP_MISS;
+}
+
 static int act_open_rpl(struct t3cdev *tdev, struct sk_buff *skb, void *ctx)
 {
 	struct iwch_ep *ep = ctx;
@@ -1111,7 +1118,7 @@ static int act_open_rpl(struct t3cdev *t
 	     status2errno(rpl->status));
 	connect_reply_upcall(ep, status2errno(rpl->status));
 	state_set(&ep->com, DEAD);
-	if (ep->com.tdev->type == T3B)
+	if (ep->com.tdev->type == T3B && act_open_has_tid(rpl->status))
 		release_tid(ep->com.tdev, GET_TID(rpl), NULL);
 	cxgb3_free_atid(ep->com.tdev, ep->atid);
 	dst_release(ep->dst);


From swise at opengridcomputing.com  Fri Jun 29 14:28:07 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 29 Jun 2007 16:28:07 -0500
Subject: [ofa-general] [PATCH 03/10] iw_cxgb3: Don't abort after failures
	sending the mpa reply.
In-Reply-To: <20070629212752.18132.98709.stgit@dell3.ogc.int>
References: <20070629212752.18132.98709.stgit@dell3.ogc.int>
Message-ID: <20070629212807.18132.70240.stgit@dell3.ogc.int>


iw_cxgb3: Don't abort after failures sending the mpa reply.

This bug results in an abort request being sent down _after_ the tid
has been released.  If the tid happens to have been reused, then the
subsequent generation of the tid gets incorrectly aborted.

The thread running iwch_accept_cr() must not abort a connection if an
error is returned after being awakened.  If any errors did occur while
iwch_accept_cr() is blocked, then the connection has already been aborted
on the thread processing the error.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/iwch_cm.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
index 1cd03f8..4175991 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
@@ -1779,7 +1779,6 @@ err:
 	ep->com.cm_id = NULL;
 	ep->com.qp = NULL;
 	cm_id->rem_ref(cm_id);
-	abort_connection(ep, NULL, GFP_KERNEL);
 	put_ep(&ep->com);
 	return err;
 }


From swise at opengridcomputing.com  Fri Jun 29 14:28:12 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 29 Jun 2007 16:28:12 -0500
Subject: [ofa-general] [PATCH 04/10] cxgb3: Bump the required FW version to
	4.3.
In-Reply-To: <20070629212752.18132.98709.stgit@dell3.ogc.int>
References: <20070629212752.18132.98709.stgit@dell3.ogc.int>
Message-ID: <20070629212812.18132.57916.stgit@dell3.ogc.int>


cxgb3: Bump the required FW version to 4.3.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/net/cxgb3/version.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/cxgb3/version.h b/drivers/net/cxgb3/version.h
index 7ef2193..7dcfb40 100644
--- a/drivers/net/cxgb3/version.h
+++ b/drivers/net/cxgb3/version.h
@@ -39,6 +39,6 @@ #define DRV_VERSION "1.0-ofed"
 
 /* Firmware version */
 #define FW_VERSION_MAJOR 4
-#define FW_VERSION_MINOR 2
+#define FW_VERSION_MINOR 3
 #define FW_VERSION_MICRO 0
 #endif				/* __CHELSIO_VERSION_H */


From swise at opengridcomputing.com  Fri Jun 29 14:28:17 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 29 Jun 2007 16:28:17 -0500
Subject: [ofa-general] [PATCH 05/10] cxgb3 - fix skb->dev dereference
In-Reply-To: <20070629212752.18132.98709.stgit@dell3.ogc.int>
References: <20070629212752.18132.98709.stgit@dell3.ogc.int>
Message-ID: <20070629212817.18132.73785.stgit@dell3.ogc.int>


cxgb3 - fix skb->dev dereference

eth_type_trans() now sets skb->dev.
References to skb->dev should happen after it is called.

Signed-off-by: Divy Le Ray <divy at chelsio.com>
Signed-off-by: Jeff Garzik <jeff at garzik.org>
---

 drivers/net/cxgb3/sge.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/cxgb3/sge.c b/drivers/net/cxgb3/sge.c
index 027ab2c..090dc1d 100644
--- a/drivers/net/cxgb3/sge.c
+++ b/drivers/net/cxgb3/sge.c
@@ -1685,8 +1685,8 @@ static void rx_eth(struct adapter *adap,
 
 	skb_pull(skb, sizeof(*p) + pad);
 	skb->dev = adap->port[p->iff];
-	skb->dev->last_rx = jiffies;
 	skb->protocol = eth_type_trans(skb, skb->dev);
+	skb->dev->last_rx = jiffies;
 	pi = netdev_priv(skb->dev);
 	if (pi->rx_csum_offload && p->csum_valid && p->csum == 0xffff &&
 	    !p->fragment) {


From swise at opengridcomputing.com  Fri Jun 29 14:28:22 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 29 Jun 2007 16:28:22 -0500
Subject: [ofa-general] [PATCH 06/10] cxgb3 - fix netpoll hanlder
In-Reply-To: <20070629212752.18132.98709.stgit@dell3.ogc.int>
References: <20070629212752.18132.98709.stgit@dell3.ogc.int>
Message-ID: <20070629212822.18132.15296.stgit@dell3.ogc.int>


cxgb3 - fix netpoll handler

Fix netpoll handler to work with line interrupt, msi and msi-x.

Signed-off-by: Divy Le Ray <divy at chelsio.com>
Signed-off-by: Jeff Garzik <jeff at garzik.org>
---

 drivers/net/cxgb3/cxgb3_main.c |   16 +++++++++++++---
 drivers/net/cxgb3/sge.c        |    1 -
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
index 475c428..f8b52dc 100644
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -2071,10 +2071,20 @@ #ifdef CONFIG_NET_POLL_CONTROLLER
 static void cxgb_netpoll(struct net_device *dev)
 {
 	struct adapter *adapter = dev->priv;
-	struct sge_qset *qs = dev2qset(dev);
+	struct port_info *pi = netdev_priv(dev);
+	int qidx;
 
-	t3_intr_handler(adapter, qs->rspq.polling) (adapter->pdev->irq,
-						    adapter);
+	for (qidx = pi->first_qset; qidx < pi->first_qset + pi->nqsets; qidx++) {
+		struct sge_qset *qs = &adapter->sge.qs[qidx];
+		void *source;
+		
+		if (adapter->flags & USING_MSIX)
+			source = qs;
+		else
+			source = adapter;
+
+		t3_intr_handler(adapter, qs->rspq.polling) (0, source);
+	}
 }
 #endif
 
diff --git a/drivers/net/cxgb3/sge.c b/drivers/net/cxgb3/sge.c
index 090dc1d..e80b2fd 100644
--- a/drivers/net/cxgb3/sge.c
+++ b/drivers/net/cxgb3/sge.c
@@ -2212,7 +2212,6 @@ irqreturn_t t3_sge_intr_msix_napi(int ir
 	struct sge_rspq *q = &qs->rspq;
 
 	spin_lock(&q->lock);
-	BUG_ON(napi_is_scheduled(qs->netdev));
 
 	if (handle_responses(adap, q) < 0)
 		q->unhandled_irqs++;


From swise at opengridcomputing.com  Fri Jun 29 14:28:27 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 29 Jun 2007 16:28:27 -0500
Subject: [ofa-general] [PATCH 07/10] cxgb3 - Fix direct XAUI support
In-Reply-To: <20070629212752.18132.98709.stgit@dell3.ogc.int>
References: <20070629212752.18132.98709.stgit@dell3.ogc.int>
Message-ID: <20070629212827.18132.5501.stgit@dell3.ogc.int>


cxgb3 - Fix direct XAUI support

Check all lanes for link status on direct XAUI cards.
Don't assume that direct XAUI always uses XGMAC 1.

Signed-off-by: Divy Le Ray <divy at chelsio.com>
Signed-off-by: Jeff Garzik <jeff at garzik.org>
---

 drivers/net/cxgb3/ael1002.c |   10 ++++++++--
 drivers/net/cxgb3/regs.h    |    2 ++
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/net/cxgb3/ael1002.c b/drivers/net/cxgb3/ael1002.c
old mode 100755
new mode 100644
index 73a41e6..ee140e6
--- a/drivers/net/cxgb3/ael1002.c
+++ b/drivers/net/cxgb3/ael1002.c
@@ -219,7 +219,13 @@ static int xaui_direct_get_link_status(s
 		unsigned int status;
 
 		status = t3_read_reg(phy->adapter,
-				     XGM_REG(A_XGM_SERDES_STAT0, phy->addr));
+				     XGM_REG(A_XGM_SERDES_STAT0, phy->addr)) |
+		    t3_read_reg(phy->adapter,
+				XGM_REG(A_XGM_SERDES_STAT1, phy->addr)) |
+		    t3_read_reg(phy->adapter,
+				XGM_REG(A_XGM_SERDES_STAT2, phy->addr)) |
+		    t3_read_reg(phy->adapter,
+				XGM_REG(A_XGM_SERDES_STAT3, phy->addr));
 		*link_ok = !(status & F_LOWSIG0);
 	}
 	if (speed)
@@ -247,5 +253,5 @@ static struct cphy_ops xaui_direct_ops =
 void t3_xaui_direct_phy_prep(struct cphy *phy, struct adapter *adapter,
 			     int phy_addr, const struct mdio_ops *mdio_ops)
 {
-	cphy_init(phy, adapter, 1, &xaui_direct_ops, mdio_ops);
+	cphy_init(phy, adapter, phy_addr, &xaui_direct_ops, mdio_ops);
 }
diff --git a/drivers/net/cxgb3/regs.h b/drivers/net/cxgb3/regs.h
index e5a5534..bf9d6be 100644
--- a/drivers/net/cxgb3/regs.h
+++ b/drivers/net/cxgb3/regs.h
@@ -2128,6 +2128,8 @@ #define V_RESETPLL01(x) ((x) << S_RESETP
 #define F_RESETPLL01    V_RESETPLL01(1U)
 
 #define A_XGM_SERDES_STAT0 0x8f0
+#define A_XGM_SERDES_STAT1 0x8f4
+#define A_XGM_SERDES_STAT2 0x8f8
 
 #define S_LOWSIG0    0
 #define V_LOWSIG0(x) ((x) << S_LOWSIG0)
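
For readers following the "check all lanes" change, here is a minimal
standalone sketch of the idea, not the driver code itself: OR the four
per-lane status words together and report the link as up only if the
low-signal bit is clear in all of them.  LANE_LOWSIG and xaui_link_ok()
are hypothetical names; the bit position is taken from S_LOWSIG0 (0) in
the hunk above, and the status words are plain integers here rather
than values returned by t3_read_reg().

```c
#include <assert.h>
#include <stddef.h>

#define XAUI_LANES  4
#define LANE_LOWSIG (1U << 0)	/* hypothetical name; mirrors F_LOWSIG0 */

/*
 * Return 1 if no lane reports low signal, 0 otherwise.
 * stat[] holds one status word per lane (SERDES_STAT0..3 in the patch).
 */
static int xaui_link_ok(const unsigned int stat[XAUI_LANES])
{
	unsigned int status = 0;
	size_t i;

	assert(stat != NULL);
	for (i = 0; i < XAUI_LANES; i++)
		status |= stat[i];	/* a low-signal bit on any lane sticks */

	return !(status & LANE_LOWSIG);
}
```

With only the first status word checked, a fault on lanes 1-3 would go
unnoticed; ORing the words first makes any single bad lane bring the
link down.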


From swise at opengridcomputing.com  Fri Jun 29 14:28:33 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 29 Jun 2007 16:28:33 -0500
Subject: [ofa-general] [PATCH 08/10] cxgb3 - Stop mac RX when changing MTU
In-Reply-To: <20070629212752.18132.98709.stgit@dell3.ogc.int>
References: <20070629212752.18132.98709.stgit@dell3.ogc.int>
Message-ID: <20070629212832.18132.69614.stgit@dell3.ogc.int>


cxgb3 - Stop mac RX when changing MTU

Rx traffic needs to be halted when the MTU is changed
to avoid a potential chip hang.
Reset/restore MAC filters around an MTU change.
Also fix the pause frames high watermark setting.

Signed-off-by: Divy Le Ray <divy at chelsio.com>
Signed-off-by: Jeff Garzik <jeff at garzik.org>
---

 drivers/net/cxgb3/regs.h  |    4 +++
 drivers/net/cxgb3/xgmac.c |   67 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 70 insertions(+), 1 deletions(-)

diff --git a/drivers/net/cxgb3/regs.h b/drivers/net/cxgb3/regs.h
index bf9d6be..020859c 100644
--- a/drivers/net/cxgb3/regs.h
+++ b/drivers/net/cxgb3/regs.h
@@ -1882,6 +1882,10 @@ #define S_COPYALLFRAMES    0
 #define V_COPYALLFRAMES(x) ((x) << S_COPYALLFRAMES)
 #define F_COPYALLFRAMES    V_COPYALLFRAMES(1U)
 
+#define S_DISBCAST    1
+#define V_DISBCAST(x) ((x) << S_DISBCAST)
+#define F_DISBCAST    V_DISBCAST(1U)
+
 #define A_XGM_RX_HASH_LOW 0x814
 
 #define A_XGM_RX_HASH_HIGH 0x818
diff --git a/drivers/net/cxgb3/xgmac.c b/drivers/net/cxgb3/xgmac.c
index a506792..16cadba 100644
--- a/drivers/net/cxgb3/xgmac.c
+++ b/drivers/net/cxgb3/xgmac.c
@@ -231,6 +231,28 @@ int t3_mac_set_num_ucast(struct cmac *ma
 	return 0;
 }
 
+static void disable_exact_filters(struct cmac *mac)
+{
+	unsigned int i, reg = mac->offset + A_XGM_RX_EXACT_MATCH_LOW_1;
+
+	for (i = 0; i < EXACT_ADDR_FILTERS; i++, reg += 8) {
+		u32 v = t3_read_reg(mac->adapter, reg);
+		t3_write_reg(mac->adapter, reg, v);
+	}
+	t3_read_reg(mac->adapter, A_XGM_RX_EXACT_MATCH_LOW_1);	/* flush */
+}
+
+static void enable_exact_filters(struct cmac *mac)
+{
+	unsigned int i, reg = mac->offset + A_XGM_RX_EXACT_MATCH_HIGH_1;
+
+	for (i = 0; i < EXACT_ADDR_FILTERS; i++, reg += 8) {
+		u32 v = t3_read_reg(mac->adapter, reg);
+		t3_write_reg(mac->adapter, reg, v);
+	}
+	t3_read_reg(mac->adapter, A_XGM_RX_EXACT_MATCH_LOW_1);	/* flush */
+}
+
 /* Calculate the RX hash filter index of an Ethernet address */
 static int hash_hw_addr(const u8 * addr)
 {
@@ -281,6 +303,14 @@ int t3_mac_set_rx_mode(struct cmac *mac,
 	return 0;
 }
 
+static int rx_fifo_hwm(int mtu)
+{
+	int hwm;
+
+	hwm = max(MAC_RXFIFO_SIZE - 3 * mtu, (MAC_RXFIFO_SIZE * 38) / 100);
+	return min(hwm, MAC_RXFIFO_SIZE - 8192);
+}
+
 int t3_mac_set_mtu(struct cmac *mac, unsigned int mtu)
 {
 	int hwm, lwm;
@@ -306,11 +336,38 @@ int t3_mac_set_mtu(struct cmac *mac, uns
 	lwm = min(3 * (int)mtu, MAC_RXFIFO_SIZE / 4);
 
 	v = t3_read_reg(adap, A_XGM_RXFIFO_CFG + mac->offset);
+	if (adap->params.rev == T3_REV_B2 &&
+	    (t3_read_reg(adap, A_XGM_RX_CTRL + mac->offset) & F_RXEN)) {
+		disable_exact_filters(mac);
+		t3_set_reg_field(adap, A_XGM_RXFIFO_CFG + mac->offset,
+				 F_ENHASHMCAST | F_COPYALLFRAMES, F_DISBCAST);
+
+		/* drain rx FIFO */
+		if (t3_wait_op_done(adap,
+				    A_XGM_RX_MAX_PKT_SIZE_ERR_CNT +
+				    mac->offset,
+				    1 << 31, 1, 20, 5)) {
+			t3_write_reg(adap, A_XGM_RXFIFO_CFG + mac->offset, v);
+			enable_exact_filters(mac);
+			return -EIO;
+		}
+		t3_write_reg(adap, A_XGM_RX_MAX_PKT_SIZE + mac->offset, mtu);
+		enable_exact_filters(mac);
+	} else
+		t3_write_reg(adap, A_XGM_RX_MAX_PKT_SIZE + mac->offset, mtu);
+
+	/*
+	 * Adjust the PAUSE frame watermarks.  We always set the LWM, and the
+	 * HWM only if flow-control is enabled.
+	 */
+	hwm = rx_fifo_hwm(mtu);
+	lwm = min(3 * (int)mtu, MAC_RXFIFO_SIZE / 4);
 	v &= ~V_RXFIFOPAUSELWM(M_RXFIFOPAUSELWM);
 	v |= V_RXFIFOPAUSELWM(lwm / 8);
 	if (G_RXFIFOPAUSEHWM(v))
 		v = (v & ~V_RXFIFOPAUSEHWM(M_RXFIFOPAUSEHWM)) |
 		    V_RXFIFOPAUSEHWM(hwm / 8);
+
 	t3_write_reg(adap, A_XGM_RXFIFO_CFG + mac->offset, v);
 
 	/* Adjust the TX FIFO threshold based on the MTU */
@@ -329,7 +386,6 @@ int t3_mac_set_mtu(struct cmac *mac, uns
 			     (hwm - lwm) * 4 / 8);
 	t3_write_reg(adap, A_XGM_TX_PAUSE_QUANTA + mac->offset,
 		     MAC_RXFIFO_SIZE * 4 * 8 / 512);
-
 	return 0;
 }
 
@@ -357,6 +413,15 @@ int t3_mac_set_speed_duplex_fc(struct cm
 				 V_PORTSPEED(M_PORTSPEED), val);
 	}
 
+	val = t3_read_reg(adap, A_XGM_RXFIFO_CFG + oft);
+	val &= ~V_RXFIFOPAUSEHWM(M_RXFIFOPAUSEHWM);
+	if (fc & PAUSE_TX)
+		val |= V_RXFIFOPAUSEHWM(rx_fifo_hwm(
+						t3_read_reg(adap,
+						A_XGM_RX_MAX_PKT_SIZE
+						+ oft)) / 8);
+	t3_write_reg(adap, A_XGM_RXFIFO_CFG + oft, val);
+
 	t3_set_reg_field(adap, A_XGM_TX_CFG + oft, F_TXPAUSEEN,
 			 (fc & PAUSE_RX) ? F_TXPAUSEEN : 0);
 	return 0;
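
As a worked example of the high watermark formula introduced by
rx_fifo_hwm() above, the following standalone sketch reproduces the
arithmetic in user space.  It assumes MAC_RXFIFO_SIZE is 32768 bytes
(an assumption, the value is not shown in this patch), and
min_int()/max_int() are hypothetical stand-ins for the kernel's
min()/max() macros.

```c
#include <assert.h>

#define MAC_RXFIFO_SIZE 32768	/* assumed FIFO size in bytes */

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/*
 * Same arithmetic as rx_fifo_hwm() in the patch: leave room for three
 * MTU-sized frames, but never go below 38% of the FIFO or above
 * FIFO size - 8192.
 */
static int rx_fifo_hwm(int mtu)
{
	int hwm;

	assert(mtu > 0);
	hwm = max_int(MAC_RXFIFO_SIZE - 3 * mtu, (MAC_RXFIFO_SIZE * 38) / 100);
	return min_int(hwm, MAC_RXFIFO_SIZE - 8192);
}
```

Under that assumption, an MTU of 1500 gives max(28268, 12451) = 28268,
which is then capped at 32768 - 8192 = 24576.  An MTU of 9000 gives
max(5768, 12451) = 12451, so the 38% floor applies.  The register field
is written in units of 8 bytes, hence the "/ 8" at the call sites.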


From swise at opengridcomputing.com  Fri Jun 29 14:28:38 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 29 Jun 2007 16:28:38 -0500
Subject: [ofa-general] [PATCH 09/10] cxgb3 - MAC watchdog update
In-Reply-To: <20070629212752.18132.98709.stgit@dell3.ogc.int>
References: <20070629212752.18132.98709.stgit@dell3.ogc.int>
Message-ID: <20070629212838.18132.43384.stgit@dell3.ogc.int>


cxgb3 - MAC watchdog update

Fix variable initialization and usage in the MAC watchdog.

Signed-off-by: Divy Le Ray <divy at chelsio.com>
Signed-off-by: Jeff Garzik <jeff at garzik.org>
---

 drivers/net/cxgb3/xgmac.c |   31 +++++++++++++++++++++----------
 1 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/drivers/net/cxgb3/xgmac.c b/drivers/net/cxgb3/xgmac.c
index 16cadba..b261be1 100644
--- a/drivers/net/cxgb3/xgmac.c
+++ b/drivers/net/cxgb3/xgmac.c
@@ -501,6 +501,10 @@ int t3b2_mac_watchdog_task(struct cmac *
 	unsigned int rx_xcnt;
 	int status;
 
+	status = 0;
+	tx_xcnt = 1;		/* By default tx_xcnt is making progress */
+	tx_tcnt = mac->tx_tcnt;	/* If tx_mcnt is progressing ignore tx_tcnt */
+	rx_xcnt = 1;		/* By default rx_xcnt is making progress */
 	if (tx_mcnt == mac->tx_mcnt) {
 		tx_xcnt = (G_TXSPI4SOPCNT(t3_read_reg(adap,
 						A_XGM_TX_SPI4_SOP_EOP_CNT +
@@ -511,37 +515,44 @@ int t3b2_mac_watchdog_task(struct cmac *
 			tx_tcnt = (G_TXDROPCNTCH0RCVD(t3_read_reg(adap,
 						      A_TP_PIO_DATA)));
 		} else {
-			mac->toggle_cnt = 0;
-			return 0;
+			goto rxcheck;
 		}
 	} else {
 		mac->toggle_cnt = 0;
-		return 0;
+		goto rxcheck;
 	}
 
 	if (((tx_tcnt != mac->tx_tcnt) &&
 	     (tx_xcnt == 0) && (mac->tx_xcnt == 0)) ||
 	    ((mac->tx_mcnt == tx_mcnt) &&
 	     (tx_xcnt != 0) && (mac->tx_xcnt != 0))) {
-		if (mac->toggle_cnt > 4)
+		if (mac->toggle_cnt > 4) {
 			status = 2;
-		else 
+			goto out;
+		} else {
 			status = 1;
+			goto out;
+		}
 	} else {
 		mac->toggle_cnt = 0;
-		return 0;
+		goto rxcheck;
 	}
 
+rxcheck:
 	if (rx_mcnt != mac->rx_mcnt)
 		rx_xcnt = (G_TXSPI4SOPCNT(t3_read_reg(adap,
 						A_XGM_RX_SPI4_SOP_EOP_CNT +
 						mac->offset)));
-	else 
-		return 0;
+	else
+		goto out;
 
-	if (mac->rx_mcnt != s->rx_frames && rx_xcnt == 0 && mac->rx_xcnt == 0) 
+	if (mac->rx_mcnt != s->rx_frames && rx_xcnt == 0 &&
+	    mac->rx_xcnt == 0) {
 		status = 2;
-	
+		goto out;
+	}
+
+out:
 	mac->tx_tcnt = tx_tcnt;
 	mac->tx_xcnt = tx_xcnt;
 	mac->tx_mcnt = s->tx_frames;


From swise at opengridcomputing.com  Fri Jun 29 14:28:43 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 29 Jun 2007 16:28:43 -0500
Subject: [ofa-general] [PATCH 10/10] cxgb3 - fix register to stop bc/mc
	traffic
In-Reply-To: <20070629212752.18132.98709.stgit@dell3.ogc.int>
References: <20070629212752.18132.98709.stgit@dell3.ogc.int>
Message-ID: <20070629212843.18132.30351.stgit@dell3.ogc.int>


cxgb3 - fix register to stop bc/mc traffic

Use the right register to stop broadcast/multicast traffic.

Signed-off-by: Divy Le Ray <divy at chelsio.com>
---

 drivers/net/cxgb3/xgmac.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/cxgb3/xgmac.c b/drivers/net/cxgb3/xgmac.c
index b261be1..c302b1a 100644
--- a/drivers/net/cxgb3/xgmac.c
+++ b/drivers/net/cxgb3/xgmac.c
@@ -335,11 +335,11 @@ int t3_mac_set_mtu(struct cmac *mac, uns
 	hwm = min(hwm, MAC_RXFIFO_SIZE - 8192);
 	lwm = min(3 * (int)mtu, MAC_RXFIFO_SIZE / 4);
 
-	v = t3_read_reg(adap, A_XGM_RXFIFO_CFG + mac->offset);
 	if (adap->params.rev == T3_REV_B2 &&
 	    (t3_read_reg(adap, A_XGM_RX_CTRL + mac->offset) & F_RXEN)) {
 		disable_exact_filters(mac);
-		t3_set_reg_field(adap, A_XGM_RXFIFO_CFG + mac->offset,
+		v = t3_read_reg(adap, A_XGM_RX_CFG + mac->offset);
+		t3_set_reg_field(adap, A_XGM_RX_CFG + mac->offset,
 				 F_ENHASHMCAST | F_COPYALLFRAMES, F_DISBCAST);
 
 		/* drain rx FIFO */
@@ -347,11 +347,12 @@ int t3_mac_set_mtu(struct cmac *mac, uns
 				    A_XGM_RX_MAX_PKT_SIZE_ERR_CNT +
 				    mac->offset,
 				    1 << 31, 1, 20, 5)) {
-			t3_write_reg(adap, A_XGM_RXFIFO_CFG + mac->offset, v);
+			t3_write_reg(adap, A_XGM_RX_CFG + mac->offset, v);
 			enable_exact_filters(mac);
 			return -EIO;
 		}
 		t3_write_reg(adap, A_XGM_RX_MAX_PKT_SIZE + mac->offset, mtu);
+		t3_write_reg(adap, A_XGM_RX_CFG + mac->offset, v);
 		enable_exact_filters(mac);
 	} else
 		t3_write_reg(adap, A_XGM_RX_MAX_PKT_SIZE + mac->offset, mtu);
@@ -362,6 +363,7 @@ int t3_mac_set_mtu(struct cmac *mac, uns
 	 */
 	hwm = rx_fifo_hwm(mtu);
 	lwm = min(3 * (int)mtu, MAC_RXFIFO_SIZE / 4);
+	v = t3_read_reg(adap, A_XGM_RXFIFO_CFG + mac->offset);
 	v &= ~V_RXFIFOPAUSELWM(M_RXFIFOPAUSELWM);
 	v |= V_RXFIFOPAUSELWM(lwm / 8);
 	if (G_RXFIFOPAUSEHWM(v))


From sshaw at sgi.com  Fri Jun 29 14:42:55 2007
From: sshaw at sgi.com (Scott Shaw)
Date: Fri, 29 Jun 2007 14:42:55 -0700
Subject: [ofa-general] Ofed v1.2rc2 IPoIB 
Message-ID: <9BEB932202A05B488722B05D2374A1DA0221C3F3@mtv-amer001e--3.americas.sgi.com>

Hi, 
I have a small cluster set up with NFS over an IPoIB device, and I am
seeing a high rate of "transmit timed out" errors being logged in
/var/log/messages.  What could be causing the problem, and is there a
fix?

I am using a dual port DDR Mellanox Technologies MT25208 HCA within a
DDR IB fabric.

/etc/init.d/openibd status reports 
  HCA driver loaded
Configured devices:
ib0
Currently active devices:
ib0
The following OFED modules are loaded:
  rdma_ucm
  rdma_cm
  ib_addr
  ib_local_sa
  ib_ipoib
  ib_ipath
  ib_mthca
  ib_uverbs
  ib_umad
  ib_sa
  ib_cm
  ib_mad
  ib_core

SUSE Linux Enterprise Server 10 (x86_64)
VERSION = 10
PATCHLEVEL = 1


Jun 29 15:46:57 service2 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 29 15:46:57 service2 kernel: ib0: transmit timeout: latency 1576 msecs
Jun 29 15:46:57 service2 kernel: ib0: queue stopped 1, tx_head 6355, tx_tail 6291
Jun 29 15:46:58 service2 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 29 15:46:58 service2 kernel: ib0: transmit timeout: latency 2576 msecs
Jun 29 15:46:58 service2 kernel: ib0: queue stopped 1, tx_head 6355, tx_tail 6291
Jun 29 15:46:59 service2 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 29 15:46:59 service2 kernel: ib0: transmit timeout: latency 3576 msecs
Jun 29 15:46:59 service2 kernel: ib0: queue stopped 1, tx_head 6355, tx_tail 6291
Jun 29 15:47:00 service2 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 29 15:47:00 service2 kernel: ib0: transmit timeout: latency 4576 msecs
Jun 29 15:47:00 service2 kernel: ib0: queue stopped 1, tx_head 6355, tx_tail 6291
Jun 29 15:47:01 service2 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 29 15:47:01 service2 kernel: ib0: transmit timeout: latency 5576 msecs
Jun 29 15:47:01 service2 kernel: ib0: queue stopped 1, tx_head 6355, tx_tail 6291
Jun 29 15:47:02 service2 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 29 15:47:02 service2 kernel: ib0: transmit timeout: latency 6576 msecs
Jun 29 15:47:02 service2 kernel: ib0: queue stopped 1, tx_head 6355, tx_tail 6291
Jun 29 15:47:03 service2 kernel: NETDEV WATCHDOG: ib0: transmit timed out
Jun 29 15:47:03 service2 kernel: ib0: transmit timeout: latency 7576 msecs
Jun 29 15:47:03 service2 kernel: ib0: queue stopped 1, tx_head 6355, tx_tail 6291

TIA!

Scott Shaw
SILICON GRAPHICS  |  The Source of Innovation  and  Discovery
Office Ph: 734.437.6397   Cell Ph: 734.564.3832
Email:sshaw at sgi.com     http://www.sgi.com
 

From ralph.campbell at qlogic.com  Fri Jun 29 14:50:25 2007
From: ralph.campbell at qlogic.com (Ralph Campbell)
Date: Fri, 29 Jun 2007 14:50:25 -0700
Subject: [ofa-general] IB/madeye - Fix the port number when registering the
	module
Message-ID: <1183153826.18911.342.camel@brick.pathscale.com>

The loop for registering the madeye module with ib_mad
passes the wrong IB port number and fails to register.

Signed-off-by: Ralph Campbell <ralph.campbell at qlogic.com>

diff -r 55227cf7a002 drivers/infiniband/util/madeye.c
--- a/drivers/infiniband/util/madeye.c	Fri Jun 29 14:37:00 2007 -0700
+++ b/drivers/infiniband/util/madeye.c	Fri Jun 29 14:39:03 2007 -0700
@@ -534,13 +534,13 @@ static void madeye_add_one(struct ib_dev
 
 	reg_flags = IB_MAD_SNOOP_SEND_COMPLETIONS | IB_MAD_SNOOP_RECVS;
 	for (i = 0; i <= e - s; i++) {
-		port[i].smi_agent = ib_register_mad_snoop(device, i,
+		port[i].smi_agent = ib_register_mad_snoop(device, i + s,
 							  IB_QPT_SMI,
 							  reg_flags,
 							  snoop_smi_handler,
 							  recv_smi_handler,
 							  &port[i]);
-		port[i].gsi_agent = ib_register_mad_snoop(device, i,
+		port[i].gsi_agent = ib_register_mad_snoop(device, i + s,
 							  IB_QPT_GSI,
 							  reg_flags,
 							  snoop_gsi_handler,
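
To illustrate the off-by-offset problem the patch fixes: the port[]
array is indexed from 0, while the IB port numbers run from s to e
(typically 1..N on a channel adapter, and just 0 on a switch).  The
sketch below is a standalone illustration, not madeye code;
index_to_port() is a hypothetical helper that makes the mapping
explicit.

```c
#include <assert.h>

/*
 * Map a 0-based array index to the IB port number it represents,
 * given the first (s) and last (e) port numbers of the device.
 */
static unsigned int index_to_port(unsigned int s, unsigned int e,
				  unsigned int i)
{
	assert(s <= e);
	assert(i <= e - s);	/* same bound as the loop in the patch */
	return s + i;
}
```

For a two-port adapter (s = 1, e = 2), index 0 maps to port 1 and
index 1 to port 2.  Passing the bare index i would ask for port 0 on
the first iteration, which is presumably why registration failed.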


From mshefty at ichips.intel.com  Fri Jun 29 15:20:54 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 29 Jun 2007 15:20:54 -0700
Subject: [ofa-general] IB/madeye - Fix the port number when registering
	the	module
In-Reply-To: <1183153826.18911.342.camel@brick.pathscale.com>
References: <1183153826.18911.342.camel@brick.pathscale.com>
Message-ID: <468585C6.3020809@ichips.intel.com>

Ralph Campbell wrote:
> The loop for registering the madeye module with ib_mad
> passes the wrong IB port number and fails to register.

I'm positive this has been fixed before (probably a lost patch from the 
move to git).  Oh well, thanks - pulled into rdma-dev.git util.

> diff -r 55227cf7a002 drivers/infiniband/util/madeye.c
> --- a/drivers/infiniband/util/madeye.c	Fri Jun 29 14:37:00 2007 -0700
> +++ b/drivers/infiniband/util/madeye.c	Fri Jun 29 14:39:03 2007 -0700

I had to fix this up to use ..util/madeye/madeye.c.  What tree was this 
generated against?

- Sean


From ralph.campbell at qlogic.com  Fri Jun 29 16:11:50 2007
From: ralph.campbell at qlogic.com (Ralph Campbell)
Date: Fri, 29 Jun 2007 16:11:50 -0700
Subject: [ofa-general] IB/madeye - Fix the port number when registering
	the	module
In-Reply-To: <468585C6.3020809@ichips.intel.com>
References: <1183153826.18911.342.camel@brick.pathscale.com>
	<468585C6.3020809@ichips.intel.com>
Message-ID: <1183158710.18911.348.camel@brick.pathscale.com>

On Fri, 2007-06-29 at 15:20 -0700, Sean Hefty wrote:
> Ralph Campbell wrote:
> > The loop for registering the madeye module with ib_mad
> > passes the wrong IB port number and fails to register.
> 
> I'm positive this has been fixed before (probably a lost patch from the 
> move to git).  Oh well, thanks - pulled into rdma-dev.git util.
> 
> > diff -r 55227cf7a002 drivers/infiniband/util/madeye.c
> > --- a/drivers/infiniband/util/madeye.c	Fri Jun 29 14:37:00 2007 -0700
> > +++ b/drivers/infiniband/util/madeye.c	Fri Jun 29 14:39:03 2007 -0700
> 
> I had to fix this up to use ..util/madeye/madeye.c.  What tree was this 
> generated against?
> 
> - Sean

I used git://git.openfabrics.org/~vlad/ofed_1_2/.git


From ralph.campbell at qlogic.com  Fri Jun 29 16:40:50 2007
From: ralph.campbell at qlogic.com (Ralph Campbell)
Date: Fri, 29 Jun 2007 16:40:50 -0700
Subject: [ofa-general] Re: [PATCH 24/28] IB/ipath - ipath_poll fixups
	and enhancements
In-Reply-To: <adak5tprz0t.fsf@cisco.com>
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
	<20070619234252.3794.18229.stgit@bauxite.internal.keyresearch.com>
	<ada1wg68hp0.fsf@cisco.com>
	<20070621152312.GA14817@bauxite.pathscale.com>
	<aday7id5g5c.fsf@cisco.com>
	<20070627170242.GT29798@bauxite.pathscale.com>
	<adak5tprz0t.fsf@cisco.com>
Message-ID: <1183160450.18911.360.camel@brick.pathscale.com>

On Wed, 2007-06-27 at 12:13 -0700, Roland Dreier wrote:
>  > > OK, fair enough, although it seems you may be missing some memory
>  > > barriers to make sure you don't run into the CPU reordering accesses
>  > > to the head/tail pointers.
>  > 
>  > i had a quick look at the patch and the surrounding
>  > code and i did not catch the problem.  can you be a
>  > little more specific about the suspect code?
> 
> I'm not sure there's a bug there.  But the patch in question does
> 
>  > +       tail = *(volatile u64 *)pd->port_rcvhdrtail_kvaddr;
> 
> with no memory ordering.  The volatile makes sure the compiler puts
> that read where you wrote it, but there's no guarantee that the CPU
> executes it anywhere remotely close to where it is in the code.  Later
> on you have
> 
>  > +       if (tail != head ||
>  > +           test_bit(IPATH_PORT_WAITING_RCV, &pd->int_flag)) {
> 
> etc., and the CPU might speculate those tests far ahead of actually
> reading the port_rcvhdrtail_kvaddr value, which means you might end
> up executing code based on a guess about tail != head that is not true
> at the time it speculates the branch, but by the time it does get to
> actually check its speculation, the guess has become true.
> 
> Just something to think about...

Most of the places where the receive header tail is checked are
for queue full/non-full, so the read barriers aren't needed.
The one place where we might need a rmb() is in ipath_kreceive()
where we check the tail and then read the queue entry.


From rdreier at cisco.com  Fri Jun 29 17:13:37 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 29 Jun 2007 17:13:37 -0700
Subject: [ofa-general] Re: [PATCH 24/28] IB/ipath - ipath_poll fixups and
	enhancements
In-Reply-To: <1183160450.18911.360.camel@brick.pathscale.com> (Ralph
	Campbell's message of "Fri, 29 Jun 2007 16:40:50 -0700")
References: <20070619234030.3794.95114.stgit@bauxite.internal.keyresearch.com>
	<20070619234252.3794.18229.stgit@bauxite.internal.keyresearch.com>
	<ada1wg68hp0.fsf@cisco.com>
	<20070621152312.GA14817@bauxite.pathscale.com>
	<aday7id5g5c.fsf@cisco.com>
	<20070627170242.GT29798@bauxite.pathscale.com>
	<adak5tprz0t.fsf@cisco.com>
	<1183160450.18911.360.camel@brick.pathscale.com>
Message-ID: <adahcoq9u4u.fsf@cisco.com>

 > Most of the places where the receive header tail is checked is
 > for queue full/non-full so the read barriers aren't needed.
 > The one place where we might need a rmb() is in ipath_kreceive()
 > where we check the tail and then read the queue entry.

Yes, you almost certainly need a barrier there.  You might not hit it
in practice but I don't see any reason why a CPU couldn't end up
reading, say, an invalid qp value because the entry hadn't been
written yet, but then see a value for the tail pointer that was
written later.

 - R.
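
A rough user-space sketch of the ordering in question, using C11
fences in place of the kernel's wmb()/rmb().  This is an analogy under
stated assumptions, not the ipath code: struct ring, ring_produce() and
ring_consume() are hypothetical names, and the producer here is another
thread rather than the HCA writing via DMA.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u	/* hypothetical ring size */

struct ring {
	_Atomic uint32_t tail;		/* advanced by the producer */
	uint32_t head;			/* owned by the consumer */
	uint32_t entry[RING_SIZE];
};

/* Producer: write the entry first, then publish it by moving tail.
 * (No ring-full check; this sketch only shows the ordering.) */
static void ring_produce(struct ring *r, uint32_t value)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);

	r->entry[tail % RING_SIZE] = value;
	atomic_thread_fence(memory_order_release);	/* role of wmb() */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_relaxed);
}

/* Consumer: returns 1 and fills *out if an entry is available. */
static int ring_consume(struct ring *r, uint32_t *out)
{
	uint32_t tail;

	assert(out != NULL);
	tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	if (tail == r->head)
		return 0;		/* empty: nothing to read */

	/*
	 * Role of rmb(): without this, the entry load below could be
	 * satisfied before the tail load, returning stale contents.
	 */
	atomic_thread_fence(memory_order_acquire);
	*out = r->entry[r->head % RING_SIZE];
	r->head++;
	return 1;
}
```

The full/non-full checks Ralph mentions only compare the two indices,
so nothing depends on entry contents there; the fence matters only on
the path that goes on to read the entry.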


From hobechrisrrifa at soykadesign.de  Fri Jun 29 19:22:13 2007
From: hobechrisrrifa at soykadesign.de (Hank)
Date: Fri, 29 Jun 2007 21:22:13 -0500
Subject: [ofa-general] Interesting stuff
Message-ID: <b36d01c7ba93$8f677770$ef84ed8c@hobechrisrrifa>


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070629/db78d489/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 6O02AF45.gif
Type: image/gif
Size: 11929 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070629/db78d489/attachment.gif>

From vlad at lists.openfabrics.org  Sat Jun 30 02:43:34 2007
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky)
Date: Sat, 30 Jun 2007 02:43:34 -0700 (PDT)
Subject: [ofa-general] ofa_1_2_c_kernel 20070630-0200 daily build status
Message-ID: <20070630094334.D827DE60929@openfabrics.org>

This email was generated automatically, please do not reply


git_url: git://git.openfabrics.org/~vlad/ofed_kernel.git
git_branch: ofed_kernel

Common build parameters:   --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-mlx4-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.15
Passed on x86_64 with linux-2.6.20
Passed on ia64 with linux-2.6.19
Passed on ppc64 with linux-2.6.19
Passed on ia64 with linux-2.6.18
Passed on powerpc with linux-2.6.14
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.13
Passed on x86_64 with linux-2.6.16
Passed on ppc64 with linux-2.6.16
Passed on ia64 with linux-2.6.16
Passed on powerpc with linux-2.6.13
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.17
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.12
Passed on ppc64 with linux-2.6.17
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.21.1
Passed on ppc64 with linux-2.6.12
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.13
Passed on powerpc with linux-2.6.16
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.14
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.15
Passed on powerpc with linux-2.6.19
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.13
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on powerpc with linux-2.6.12
Passed on x86_64 with linux-2.6.14
Passed on ia64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.9-22.ELsmp
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on ppc64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.9-34.ELsmp

Failed:


From service at mailservice.virginiacu.org  Sat Jun 30 04:41:41 2007
From: service at mailservice.virginiacu.org (Virginia Credit Union)
Date: Sat, 30 Jun 2007 13:41:41 +0200
Subject: [ofa-general] Notification Letter #7528
Message-ID: <3f6d3059f80882da576a14967004e26b@localhost.localdomain>

An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070630/134c0673/attachment.html>

From sashak at voltaire.com  Sat Jun 30 14:05:03 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 1 Jul 2007 00:05:03 +0300
Subject: [ofa-general] [PATCH] opensm: use osm_get_node/port_by_guid() funcs
Message-ID: <20070630210503.GA14390@sashak.voltaire.com>


Similar to osm_get_switch_by_guid(), use the existing
osm_get_node_by_guid() and osm_get_port_by_guid() helper functions to
resolve those objects by GUID; this simplifies the flow in many cases.

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---
 opensm/opensm/osm_drop_mgr.c            |   29 +++++++++------------------
 opensm/opensm/osm_inform.c              |    5 +--
 opensm/opensm/osm_lid_mgr.c             |    6 +---
 opensm/opensm/osm_mcast_mgr.c           |   32 ++++++++++--------------------
 opensm/opensm/osm_node_desc_rcv.c       |    7 +----
 opensm/opensm/osm_node_info_rcv.c       |   32 +++++++++---------------------
 opensm/opensm/osm_perfmgr.c             |    8 ++----
 opensm/opensm/osm_pkey_rcv.c            |    7 +----
 opensm/opensm/osm_port.c                |    7 +----
 opensm/opensm/osm_port_info_rcv.c       |    7 +----
 opensm/opensm/osm_prtn.c                |    5 +--
 opensm/opensm/osm_sa_lft_record.c       |    5 +--
 opensm/opensm/osm_sa_mcmember_record.c  |    6 +---
 opensm/opensm/osm_sa_mft_record.c       |    6 +---
 opensm/opensm/osm_sa_multipath_record.c |    7 ++---
 opensm/opensm/osm_sa_path_record.c      |   17 ++++-----------
 opensm/opensm/osm_sa_service_record.c   |    4 +-
 opensm/opensm/osm_sa_sw_info_record.c   |    5 +--
 opensm/opensm/osm_slvl_map_rcv.c        |    6 +---
 opensm/opensm/osm_sm.c                  |   12 +++-------
 opensm/opensm/osm_sm_state_mgr.c        |   14 ++----------
 opensm/opensm/osm_sminfo_rcv.c          |    6 +---
 opensm/opensm/osm_state_mgr.c           |   13 +++--------
 opensm/opensm/osm_sw_info_rcv.c         |    6 +---
 opensm/opensm/osm_ucast_file.c          |    5 +--
 opensm/opensm/osm_vl_arb_rcv.c          |    7 +----
 26 files changed, 87 insertions(+), 177 deletions(-)
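
For readers skimming the diff, the recurring change can be sketched in
isolation. The snippet below is only an illustration under stated
assumptions: every toy_* name is hypothetical and stands in for the
complib map and the opensm helpers; it is not the opensm implementation.
It shows a map lookup that reports a miss through an end sentinel, and a
thin wrapper that turns that sentinel into NULL so callers need only a
plain pointer test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins (hypothetical), NOT the real complib/opensm types. */
#define TOY_MAX_PORTS 8

typedef struct toy_port {
	uint64_t guid;
} toy_port_t;

typedef struct toy_subn {
	toy_port_t ports[TOY_MAX_PORTS];
	size_t count;
	toy_port_t end;		/* sentinel handed back on a miss */
} toy_subn_t;

/* Plays the role of the map's "end" marker. */
toy_port_t *toy_map_end(toy_subn_t *s)
{
	return &s->end;
}

/* Old-style lookup: returns the end sentinel, never NULL, on a miss. */
toy_port_t *toy_map_get(toy_subn_t *s, uint64_t guid)
{
	size_t i;

	for (i = 0; i < s->count; i++)
		if (s->ports[i].guid == guid)
			return &s->ports[i];
	return toy_map_end(s);
}

/* Helper-style lookup: hides the sentinel and returns NULL on a miss. */
toy_port_t *toy_get_port_by_guid(toy_subn_t *s, uint64_t guid)
{
	toy_port_t *p = toy_map_get(s, guid);

	return p == toy_map_end(s) ? NULL : p;
}

/* Caller side after the conversion: a simple NULL check is enough. */
int toy_port_is_known(toy_subn_t *s, uint64_t guid)
{
	return toy_get_port_by_guid(s, guid) != NULL;
}
```

With a wrapper of this shape, call sites no longer need a local pointer
to the table just to compare against its end marker, which is why most
hunks below also drop a cl_qmap_t variable.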

diff --git a/opensm/opensm/osm_drop_mgr.c b/opensm/opensm/osm_drop_mgr.c
index 9d91b6b..20564cb 100644
--- a/opensm/opensm/osm_drop_mgr.c
+++ b/opensm/opensm/osm_drop_mgr.c
@@ -144,17 +144,16 @@ drop_mgr_clean_physp(
   IN const osm_drop_mgr_t* const p_mgr,
   IN osm_physp_t *p_physp)
 {
-  cl_qmap_t *p_port_guid_tbl = &p_mgr->p_subn->port_guid_tbl;
   osm_physp_t *p_remote_physp;
   osm_port_t* p_remote_port;
 
   p_remote_physp = osm_physp_get_remote( p_physp );
   if( p_remote_physp && osm_physp_is_valid( p_remote_physp ) )
   {
-    p_remote_port = (osm_port_t*)cl_qmap_get( p_port_guid_tbl,
-                                              p_remote_physp->port_guid );
+    p_remote_port = osm_get_port_by_guid(p_mgr->p_subn,
+                                         p_remote_physp->port_guid );
 
-    if ( p_remote_port != (osm_port_t*)cl_qmap_end( p_port_guid_tbl ) )
+    if ( p_remote_port )
     {
       /* Let's check if this is a case of link that is lost (both ports
          weren't recognized), or a "hiccup" in the subnet - in which case
@@ -220,7 +219,6 @@ __osm_drop_mgr_remove_port(
   osm_port_t *p_port_check;
   cl_list_t* p_new_ports_list;
   cl_list_iterator_t cl_list_item;
-  cl_qmap_t* p_port_guid_tbl;
   cl_qmap_t* p_sm_guid_tbl;
   osm_mcm_info_t* p_mcm;
   osm_mgrp_t*  p_mgrp;
@@ -261,8 +259,8 @@ __osm_drop_mgr_remove_port(
     cl_list_item = cl_list_next(cl_list_item);
   }
 
-  p_port_guid_tbl = &p_mgr->p_subn->port_guid_tbl;
-  p_port_check = (osm_port_t*)cl_qmap_remove( p_port_guid_tbl, port_guid );
+  p_port_check = (osm_port_t*)cl_qmap_remove( &p_mgr->p_subn->port_guid_tbl,
+                                              port_guid );
   if( p_port_check != p_port )
   {
     osm_log( p_mgr->p_log, OSM_LOG_ERROR,
@@ -406,11 +404,9 @@ __osm_drop_mgr_process_node(
   osm_physp_t *p_physp;
   osm_port_t *p_port;
   osm_node_t *p_node_check;
-  cl_qmap_t *p_node_guid_tbl;
   uint32_t port_num;
   uint32_t max_ports;
   ib_net64_t port_guid;
-  cl_qmap_t* p_port_guid_tbl;
   boolean_t return_val = FALSE;
 
   OSM_LOG_ENTER( p_mgr->p_log, __osm_drop_mgr_process_node );
@@ -424,8 +420,6 @@ __osm_drop_mgr_process_node(
     Delete all the logical and physical port objects
     associated with this node.
   */
-  p_port_guid_tbl = &p_mgr->p_subn->port_guid_tbl;
-
   max_ports = osm_node_get_num_physp( p_node );
   for( port_num = 0; port_num < max_ports; port_num++ )
   {
@@ -434,9 +428,9 @@ __osm_drop_mgr_process_node(
     {
       port_guid = osm_physp_get_port_guid( p_physp );
 
-      p_port = (osm_port_t*)cl_qmap_get( p_port_guid_tbl, port_guid );
+      p_port = osm_get_port_by_guid(p_mgr->p_subn, port_guid );
 
-      if( p_port != (osm_port_t*)cl_qmap_end( p_port_guid_tbl ) )
+      if( p_port )
         __osm_drop_mgr_remove_port( p_mgr, p_port );
       else
         drop_mgr_clean_physp( p_mgr, p_physp );
@@ -448,8 +442,7 @@ __osm_drop_mgr_process_node(
   if (p_node->sw)
     __osm_drop_mgr_remove_switch( p_mgr, p_node );
 
-  p_node_guid_tbl = &p_mgr->p_subn->node_guid_tbl;
-  p_node_check = (osm_node_t*)cl_qmap_remove( p_node_guid_tbl,
+  p_node_check = (osm_node_t*)cl_qmap_remove( &p_mgr->p_subn->node_guid_tbl,
                                               osm_node_get_node_guid( p_node ) );
   if( p_node_check != p_node )
   {
@@ -476,7 +469,6 @@ __osm_drop_mgr_check_node(
   ib_net64_t node_guid;
   osm_physp_t *p_physp;
   osm_port_t *p_port;
-  cl_qmap_t* p_port_guid_tbl;
   ib_net64_t port_guid;
 
   OSM_LOG_ENTER( p_mgr->p_log, __osm_drop_mgr_check_node );
@@ -506,7 +498,6 @@ __osm_drop_mgr_check_node(
   }
 
   /* Make sure we have a port object for port zero */
-  p_port_guid_tbl = &p_mgr->p_subn->port_guid_tbl;
   p_physp = osm_node_get_physp_ptr( p_node, 0 );
   if ( !osm_physp_is_valid( p_physp ) )
   {
@@ -521,9 +512,9 @@ __osm_drop_mgr_check_node(
    
   port_guid = osm_physp_get_port_guid( p_physp );
 
-  p_port = (osm_port_t*)cl_qmap_get( p_port_guid_tbl, port_guid );
+  p_port = osm_get_port_by_guid(p_mgr->p_subn, port_guid );
 
-  if( p_port == (osm_port_t*)cl_qmap_end( p_port_guid_tbl ) )
+  if( !p_port )
   {
     osm_log( p_mgr->p_log, OSM_LOG_VERBOSE,
              "__osm_drop_mgr_check_node: "
diff --git a/opensm/opensm/osm_inform.c b/opensm/opensm/osm_inform.c
index 63f3bfa..5929382 100644
--- a/opensm/opensm/osm_inform.c
+++ b/opensm/opensm/osm_inform.c
@@ -589,10 +589,9 @@ __match_notice_to_inf_rec(
   {
     source_gid = p_ntc->issuer_gid;
   }
-  p_src_port = (osm_port_t*)cl_qmap_get( &p_subn->port_guid_tbl,
-                                         source_gid.unicast.interface_id );
 
-  if( p_src_port == (osm_port_t*)cl_qmap_end( &(p_subn->port_guid_tbl)) )
+  p_src_port = osm_get_port_by_guid(p_subn, source_gid.unicast.interface_id);
+  if( !p_src_port )
   {
     osm_log( p_log, OSM_LOG_INFO,
              "__match_notice_to_inf_rec: "
diff --git a/opensm/opensm/osm_lid_mgr.c b/opensm/opensm/osm_lid_mgr.c
index 8a0d288..f235a02 100644
--- a/opensm/opensm/osm_lid_mgr.c
+++ b/opensm/opensm/osm_lid_mgr.c
@@ -1289,10 +1289,8 @@ __osm_lid_mgr_process_our_sm_node(
   /*
     Acquire our own port object.
   */
-  p_port = (osm_port_t*)cl_qmap_get( &p_mgr->p_subn->port_guid_tbl,
-                                     p_mgr->p_subn->sm_port_guid );
-
-  if( p_port == (osm_port_t*)cl_qmap_end( &p_mgr->p_subn->port_guid_tbl ) )
+  p_port = osm_get_port_by_guid(p_mgr->p_subn, p_mgr->p_subn->sm_port_guid);
+  if( !p_port )
   {
     osm_log( p_mgr->p_log, OSM_LOG_ERROR,
              "__osm_lid_mgr_process_our_sm_node: ERR 0308: "
diff --git a/opensm/opensm/osm_mcast_mgr.c b/opensm/opensm/osm_mcast_mgr.c
index 2ecb34e..345dbd4 100644
--- a/opensm/opensm/osm_mcast_mgr.c
+++ b/opensm/opensm/osm_mcast_mgr.c
@@ -159,12 +159,10 @@ osm_mcast_mgr_compute_avg_hops(
   const osm_port_t* p_port;
   const osm_mcm_port_t* p_mcm_port;
   const cl_qmap_t* p_mcm_tbl;
-  const cl_qmap_t* p_port_tbl;
 
   OSM_LOG_ENTER( p_mgr->p_log, osm_mcast_mgr_compute_avg_hops );
 
   p_mcm_tbl = &p_mgrp->mcm_port_tbl;
-  p_port_tbl = &p_mgr->p_subn->port_guid_tbl;
 
   /*
     For each member of the multicast group, compute the
@@ -178,10 +176,10 @@ osm_mcast_mgr_compute_avg_hops(
       Acquire the port object for this port guid, then create
       the new worker object to build the list.
     */
-    p_port = (osm_port_t*)cl_qmap_get( p_port_tbl,
-                                       ib_gid_get_guid( &p_mcm_port->port_gid ) );
+    p_port = osm_get_port_by_guid(p_mgr->p_subn,
+                                  ib_gid_get_guid( &p_mcm_port->port_gid ) );
 
-    if( p_port == (osm_port_t*)cl_qmap_end( p_port_tbl ) )
+    if( !p_port )
     {
       osm_log( p_mgr->p_log, OSM_LOG_ERROR,
                "osm_mcast_mgr_compute_avg_hops: ERR 0A18: "
@@ -221,12 +219,10 @@ osm_mcast_mgr_compute_max_hops(
   const osm_port_t* p_port;
   const osm_mcm_port_t* p_mcm_port;
   const cl_qmap_t* p_mcm_tbl;
-  const cl_qmap_t* p_port_tbl;
 
   OSM_LOG_ENTER( p_mgr->p_log, osm_mcast_mgr_compute_max_hops );
 
   p_mcm_tbl = &p_mgrp->mcm_port_tbl;
-  p_port_tbl = &p_mgr->p_subn->port_guid_tbl;
 
   /*
     For each member of the multicast group, compute the
@@ -240,11 +236,10 @@ osm_mcast_mgr_compute_max_hops(
       Acquire the port object for this port guid, then create
       the new worker object to build the list.
     */
-    p_port = (osm_port_t*)cl_qmap_get(
-      p_port_tbl,
-      ib_gid_get_guid( &p_mcm_port->port_gid ) );
+    p_port = osm_get_port_by_guid(p_mgr->p_subn,
+                                  ib_gid_get_guid( &p_mcm_port->port_gid ));
 
-    if( p_port == (osm_port_t*)cl_qmap_end( p_port_tbl ) )
+    if( !p_port )
     {
       osm_log( p_mgr->p_log, OSM_LOG_ERROR,
                "osm_mcast_mgr_compute_max_hops: ERR 0A1A: "
@@ -871,7 +866,6 @@ __osm_mcast_mgr_build_spanning_tree(
   osm_mgrp_t*              const p_mgrp )
 {
   const cl_qmap_t*         p_mcm_tbl;
-  const cl_qmap_t*         p_port_tbl;
   const osm_port_t*        p_port;
   const osm_mcm_port_t*    p_mcm_port;
   uint32_t                 num_ports;
@@ -895,7 +889,6 @@ __osm_mcast_mgr_build_spanning_tree(
   __osm_mcast_mgr_purge_tree( p_mgr, p_mgrp );
 
   p_mcm_tbl = &p_mgrp->mcm_port_tbl;
-  p_port_tbl = &p_mgr->p_subn->port_guid_tbl;
   num_ports = cl_qmap_count( p_mcm_tbl );
   if( num_ports == 0 )
   {
@@ -947,10 +940,9 @@ __osm_mcast_mgr_build_spanning_tree(
       Acquire the port object for this port guid, then create
       the new worker object to build the list.
     */
-    p_port = (osm_port_t*)cl_qmap_get( p_port_tbl,
-                                       ib_gid_get_guid( &p_mcm_port->port_gid ) );
-
-    if( p_port == (osm_port_t*)cl_qmap_end( p_port_tbl ) )
+    p_port = osm_get_port_by_guid(p_mgr->p_subn,
+                                  ib_gid_get_guid( &p_mcm_port->port_gid ));
+    if( !p_port )
     {
       osm_log( p_mgr->p_log, OSM_LOG_ERROR,
                "__osm_mcast_mgr_build_spanning_tree: ERR 0A09: "
@@ -1091,7 +1083,6 @@ osm_mcast_mgr_process_single(
   osm_physp_t*             p_physp;
   osm_physp_t*             p_remote_physp;
   osm_node_t*              p_remote_node;
-  cl_qmap_t*               p_port_tbl;
   osm_mcast_tbl_t*         p_mcast_tbl;
   ib_api_status_t          status = IB_SUCCESS;
 
@@ -1100,7 +1091,6 @@ osm_mcast_mgr_process_single(
   CL_ASSERT( mlid );
   CL_ASSERT( port_guid );
 
-  p_port_tbl = &p_mgr->p_subn->port_guid_tbl;
   mlid_ho = cl_ntoh16( mlid );
 
   if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) )
@@ -1115,8 +1105,8 @@ osm_mcast_mgr_process_single(
   /*
     Acquire the Port object.
   */
-  p_port = (osm_port_t*)cl_qmap_get( p_port_tbl, port_guid );
-  if( p_port == (osm_port_t*)cl_qmap_end( p_port_tbl ) )
+  p_port = osm_get_port_by_guid(p_mgr->p_subn, port_guid );
+  if( !p_port )
   {
     osm_log( p_mgr->p_log, OSM_LOG_ERROR,
              "osm_mcast_mgr_process_single: ERR 0A01: "
diff --git a/opensm/opensm/osm_node_desc_rcv.c b/opensm/opensm/osm_node_desc_rcv.c
index fc96c12..656141d 100644
--- a/opensm/opensm/osm_node_desc_rcv.c
+++ b/opensm/opensm/osm_node_desc_rcv.c
@@ -143,7 +143,6 @@ osm_nd_rcv_process(
 {
   osm_nd_rcv_t *p_rcv = context;
   osm_madw_t *p_madw = data;
-  cl_qmap_t *p_guid_tbl;
   ib_node_desc_t *p_nd;
   ib_smp_t *p_smp;
   osm_node_t *p_node;
@@ -155,7 +154,6 @@ osm_nd_rcv_process(
 
   CL_ASSERT( p_madw );
 
-  p_guid_tbl = &p_rcv->p_subn->node_guid_tbl;
   p_smp = osm_madw_get_smp_ptr( p_madw );
   p_nd = (ib_node_desc_t*)ib_smp_get_payload_ptr( p_smp );
 
@@ -165,9 +163,8 @@ osm_nd_rcv_process(
 
   node_guid = osm_madw_get_nd_context_ptr( p_madw )->node_guid;
   CL_PLOCK_EXCL_ACQUIRE( p_rcv->p_lock );
-  p_node = (osm_node_t*)cl_qmap_get( p_guid_tbl, node_guid );
-
-  if( p_node == (osm_node_t*)cl_qmap_end( p_guid_tbl) )
+  p_node = osm_get_node_by_guid(p_rcv->p_subn, node_guid);
+  if( !p_node )
   {
     osm_log( p_rcv->p_log, OSM_LOG_ERROR,
              "osm_nd_rcv_process: ERR 0B01: "
diff --git a/opensm/opensm/osm_node_info_rcv.c b/opensm/opensm/osm_node_info_rcv.c
index 1eca625..b78a4ce 100644
--- a/opensm/opensm/osm_node_info_rcv.c
+++ b/opensm/opensm/osm_node_info_rcv.c
@@ -76,7 +76,6 @@ __osm_ni_rcv_set_links(
   const uint8_t port_num,
   const osm_ni_context_t* const p_ni_context )
 {
-  cl_qmap_t *p_guid_tbl;
   osm_node_t *p_neighbor_node;
   osm_node_t *p_old_neighbor_node;
   uint8_t old_neighbor_port_num;
@@ -91,10 +90,9 @@ __osm_ni_rcv_set_links(
   */
   if( p_ni_context->node_guid != 0 )
   {
-    p_guid_tbl = &p_rcv->p_subn->node_guid_tbl;
-    p_neighbor_node = (osm_node_t*)cl_qmap_get( p_guid_tbl,
-                                                p_ni_context->node_guid );
-    if( p_neighbor_node == (osm_node_t*)cl_qmap_end( p_guid_tbl ) )
+    p_neighbor_node = osm_get_node_by_guid(p_rcv->p_subn,
+                                           p_ni_context->node_guid);
+    if( !p_neighbor_node )
     {
       osm_log( p_rcv->p_log, OSM_LOG_ERROR,
                "__osm_ni_rcv_set_links: ERR 0D10: "
@@ -434,7 +432,6 @@ __osm_ni_rcv_process_existing_ca_or_router(
   ib_smp_t *p_smp;
   osm_port_t *p_port;
   osm_port_t *p_port_check;
-  cl_qmap_t *p_guid_tbl;
   osm_madw_context_t context;
   uint8_t port_num;
   osm_physp_t *p_physp;
@@ -448,7 +445,6 @@ __osm_ni_rcv_process_existing_ca_or_router(
   p_smp = osm_madw_get_smp_ptr( p_madw );
   p_ni = (ib_node_info_t*)ib_smp_get_payload_ptr( p_smp );
   port_num = ib_node_info_get_local_port_num( p_ni );
-  p_guid_tbl = &p_rcv->p_subn->port_guid_tbl;
   h_bind = osm_madw_get_bind_handle( p_madw );
 
   /*
@@ -456,9 +452,8 @@ __osm_ni_rcv_process_existing_ca_or_router(
     previously undiscovered port.  If so, build the new
     port object.
   */
-  p_port = (osm_port_t*)cl_qmap_get( p_guid_tbl, p_ni->port_guid );
-
-  if( p_port == (osm_port_t*)cl_qmap_end( p_guid_tbl ) )
+  p_port = osm_get_port_by_guid( p_rcv->p_subn, p_ni->port_guid );
+  if( !p_port )
   {
     osm_log( p_rcv->p_log, OSM_LOG_VERBOSE,
              "__osm_ni_rcv_process_existing_ca_or_router: "
@@ -479,7 +474,7 @@ __osm_ni_rcv_process_existing_ca_or_router(
     /*
       Add the new port object to the database.
     */
-    p_port_check = (osm_port_t*)cl_qmap_insert( p_guid_tbl,
+    p_port_check = (osm_port_t*)cl_qmap_insert( &p_rcv->p_subn->port_guid_tbl,
                                                 p_ni->port_guid, &p_port->map_item );
     if( p_port_check != p_port )
     {
@@ -700,8 +695,6 @@ __osm_ni_rcv_process_new(
   osm_port_t *p_port_check;
   osm_router_t *p_rtr = NULL;
   osm_router_t *p_rtr_check;
-  cl_qmap_t *p_node_guid_tbl;
-  cl_qmap_t *p_port_guid_tbl;
   cl_qmap_t *p_rtr_guid_tbl;
   ib_node_info_t *p_ni;
   ib_smp_t *p_smp;
@@ -765,8 +758,7 @@ __osm_ni_rcv_process_new(
   /*
     Add the new port object to the database.
   */
-  p_port_guid_tbl = &p_rcv->p_subn->port_guid_tbl;
-  p_port_check = (osm_port_t*)cl_qmap_insert( p_port_guid_tbl,
+  p_port_check = (osm_port_t*)cl_qmap_insert( &p_rcv->p_subn->port_guid_tbl,
                                               p_ni->port_guid,
                                               &p_port->map_item );
   if( p_port_check != p_port )
@@ -838,8 +830,7 @@ __osm_ni_rcv_process_new(
     }
   }
 
-  p_node_guid_tbl = &p_rcv->p_subn->node_guid_tbl;
-  p_node_check = (osm_node_t*)cl_qmap_insert( p_node_guid_tbl,
+  p_node_check = (osm_node_t*)cl_qmap_insert( &p_rcv->p_subn->node_guid_tbl,
                                               p_ni->node_guid,
                                               &p_node->map_item );
   if( p_node_check != p_node )
@@ -1007,7 +998,6 @@ osm_ni_rcv_process(
 {
   osm_ni_rcv_t *p_rcv = context;
   osm_madw_t *p_madw = data;
-  cl_qmap_t *p_guid_tbl;
   ib_node_info_t *p_ni;
   ib_smp_t *p_smp;
   osm_node_t *p_node;
@@ -1042,8 +1032,6 @@ osm_ni_rcv_process(
     goto Exit;
   }
 
-  p_guid_tbl = &p_rcv->p_subn->node_guid_tbl;
-
   /*
     Determine if this node has already been discovered,
     and process accordingly.
@@ -1051,11 +1039,11 @@ osm_ni_rcv_process(
   */
 
   CL_PLOCK_EXCL_ACQUIRE( p_rcv->p_lock );
-  p_node = (osm_node_t*)cl_qmap_get( p_guid_tbl, p_ni->node_guid );
+  p_node = osm_get_node_by_guid(p_rcv->p_subn, p_ni->node_guid);
 
   osm_dump_node_info( p_rcv->p_log, p_ni, OSM_LOG_DEBUG );
 
-  if( p_node == (osm_node_t*)cl_qmap_end(p_guid_tbl) )
+  if( !p_node )
   {
     __osm_ni_rcv_process_new( p_rcv, p_madw );
     process_new_flag = TRUE;
diff --git a/opensm/opensm/osm_perfmgr.c b/opensm/opensm/osm_perfmgr.c
index 3780a37..b83bb45 100644
--- a/opensm/opensm/osm_perfmgr.c
+++ b/opensm/opensm/osm_perfmgr.c
@@ -375,9 +375,8 @@ __osm_perfmgr_query_counters(cl_map_item_t * const p_map_item, void *context )
 	OSM_LOG_ENTER( pm->log, __osm_pm_query_counters );
 
 	cl_plock_acquire(pm->lock);
-	node = (osm_node_t *)cl_qmap_get(&(pm->subn->node_guid_tbl),
-			cl_hton64(mon_node->guid));
-	if (node == (osm_node_t *)cl_qmap_end(&(pm->subn->node_guid_tbl))) {
+	node = osm_get_node_by_guid(pm->subn, cl_hton64(mon_node->guid));
+	if (!node) {
 		osm_log(pm->log, OSM_LOG_ERROR,
 			"__osm_pm_query_counters: ERR 4C07: Node guid 0x%" PRIx64 " no longer exists so removing from PerfMgr monitoring\n",
 			mon_node->guid);
@@ -654,8 +653,7 @@ osm_perfmgr_check_overflow(osm_perfmgr_t *pm, uint64_t node_guid,
 		osm_node_t *p_node = NULL;
 		ib_net16_t  lid = 0;
 		cl_plock_acquire(pm->lock);
-		p_node = (osm_node_t *)cl_qmap_get(&(pm->subn->node_guid_tbl),
-						cl_hton64(node_guid));
+		p_node = osm_get_node_by_guid(pm->subn, cl_hton64(node_guid));
 		lid = get_lid(p_node, port);
 		cl_plock_release(pm->lock);
 		if (lid == 0)
diff --git a/opensm/opensm/osm_pkey_rcv.c b/opensm/opensm/osm_pkey_rcv.c
index 67fe067..fae6dd3 100644
--- a/opensm/opensm/osm_pkey_rcv.c
+++ b/opensm/opensm/osm_pkey_rcv.c
@@ -113,7 +113,6 @@ osm_pkey_rcv_process(
 {
   osm_pkey_rcv_t *p_rcv = context;
   osm_madw_t *p_madw = data;
-  cl_qmap_t *p_guid_tbl;
   ib_pkey_table_t *p_pkey_tbl;
   ib_smp_t *p_smp;
   osm_port_t *p_port;
@@ -141,11 +140,9 @@ osm_pkey_rcv_process(
 
   CL_ASSERT( p_smp->attr_id == IB_MAD_ATTR_P_KEY_TABLE );
 
-  p_guid_tbl = &p_rcv->p_subn->port_guid_tbl;
   cl_plock_excl_acquire( p_rcv->p_lock );
-  p_port = (osm_port_t*)cl_qmap_get( p_guid_tbl, port_guid );
-
-  if( p_port == (osm_port_t*)cl_qmap_end( p_guid_tbl) )
+  p_port = osm_get_port_by_guid( p_rcv->p_subn, port_guid );
+  if( !p_port )
   {
     osm_log( p_rcv->p_log, OSM_LOG_ERROR,
              "osm_pkey_rcv_process: ERR 4806: "
diff --git a/opensm/opensm/osm_port.c b/opensm/opensm/osm_port.c
index f092334..97e6031 100644
--- a/opensm/opensm/osm_port.c
+++ b/opensm/opensm/osm_port.c
@@ -686,7 +686,6 @@ osm_physp_replace_dr_path_with_alternate_dr_path(
   osm_dr_path_t * p_dr_path;
   cl_list_t     *p_currPortsList;
   cl_list_t     *p_nextPortsList;
-  cl_qmap_t const     *p_port_tbl;
   osm_port_t    *p_port;
   osm_physp_t   *p_physp, *p_remote_physp;
   ib_net64_t    port_guid;
@@ -712,14 +711,12 @@ osm_physp_replace_dr_path_with_alternate_dr_path(
   cl_list_construct( p_nextPortsList );
   cl_list_init( p_nextPortsList, 10 );
 
-  p_port_tbl = &p_subn->port_guid_tbl;
   port_guid = p_subn->sm_port_guid;
 
   CL_ASSERT( port_guid );
 
-  p_port = (osm_port_t*)cl_qmap_get( p_port_tbl, port_guid );
-
-  if( p_port == (osm_port_t*)cl_qmap_end( p_port_tbl ) )
+  p_port = osm_get_port_by_guid( p_subn, port_guid );
+  if( !p_port )
   {
     osm_log( p_log, OSM_LOG_ERROR,
              "osm_physp_replace_dr_path_with_alternate_dr_path: ERR 4105: "
diff --git a/opensm/opensm/osm_port_info_rcv.c b/opensm/opensm/osm_port_info_rcv.c
index c41f984..7d42297 100644
--- a/opensm/opensm/osm_port_info_rcv.c
+++ b/opensm/opensm/osm_port_info_rcv.c
@@ -627,7 +627,6 @@ osm_pi_rcv_process(
 {
   osm_pi_rcv_t *p_rcv = context;
   osm_madw_t *p_madw = data;
-  cl_qmap_t *p_guid_tbl;
   ib_port_info_t *p_pi;
   ib_smp_t *p_smp;
   osm_port_t *p_port;
@@ -689,11 +688,9 @@ osm_pi_rcv_process(
     goto Exit;
   }
   
-  p_guid_tbl = &p_rcv->p_subn->port_guid_tbl;
   CL_PLOCK_EXCL_ACQUIRE( p_rcv->p_lock );
-  p_port = (osm_port_t*)cl_qmap_get( p_guid_tbl, port_guid );
-
-  if( p_port == (osm_port_t*)cl_qmap_end( p_guid_tbl) )
+  p_port = osm_get_port_by_guid(p_rcv->p_subn, port_guid);
+  if (!p_port)
   {
     CL_PLOCK_RELEASE( p_rcv->p_lock );
     osm_log( p_rcv->p_log, OSM_LOG_ERROR,
diff --git a/opensm/opensm/osm_prtn.c b/opensm/opensm/osm_prtn.c
index 027a5a4..ebf5889 100644
--- a/opensm/opensm/osm_prtn.c
+++ b/opensm/opensm/osm_prtn.c
@@ -105,14 +105,13 @@ void osm_prtn_delete(
 ib_api_status_t osm_prtn_add_port(osm_log_t *p_log, osm_subn_t *p_subn,
 				  osm_prtn_t *p, ib_net64_t guid, boolean_t full)
 {
-	cl_qmap_t *p_port_tbl = &p_subn->port_guid_tbl;
 	ib_api_status_t status = IB_SUCCESS;
 	cl_map_t *p_tbl;
 	osm_port_t *p_port;
 	osm_physp_t *p_physp;
 
-	p_port = (osm_port_t *)cl_qmap_get(p_port_tbl, guid);
-	if (!p_port || p_port == (osm_port_t *)cl_qmap_end(p_port_tbl)) {
+	p_port = osm_get_port_by_guid(p_subn, guid);
+	if (!p_port) {
 		osm_log(p_log, OSM_LOG_VERBOSE, "osm_prtn_add_port: "
 			"port 0x%" PRIx64 " not found\n",
 			cl_ntoh64(guid));
diff --git a/opensm/opensm/osm_sa_lft_record.c b/opensm/opensm/osm_sa_lft_record.c
index c5cd9ca..4943632 100644
--- a/opensm/opensm/osm_sa_lft_record.c
+++ b/opensm/opensm/osm_sa_lft_record.c
@@ -194,9 +194,8 @@ __osm_lftr_get_port_by_guid(
 
   CL_PLOCK_ACQUIRE(p_rcv->p_lock);
 
-  p_port = (osm_port_t *)cl_qmap_get(&p_rcv->p_subn->port_guid_tbl,
-                                     port_guid);
-  if (p_port == (osm_port_t *)cl_qmap_end(&p_rcv->p_subn->port_guid_tbl))
+  p_port = osm_get_port_by_guid(p_rcv->p_subn, port_guid);
+  if (!p_port)
   {
     osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
              "__osm_lftr_get_port_by_guid ERR 4404: "
diff --git a/opensm/opensm/osm_sa_mcmember_record.c b/opensm/opensm/osm_sa_mcmember_record.c
index 90fe103..82aa0db 100644
--- a/opensm/opensm/osm_sa_mcmember_record.c
+++ b/opensm/opensm/osm_sa_mcmember_record.c
@@ -1554,10 +1554,8 @@ __osm_mcmr_rcv_join_mgrp(
   CL_PLOCK_EXCL_ACQUIRE(p_rcv->p_lock);
 
   /* make sure the requested port guid is known to the SM */
-  p_port = (osm_port_t *)cl_qmap_get(&p_rcv->p_subn->port_guid_tbl,
-                                     portguid);
-
-  if (p_port == (osm_port_t *)cl_qmap_end(&p_rcv->p_subn->port_guid_tbl))
+  p_port = osm_get_port_by_guid(p_rcv->p_subn, portguid);
+  if (!p_port)
   {
     CL_PLOCK_RELEASE( p_rcv->p_lock );
 
diff --git a/opensm/opensm/osm_sa_mft_record.c b/opensm/opensm/osm_sa_mft_record.c
index 7908583..c70cd65 100644
--- a/opensm/opensm/osm_sa_mft_record.c
+++ b/opensm/opensm/osm_sa_mft_record.c
@@ -198,15 +198,13 @@ __osm_mftr_get_port_by_guid(
 
   CL_PLOCK_ACQUIRE(p_rcv->p_lock);
 
-  p_port = (osm_port_t *)cl_qmap_get(&p_rcv->p_subn->port_guid_tbl,
-                                     port_guid);
-  if (p_port == (osm_port_t *)cl_qmap_end(&p_rcv->p_subn->port_guid_tbl))
+  p_port = osm_get_port_by_guid(p_rcv->p_subn, port_guid);
+  if (!p_port)
   {
     osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
              "__osm_mftr_get_port_by_guid ERR 4A04: "
              "Invalid port GUID 0x%016" PRIx64 "\n",
              port_guid );
-    p_port = NULL;
   }
 
   CL_PLOCK_RELEASE(p_rcv->p_lock);
diff --git a/opensm/opensm/osm_sa_multipath_record.c b/opensm/opensm/osm_sa_multipath_record.c
index 06640d9..27b840d 100644
--- a/opensm/opensm/osm_sa_multipath_record.c
+++ b/opensm/opensm/osm_sa_multipath_record.c
@@ -1195,10 +1195,9 @@ __osm_mpr_rcv_get_gids(
       }
     }
 
-    p_port = (osm_port_t *)cl_qmap_get( &p_rcv->p_subn->port_guid_tbl,
-					 gids->unicast.interface_id );
-    if ( !p_port ||
-         p_port == (osm_port_t *)cl_qmap_end( &p_rcv->p_subn->port_guid_tbl ) ) {
+    p_port = osm_get_port_by_guid(p_rcv->p_subn, gids->unicast.interface_id);
+    if ( !p_port )
+    {
       /*
         This 'error' is the client's fault (bad gid) so
         don't enter it as an error in our own log.
diff --git a/opensm/opensm/osm_sa_path_record.c b/opensm/opensm/osm_sa_path_record.c
index 47d9c33..56be25f 100644
--- a/opensm/opensm/osm_sa_path_record.c
+++ b/opensm/opensm/osm_sa_path_record.c
@@ -1214,12 +1214,9 @@ __osm_pr_rcv_get_end_points(
       }
     }
 
-    *pp_src_port = (osm_port_t*)cl_qmap_get(
-      &p_rcv->p_subn->port_guid_tbl,
-      p_pr->sgid.unicast.interface_id );
-
-    if( *pp_src_port == (osm_port_t*)cl_qmap_end(
-          &p_rcv->p_subn->port_guid_tbl ) )
+    *pp_src_port = osm_get_port_by_guid(p_rcv->p_subn,
+                                        p_pr->sgid.unicast.interface_id );
+    if( !*pp_src_port )
     {
       /*
         This 'error' is the client's fault (bad gid) so
@@ -1304,12 +1301,8 @@ __osm_pr_rcv_get_end_points(
       }
     }
 
-    *pp_dest_port = (osm_port_t*)cl_qmap_get(
-      &p_rcv->p_subn->port_guid_tbl,
-      dest_guid );
-
-    if( *pp_dest_port == (osm_port_t*)cl_qmap_end(
-          &p_rcv->p_subn->port_guid_tbl ) )
+    *pp_dest_port = osm_get_port_by_guid(p_rcv->p_subn, dest_guid);
+    if( !*pp_dest_port )
     {
       /*
         This 'error' is the client's fault (bad gid) so
diff --git a/opensm/opensm/osm_sa_service_record.c b/opensm/opensm/osm_sa_service_record.c
index c0f1057..3f32bd5 100644
--- a/opensm/opensm/osm_sa_service_record.c
+++ b/opensm/opensm/osm_sa_service_record.c
@@ -200,8 +200,8 @@ __match_service_pkey_with_ports_pkey(
     if((comp_mask & IB_SR_COMPMASK_SGID) == IB_SR_COMPMASK_SGID)
     {
       service_guid = p_service_rec->service_gid.unicast.interface_id;
-      service_port = (osm_port_t*)cl_qmap_get( &p_rcv->p_subn->port_guid_tbl, service_guid );
-      if (service_port == (osm_port_t*)cl_qmap_end( &p_rcv->p_subn->port_guid_tbl ))
+      service_port = osm_get_port_by_guid(p_rcv->p_subn, service_guid);
+      if (!service_port)
       {
         osm_log( p_rcv->p_log, OSM_LOG_ERROR,
                  "__match_service_pkey_with_ports_pkey: ERR 2405: "
diff --git a/opensm/opensm/osm_sa_sw_info_record.c b/opensm/opensm/osm_sa_sw_info_record.c
index 94b1ff9..129eeff 100644
--- a/opensm/opensm/osm_sa_sw_info_record.c
+++ b/opensm/opensm/osm_sa_sw_info_record.c
@@ -187,9 +187,8 @@ __osm_sir_get_port_by_guid(
 
   CL_PLOCK_ACQUIRE(p_rcv->p_lock);
 
-  p_port = (osm_port_t *)cl_qmap_get(&p_rcv->p_subn->port_guid_tbl,
-                                     port_guid);
-  if (p_port == (osm_port_t *)cl_qmap_end(&p_rcv->p_subn->port_guid_tbl))
+  p_port = osm_get_port_by_guid(p_rcv->p_subn, port_guid);
+  if (!p_port)
   {
     osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
              "__osm_sir_get_port_by_guid ERR 5309: "
diff --git a/opensm/opensm/osm_slvl_map_rcv.c b/opensm/opensm/osm_slvl_map_rcv.c
index 3352627..d601456 100644
--- a/opensm/opensm/osm_slvl_map_rcv.c
+++ b/opensm/opensm/osm_slvl_map_rcv.c
@@ -126,7 +126,6 @@ osm_slvl_rcv_process(
 {
   osm_slvl_rcv_t *p_rcv = context;
   osm_madw_t *p_madw = p_data;
-  cl_qmap_t *p_guid_tbl;
   ib_slvl_table_t *p_slvl_tbl;
   ib_smp_t *p_smp;
   osm_port_t *p_port;
@@ -152,11 +151,10 @@ osm_slvl_rcv_process(
 
   CL_ASSERT( p_smp->attr_id == IB_MAD_ATTR_SLVL_TABLE );
 
-  p_guid_tbl = &p_rcv->p_subn->port_guid_tbl;
   cl_plock_excl_acquire( p_rcv->p_lock );
-  p_port = (osm_port_t*)cl_qmap_get( p_guid_tbl, port_guid );
+  p_port = osm_get_port_by_guid( p_rcv->p_subn, port_guid );
 
-  if( p_port == (osm_port_t*)cl_qmap_end( p_guid_tbl) )
+  if( !p_port )
   {
     cl_plock_release( p_rcv->p_lock );
     osm_log( p_rcv->p_log, OSM_LOG_ERROR,
diff --git a/opensm/opensm/osm_sm.c b/opensm/opensm/osm_sm.c
index dfe01a4..57851e6 100644
--- a/opensm/opensm/osm_sm.c
+++ b/opensm/opensm/osm_sm.c
@@ -637,10 +637,8 @@ osm_sm_mcgrp_join(
     * Acquire the port object for the port joining this group.
     */
    CL_PLOCK_EXCL_ACQUIRE( p_sm->p_lock );
-   p_port = ( osm_port_t * ) cl_qmap_get( &p_sm->p_subn->port_guid_tbl,
-                                          port_guid );
-   if( p_port ==
-       ( osm_port_t * ) cl_qmap_end( &p_sm->p_subn->port_guid_tbl ) )
+   p_port = osm_get_port_by_guid( p_sm->p_subn, port_guid );
+   if( !p_port )
    {
       CL_PLOCK_RELEASE( p_sm->p_lock );
       osm_log( p_sm->p_log, OSM_LOG_ERROR,
@@ -761,10 +759,8 @@ osm_sm_mcgrp_leave(
     */
    /* note: p_sm->p_lock is locked by caller, but will be released later
       this function */
-   p_port = ( osm_port_t * ) cl_qmap_get( &p_sm->p_subn->port_guid_tbl,
-                                          port_guid );
-   if( p_port ==
-       ( osm_port_t * ) cl_qmap_end( &p_sm->p_subn->port_guid_tbl ) )
+   p_port = osm_get_port_by_guid( p_sm->p_subn, port_guid );
+   if( !p_port )
    {
       CL_PLOCK_RELEASE( p_sm->p_lock );
       osm_log( p_sm->p_log, OSM_LOG_ERROR,
diff --git a/opensm/opensm/osm_sm_state_mgr.c b/opensm/opensm/osm_sm_state_mgr.c
index ccfb8b0..a39ba4c 100644
--- a/opensm/opensm/osm_sm_state_mgr.c
+++ b/opensm/opensm/osm_sm_state_mgr.c
@@ -168,10 +168,8 @@ __osm_sm_state_mgr_send_local_port_info_req(
     * update the master_sm_base_lid of the subnet.
     */
    memset( &context, 0, sizeof( context ) );
-   p_port = ( osm_port_t * ) cl_qmap_get( &p_sm_mgr->p_subn->port_guid_tbl,
-                                          port_guid );
-   if( p_port ==
-       ( osm_port_t * ) cl_qmap_end( &p_sm_mgr->p_subn->port_guid_tbl ) )
+   p_port = osm_get_port_by_guid(p_sm_mgr->p_subn, port_guid );
+   if( !p_port )
    {
       osm_log( p_sm_mgr->p_log, OSM_LOG_ERROR,
                "__osm_sm_state_mgr_send_local_port_info_req: ERR 3205: "
@@ -231,13 +229,7 @@ __osm_sm_state_mgr_send_master_sm_info_req(
        * SM (according to master_guid)
        * Send a query of SubnGet(SMInfo) to the subn master_sm_base_lid object.
        */
-      p_port = ( osm_port_t * ) cl_qmap_get( &p_sm_mgr->p_subn->port_guid_tbl,
-                                             p_sm_mgr->master_guid );
-      if( p_port ==
-           ( osm_port_t * ) cl_qmap_end( &p_sm_mgr->p_subn->port_guid_tbl ) )
-      {
-        p_port = NULL;
-      }
+      p_port = osm_get_port_by_guid(p_sm_mgr->p_subn, p_sm_mgr->master_guid);
    }
    else
    {
diff --git a/opensm/opensm/osm_sminfo_rcv.c b/opensm/opensm/osm_sminfo_rcv.c
index 2be56a5..1489aa3 100644
--- a/opensm/opensm/osm_sminfo_rcv.c
+++ b/opensm/opensm/osm_sminfo_rcv.c
@@ -562,7 +562,6 @@ __osm_sminfo_rcv_process_get_response(
   const ib_smp_t*          p_smp;
   const ib_sm_info_t*      p_smi;
   cl_qmap_t*               p_sm_tbl;
-  cl_qmap_t*               p_port_tbl;
   osm_port_t*              p_port;
   ib_net64_t               port_guid;
   osm_remote_sm_t*         p_sm;
@@ -585,7 +584,6 @@ __osm_sminfo_rcv_process_get_response(
 
   p_smi = ib_smp_get_payload_ptr( p_smp );
   p_sm_tbl = &p_rcv->p_subn->sm_guid_tbl;
-  p_port_tbl = &p_rcv->p_subn->port_guid_tbl;
   port_guid = p_smi->guid;
 
   osm_dump_sm_info( p_rcv->p_log, p_smi, OSM_LOG_DEBUG );
@@ -611,8 +609,8 @@ __osm_sminfo_rcv_process_get_response(
   */
   CL_PLOCK_EXCL_ACQUIRE( p_rcv->p_lock );
 
-  p_port = (osm_port_t*)cl_qmap_get( p_port_tbl, port_guid );
-  if( p_port == (osm_port_t*)cl_qmap_end( p_port_tbl ) )
+  p_port = osm_get_port_by_guid( p_rcv->p_subn, port_guid );
+  if( !p_port )
   {
     osm_log( p_rcv->p_log, OSM_LOG_ERROR,
              "__osm_sminfo_rcv_process_get_response: ERR 2F12: "
diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index 7cf9d20..43317e5 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -811,7 +811,6 @@ __osm_state_mgr_is_sm_port_down(
    ib_net64_t port_guid;
    osm_port_t *p_port;
    osm_physp_t *p_physp;
-   cl_qmap_t *p_tbl;
    uint8_t state;
 
    OSM_LOG_ENTER( p_mgr->p_log, __osm_state_mgr_is_sm_port_down );
@@ -830,13 +829,11 @@ __osm_state_mgr_is_sm_port_down(
       goto Exit;
    }
 
-   p_tbl = &p_mgr->p_subn->port_guid_tbl;
-
    CL_ASSERT( port_guid );
 
    CL_PLOCK_ACQUIRE( p_mgr->p_lock );
-   p_port = ( osm_port_t * ) cl_qmap_get( p_tbl, port_guid );
-   if( p_port == ( osm_port_t * ) cl_qmap_end( p_tbl ) )
+   p_port = osm_get_port_by_guid( p_mgr->p_subn, port_guid );
+   if( !p_port )
    {
       osm_log( p_mgr->p_log, OSM_LOG_ERROR,
                "__osm_state_mgr_is_sm_port_down: ERR 3309: "
@@ -879,7 +876,6 @@ __osm_state_mgr_sweep_hop_1(
    osm_dr_path_t hop_1_path;
    ib_net64_t port_guid;
    uint8_t port_num;
-   cl_qmap_t *p_port_tbl;
    uint8_t path_array[IB_SUBNET_PATH_HOPS_MAX];
    uint8_t num_ports;
    osm_physp_t *p_ext_physp;
@@ -889,7 +885,6 @@ __osm_state_mgr_sweep_hop_1(
    /*
     * First, get our own port and node objects.
     */
-   p_port_tbl = &p_mgr->p_subn->port_guid_tbl;
    port_guid = p_mgr->p_subn->sm_port_guid;
 
    CL_ASSERT( port_guid );
@@ -902,8 +897,8 @@ __osm_state_mgr_sweep_hop_1(
     * continue through the switch. */
    p_mgr->p_subn->in_sweep_hop_0 = FALSE;
 
-   p_port = ( osm_port_t * ) cl_qmap_get( p_port_tbl, port_guid );
-   if( p_port == ( osm_port_t * ) cl_qmap_end( p_port_tbl ) )
+   p_port = osm_get_port_by_guid( p_mgr->p_subn, port_guid );
+   if( !p_port )
    {
       osm_log( p_mgr->p_log, OSM_LOG_ERROR,
                "__osm_state_mgr_sweep_hop_1: ERR 3310: "
diff --git a/opensm/opensm/osm_sw_info_rcv.c b/opensm/opensm/osm_sw_info_rcv.c
index 0043ac4..563c126 100644
--- a/opensm/opensm/osm_sw_info_rcv.c
+++ b/opensm/opensm/osm_sw_info_rcv.c
@@ -586,7 +586,6 @@ osm_si_rcv_process(
 {
   osm_si_rcv_t *p_rcv = context;
   osm_madw_t *p_madw = data;
-  cl_qmap_t *p_node_guid_tbl;
   ib_switch_info_t *p_si;
   ib_smp_t *p_smp;
   osm_node_t *p_node;
@@ -599,7 +598,6 @@ osm_si_rcv_process(
 
   CL_ASSERT( p_madw );
 
-  p_node_guid_tbl = &p_rcv->p_subn->node_guid_tbl;
   p_smp = osm_madw_get_smp_ptr( p_madw );
   p_si = (ib_switch_info_t*)ib_smp_get_payload_ptr( p_smp );
 
@@ -623,8 +621,8 @@ osm_si_rcv_process(
 
   CL_PLOCK_EXCL_ACQUIRE( p_rcv->p_lock );
 
-  p_node = (osm_node_t*)cl_qmap_get( p_node_guid_tbl, node_guid );
-  if( p_node == (osm_node_t*)cl_qmap_end( p_node_guid_tbl ) )
+  p_node = osm_get_node_by_guid(p_rcv->p_subn, node_guid);
+  if( !p_node )
   {
     osm_log( p_rcv->p_log, OSM_LOG_ERROR,
              "osm_si_rcv_process: ERR 3606: "
diff --git a/opensm/opensm/osm_ucast_file.c b/opensm/opensm/osm_ucast_file.c
index 5d9ba01..97be7ea 100644
--- a/opensm/opensm/osm_ucast_file.c
+++ b/opensm/opensm/osm_ucast_file.c
@@ -63,9 +63,8 @@ static uint16_t remap_lid(osm_opensm_t *p_osm, uint16_t lid, ib_net64_t guid)
 	uint16_t min_lid, max_lid;
 	uint8_t lmc;
 
-	p_port = (osm_port_t *)cl_qmap_get(&p_osm->subn.port_guid_tbl, guid);
-	if (!p_port ||
-	    p_port == (osm_port_t *)cl_qmap_end(&p_osm->subn.port_guid_tbl)) {
+	p_port = osm_get_port_by_guid(&p_osm->subn, guid);
+	if (!p_port) {
 		osm_log(&p_osm->log, OSM_LOG_VERBOSE,
 			"remap_lid: cannot find port guid 0x%016" PRIx64
 			" , will use the same lid\n", cl_ntoh64(guid));
diff --git a/opensm/opensm/osm_vl_arb_rcv.c b/opensm/opensm/osm_vl_arb_rcv.c
index f36751e..95f7e7d 100644
--- a/opensm/opensm/osm_vl_arb_rcv.c
+++ b/opensm/opensm/osm_vl_arb_rcv.c
@@ -126,7 +126,6 @@ osm_vla_rcv_process(
 {
   osm_vla_rcv_t *p_rcv = context;
   osm_madw_t *p_madw = data;
-  cl_qmap_t *p_guid_tbl;
   ib_vl_arb_table_t *p_vla_tbl;
   ib_smp_t *p_smp;
   osm_port_t *p_port;
@@ -153,11 +152,9 @@ osm_vla_rcv_process(
 
   CL_ASSERT( p_smp->attr_id == IB_MAD_ATTR_VL_ARBITRATION );
 
-  p_guid_tbl = &p_rcv->p_subn->port_guid_tbl;
   cl_plock_excl_acquire( p_rcv->p_lock );
-  p_port = (osm_port_t*)cl_qmap_get( p_guid_tbl, port_guid );
-
-  if( p_port == (osm_port_t*)cl_qmap_end( p_guid_tbl ) )
+  p_port = osm_get_port_by_guid( p_rcv->p_subn, port_guid );
+  if( !p_port )
   {
     cl_plock_release( p_rcv->p_lock );
     osm_log( p_rcv->p_log, OSM_LOG_ERROR,
-- 
1.5.2.2.603.g7c851
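
Note on the pattern in this patch: every hunk above swaps an open-coded
cl_qmap_get()/cl_qmap_end() comparison for a helper that returns NULL on a
miss. The helper definitions are not shown in this part of the patch, so the
sketch below only illustrates the idea with a toy map. All toy_* names are
hypothetical stand-ins and are not the complib or opensm API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a map whose lookup returns an "end"
 * sentinel (not NULL) when the key is missing, as cl_qmap_get() does. */
struct toy_item {
	uint64_t key;
	int value;
};

struct toy_map {
	struct toy_item items[4];
	size_t count;
	struct toy_item end;	/* sentinel returned on a miss */
};

static struct toy_item *toy_map_end(struct toy_map *m)
{
	return &m->end;
}

/* Raw lookup: callers must compare the result against toy_map_end(). */
static struct toy_item *toy_map_get(struct toy_map *m, uint64_t key)
{
	size_t i;

	for (i = 0; i < m->count; i++)
		if (m->items[i].key == key)
			return &m->items[i];
	return toy_map_end(m);
}

/* Wrapper in the style of the helpers used by the patch: the sentinel
 * comparison lives in one place and callers just test for NULL. */
static struct toy_item *toy_get_by_key(struct toy_map *m, uint64_t key)
{
	struct toy_item *p = toy_map_get(m, key);

	return p == toy_map_end(m) ? NULL : p;
}
```

With such a wrapper a caller no longer needs a local table pointer just to
perform the end-of-map comparison, which is why the p_tbl / p_port_tbl
locals disappear in the hunks above.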


From mst at dev.mellanox.co.il  Sat Jun 30 15:05:30 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 1 Jul 2007 01:05:30 +0300
Subject: [ofa-general] Re: [PATCH RFC] sharing userspace IB objects
In-Reply-To: <adahcouv2mi.fsf@cisco.com>
References: <20070625130604.GH15343@mellanox.co.il> <aday7i7wye1.fsf@cisco.com>
	<20070626070641.GM15343@mellanox.co.il> <adahcouv2mi.fsf@cisco.com>
Message-ID: <20070630220530.GB7554@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH RFC] sharing userspace IB objects
> 
>  > This is not directly related to SRC: this is an effort
>  > to make it possible to share QPs, CQ etc across processes
>  > in the same way as they can be currently shared across threads.
>  > So assuming that we want multiple processes to post to
>  > the same QP, how can we support this?
> 
> This looks like a lot of work for an unknown gain.  Who is going to
> really use this?  ie is it worth the trouble?

I think Dror is the best person to answer this.
Dror, could you please explain the need for shared send queue?

-- 
MST


From mst at dev.mellanox.co.il  Sat Jun 30 15:06:57 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 1 Jul 2007 01:06:57 +0300
Subject: [ofa-general] Re: [PATCH RFC] sharing userspace IB objects
In-Reply-To: <adahcouv2mi.fsf@cisco.com>
References: <20070625130604.GH15343@mellanox.co.il> <aday7i7wye1.fsf@cisco.com>
	<20070626070641.GM15343@mellanox.co.il> <adahcouv2mi.fsf@cisco.com>
Message-ID: <20070630220657.GC7554@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH RFC] sharing userspace IB objects
> 
>  > This is not directly related to SRC: this is an effort
>  > to make it possible to share QPs, CQ etc across processes
>  > in the same way as they can be currently shared across threads.
>  > So assuming that we want multiple processes to post to
>  > the same QP, how can we support this?
> 
> This looks like a lot of work for an unknown gain.  Who is going to
> really use this?  ie is it worth the trouble?

I think Dror is the best person to answer this.
Dror, could you please explain the need for shared send queue?

-- 
MST


From mst at dev.mellanox.co.il  Sat Jun 30 15:08:01 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 1 Jul 2007 01:08:01 +0300
Subject: [ofa-general] Re: [PATCH RFC] sharing userspace IB objects
In-Reply-To: <532b813a0706271628s70e17b6cv70b81fdedc442743@mail.gmail.com>
References: <20070625130604.GH15343@mellanox.co.il> <aday7i7wye1.fsf@cisco.com>
	<20070626070641.GM15343@mellanox.co.il> <adahcouv2mi.fsf@cisco.com>
	<532b813a0706271628s70e17b6cv70b81fdedc442743@mail.gmail.com>
Message-ID: <20070630220801.GD7554@mellanox.co.il>

> Shouldn't the protocol to create and destroy and pass the various
> IB objects around be decided by the specific application rather than
> the library trying to solve this problem?

Yes, I agree.

-- 
MST


From mst at dev.mellanox.co.il  Sat Jun 30 15:24:19 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 1 Jul 2007 01:24:19 +0300
Subject: [ofa-general] Re: [PATCH RFC] sharing userspace IB objects
In-Reply-To: <adahcouv2mi.fsf@cisco.com>
References: <20070625130604.GH15343@mellanox.co.il> <aday7i7wye1.fsf@cisco.com>
	<20070626070641.GM15343@mellanox.co.il> <adahcouv2mi.fsf@cisco.com>
Message-ID: <20070630222419.GE7554@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH RFC] sharing userspace IB objects
> 
>  > This is not directly related to SRC: this is an effort
>  > to make it possible to share QPs, CQ etc across processes
>  > in the same way as they can be currently shared across threads.
>  > So assuming that we want multiple processes to post to
>  > the same QP, how can we support this?
> 
> This looks like a lot of work for an unknown gain.  Who is going to
> really use this?  ie is it worth the trouble?

It's a valid question. But let's discuss this separately.
Below are my ideas about the implementation questions that you raise.

>  > >  - Given that everything shared is in shared memory,
>  > 
>  > I think we should try and keep shared memory usage to minimum.
>  > For example, in mthca mr object just needs a key: we could
>  > keep it in non-shared memory, just pass the key around
>  > and save on shared memory usage.
> 
> This comment made me realize there are a few more problems here.  What
> happens if I do ibv_reg_mr() in one process, pass the MR to another
> process, and then do ibv_dereg_mr() in the second process?

Generally, I think it would be nice if this could work
in the same way as with multiple threads: a single process does the
destroy, the rest must not use the object after that, and
synchronisation is up to the app.

But you made me realise that we need an API for non-controlling processes to
release the userspace resources without destroying the kernel-level object.

> What about
> if someone registers a region in shared memory -- are there any
> fork/copy-on-write issues with that?

This can be done already, can't it?

> I think there are probably bugs
> in the locked_vm accounting in the kernel right now -- it doesn't take
> into account the possibility of passing context fds from one process
> to another.

Hmm, might be a good idea to fix the bugs anyway, no?

> In general what do you think the rules for destroying objects should
> be?  What if process A creates a QP, passes it to process B, and then
> process A dies?  Should the QP still be usable?

Yes, I think it should - we get this for free since the file won't
be closed until both die, right?

> Should process B be
> able to destroy it?  What if process A is still alive -- should
> process B be able to destroy the QP?

I think in practice a single process will do this.
My approach generally is: let's have the same rules as for multiple threads.

>  > We need to share file descriptors too. Is there a way to pass these
>  > around besides unix domain sockets?
> 
> I guess we need this to be able to re-mmap doorbell pages etc, right?
> I wonder if there's a better way around that... maybe extending the
> kernel interface so that unrelated processes can share a context, eg
> by putting contexts in a filesystem or something like that.

Hmm, I don't have an objection in principle, however this would mean
we'd have to change the kernel-user interface again. The proposed
API extensions can mostly be done in userspace only.

And it seems to me like much more work than just letting the app
use unix domain sockets. What are the advantages of this approach?

Further, since there is already an existing kernel interface for this,
should we be inventing our own?
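
For reference, the existing kernel interface in question is the SCM_RIGHTS
ancillary message sent with sendmsg()/recvmsg() over a unix domain socket.
A minimal sketch follows. send_fd() and recv_fd() are hypothetical helper
names, not part of libibverbs, and the sketch does not cover re-mmapping
doorbell pages on the receiving side.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one open descriptor over a connected unix domain socket.
 * Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
	char dummy = 'F';	/* one byte of ordinary data carries the fd */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		struct cmsghdr align;	/* forces cmsghdr alignment */
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor sent by send_fd().
 * Returns the new descriptor, or -1 on error. */
static int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The receiver gets a new descriptor number referring to the same open file,
so the underlying file stays open until every holder has closed its copy.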

-- 
MST


From mst at dev.mellanox.co.il  Sat Jun 30 23:09:54 2007
From: mst at dev.mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 1 Jul 2007 09:09:54 +0300
Subject: [ofa-general] Re: [GIT PULL] please pull rdma-dev.git for 2.6.23
In-Reply-To: <000801c7b9e2$03dfe220$3c98070a@amr.corp.intel.com>
References: <000801c7b9e2$03dfe220$3c98070a@amr.corp.intel.com>
Message-ID: <20070701060953.GG7554@mellanox.co.il>

>       ib/cm: include HCA ACK delay in local ACK timeout

I have not seen this, and an archive search does not turn up anything.

>       IB/sa: Add InformInfo/Notice support.
>       IB/sa: Add local SA path record caching.

> All patches have been previously posted except for the last, which is a
> one line change.

There were several bugs in the local SA patches that you posted originally,
and SA cache was enabled by default which we decided was not a good idea.


Could the latest revision of the patches to be pulled be posted
to the list, please?

-- 
MST